Data tables are illustrative samples limited to 20 rows. You can download the full data files directly from GitHub.

Chapter 1 Introduction to R

Chapter Objectives

  • Learn about R as a programming language
  • Define Integrated Development Environment
  • Define objects
  • Learn the assignment operator
  • Define functions
  • Execute a loop
  • Learn logical operators
  • Learn about R data types
  • Learn about object classes
  • Index data objects
  • Extend R functionality with packages
  • Write a custom function
  • Create a scatter plot with sports data
  • Create a heatmap with sports data

2019-2020 Boston Player Stats.csv


2019-2020 Dallas Player Stats.csv


Chapter 2 Data Visualization: Best Practices

Chapter Objectives

  • Articulate best practices of convincing visualizations
  • Understand the programmatic layering used in ggplot2, the most popular R plotting library
  • Understand the difference between client-side and server-side data
  • Create various plots with ggplot2, including sports fields and courts
  • Create client-side interactive visualizations with echarts4r

Dellavedova_18_19_season.csv


sampe_hits.csv


Chapter 3 Geospatial Data

Chapter Objectives

  • Download baseball data from various sources
  • Perform a “crosswalk” inner-join with data to append additional information
  • Chart a player’s performance over time
  • Scrape player data from the web with a GET request
  • Tabulate pitch types by year
  • Visualize the change by pitch type over time
  • Create and interpret a box plot of pitch speed by type and time
  • Create a 2D density plot of pitch type
  • Make both interactive JavaScript plots and static plots for each visual in this chapter

Miguel_Castro_pitchStats_backup.csv


copy_People_object.csv


copy_Pitching_object.csv


lahman_mlb_xWalk.csv


Chapter 4 Evaluating Players for the Football Draft

Chapter Objectives

  • Following the Sample, Explore, Modify, Model, and Assess (SEMMA) workflow, build various football player statistics models
  • Explore the player evaluation data and build dynamic visualizations
  • Understand the implications for the annual football player prospect evaluations in terms of being drafted versus not
  • Use a KNN binary classification model to estimate the probability of being drafted
  • Use a KNN multiclass classification model to predict the most likely draft round
  • Use a KNN regression model to predict the overall pick at which a player may be selected
  • Apply K-means clustering to the player data to identify cohorts by mean values
  • Apply K-medoid clustering to the player data to identify cohorts by median values
  • Apply an unsupervised, non-Euclidean algorithm called spherical K-means to separate player prospects and identify the prototypical players within a cluster

combine_data_2000_2020.csv


Chapter 5 Logistic Regression

Chapter Objectives

  • Follow the SEMMA approach to modeling
  • Construct various visuals within the data exploration phase of the modeling exercise
  • Build a logistic regression to model winning team characteristics
  • Calculate multiple model key performance indicators and compare them across training and validation partitions
  • Construct a waterfall chart using the model coefficients to understand what proportion of the model’s output is explained by each feature
  • Organize the modeling coefficients and winning team data to construct a scatter plot
  • Interpret the scatter plot quadrants to understand team behavior and which aspects of women’s collegiate basketball players and coaches should prioritize or deprioritize
  • Identify top-performing teams according to statistic(s) that may be overlooked by other teams

final_ESPN.R

  • Download the script here

imputed_DefensiveStats.csv


imputed_OffensiveStats.csv


raw_allBB.csv


Chapter 6 Gauging Fan Sentiment in Cricket

Chapter Objectives

  • Learn what NLP is and a basic approach to analyzing text
  • Learn the basic NLP terms and object classes
  • Define the six-step NLP workflow
  • Apply various string manipulation functions to a collection of forum posts as documents
  • Identify two-word lexicons for sentiment analysis and adjust one for the forum’s context
  • Programmatically change the tokenization of the text from unigrams to bigrams
  • Learn about full, inner, and left joins
  • Visualize the overall forum community comment velocity
  • Build a word cloud of frequent two-word phrases
  • Classify forum comments by emotional category, then plot as a radar chart for the entire forum conversation
  • Focusing on individual users, calculate and visualize the network graph of comments to identify the most central author
  • Individually review the most and least negative authors, creating a bar chart for review

commentsReddit_Feb_15_2021.csv


Chapter 7 Gambling Optimization

Chapter Objectives

  • Understand the basic premise of sports lineups
  • Contextualize the impact of fantasy sports and gambling
  • Learn to set a football lineup
  • Define a simulation of outcomes
  • Solve a linear programming football lineup problem
  • Identify the single lineup that maximizes a week’s total points according to player point predictions

copy_allPos.csv


copy_scrape.rds

  • A copy of the scrape is offered here so you can see how the data is structured in case the webpages change after publication.

salaryData_wk_17.csv


Chapter 8 Exploratory Data Analysis

Chapter Objectives

  • Download basic sports data
  • Apply various functions to understand summary, tabular, and statistical information about the data
  • Build bar, timeline event, and line charts to explore patterns in the data
  • Construct a Markov chain to identify the next most likely event and more fully understand the characteristics of the overall scenario represented in the data

MACgameInfo_table.csv


macPlays_2019_2.csv


Team Roster Example

©Copyright 2022

rstatsbook.com