NFL 2022

NFL
Author

Jake Starkey

Published

November 2, 2023

Question 1. Personal Website on GitHub

Q1a

Q1b

  • Make sure that your GitHub repository, named YOUR_GITHUB_USERNAME.github.io, is set to public. ✅

  • Update your website at https://YOUR_GITHUB_USERNAME.github.io/index.html to:

    • Include links to (1) your LinkedIn page, (2) GitHub page (https://github.com/YOUR_GITHUB_USERNAME), and (3) a PDF file of your Rèsume (https://YOUR_GITHUB_USERNAME.github.io/YOUR_RESUME.pdf) ✅

    • Offer a description of yourself, detailing your education background and professional experience. ✅

    • Display your own profile picture with your face, not the one shown below. ✅

Q1c

  • Change the title of your blog. ✅

    • That is, to replace Insightful Analytics with your own blog name.
  • Remove the blog posts Post With Code, Starwars, and Beer Markets. ✅

  • Revise the Welcome To My Blog post. ✅

  • Post three different blog articles based on data analysis using the following three CSV files: ✅

    1. https://bcdanl.github.io/data/DOHMH_NYC_Restaurant_Inspection.csv
    2. https://bcdanl.github.io/data/spotify_all.csv
    3. https://bcdanl.github.io/data/beer_markets.csv
    • Make sure that each blog post has categories and is associated with a proper image file that is displayed as a thumbnail at the list page of the blog. ✅

    • Make sure that each blog post uses emojis properly. (E.g., 😄 🍺 🎶 🍕) ✅

    • Make sure that each blog post includes its thumbnail image and at least three ggplot figures. ✅

    • You can refer to the previous DANL 200 Homework Assignments and Exams for your blog posts.

##Question 2. NFL in 2022 🏈

  • Add a blog post with your answers for Question 2 to your website (https://YOUR_GITHUB_USERNAME.github.io/).
    • Make sure that your blog post for Question 2 includes all the questionnaires and your answers to them.
    • Make sure that your blog post for Question 2 has a section for each sub-question (e.g., Q2a, Q2b) in Question 2, so that the Table of Contents display the section for each questionnaire.
  • The following is the data.frame for Question 2.
NFL2022_stuffs <- read.csv('https://bcdanl.github.io/data/NFL2022_stuffs.csv')
  • NFL2022_stuffs is the data.frame that contains information about NFL games in year 2022, in which the unit of observation is a single play for each drive in a NFL game.

Variable Description 🏈

  • play_id: Numeric play identifier that when used with game_id and drive provides the unique identifier for a single play
  • game_id: Ten digit identifier for NFL game.
  • drive: Numeric drive number in the game.
  • week: Season week.
  • posteam: String abbreviation for the team with possession.
  • qtr: Quarter of the game (5 is overtime).
  • half_seconds_remaining: Numeric seconds remaining in the half.
  • down: The down for the given play.
    • Basically you get four attempts (aka downs) to move the ball 10 yards (by either running with it or passing it).
    • If you make 10 yards then you get another set of four downs.
  • pass: Binary indicator if the play was a pass play.
  • wp: Estimated winning probability for the posteam given the current situation at the start of the given play.

Q2a 🏈

library(tidyverse)
Warning: package 'dplyr' was built under R version 4.2.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr)

In data.frame, NFL2022_stuffs, remove observations for which values of posteam is missing.

q2a <- NFL2022_stuffs %>% 
  filter(!is.na(posteam))

Q2b 🏈

  • Summarize the mean value of pass for each posteam when all the following conditions hold:
  1. wp is greater than 20% and less than 75%;
  2. down is less than or equal to 2; and
  3. half_seconds_remaining is greater than 120.
q2b <- NFL2022_stuffs %>% 
  filter(between(wp, 0.2, 0.75),
         down <= 2,
         half_seconds_remaining > 120)

meanvalq2b <- q2b %>% 
  group_by(posteam) %>% 
  summarise(mean_pass = mean(pass, na.rm = TRUE))

Q2c 🏈

  • Provide both (1) a ggplot code with geom_point() using the resulting data.frame in Q2b and (2) a simple comments to describe the mean value of pass for each posteam.
    • In the ggplot, reorder the posteam categories based on the mean value of pass in ascending or in descending order.
meanvaldesc <- meanvalq2b %>% 
  arrange(desc(mean_pass))

ggplot(meanvaldesc, aes(x = mean_pass, y = reorder(posteam, mean_pass))) +
  geom_point()+
  labs(x= "Percentage of Pass Plays",
       y= "Team with possesion")

Comments:

  • Cincinnati, Kansas City, Los Angeles Chargers, Buffalo Bills, and Philadelphia eagles had the top 5 highest average percentage of pass plays during the 2022 season.
  • Atlanta, Washington, Chicago, New Orleans, and Tennessee had the top 5 lowest average percentage of pass plays during the 2022 season.

Q2d 🏈

  • Consider the following data.frame, NFL2022_epa:
NFL2022_epa <- read.csv('https://bcdanl.github.io/data/NFL2022_epa.csv')
  • Variable description for NFL2022_epa
    • play_id: Numeric play identifier that when used with game_id and drive provides the unique identifier for a single play
    • game_id: Ten digit identifier for NFL game.
    • drive: Numeric drive number in the game.
    • posteam: String abbreviation for the team with possession.
    • passer: Name of the player who passed a ball to a receiver by initially taking a three-step drop and backpedaling into the pocket to make a pass. (Mostly, they are quarterbacks)
    • receiver: Name of the receiver.
    • epa: Expected points added (EPA) by the posteam for the given play.
  • Create the data.frame, NFL2022_stuffs_EPA, that includes
    1. All the variables in the data.frame, NFL2022_stuffs;
    2. The variables, passer, receiver, and epa, from the data.frame, NFL2022_epa. by joining the two data.frames.
  • In the resulting data.frame, NFL2022_stuffs_EPA, remove observations with NA in passer.

Answer:

NFL2022_stuffs_EPA <- left_join(NFL2022_stuffs, NFL2022_epa, by = c("play_id", "game_id", "drive", "posteam"))

NFL2022_stuffs_EPA <- NFL2022_stuffs_EPA %>%
  filter(!is.na(passer))

Q2e 🏈

  • Provide both (1) a single ggplot and (2) a simple comment to describe the NFL weekly trend of weekly mean value of epa for each of the following two passers,
    1. “J.Allen”
    2. “P.Mahomes”
two_passers <- c("J.Allen", "P.Mahomes")

filtered_twopassers <- NFL2022_stuffs_EPA %>% 
  filter(passer %in% two_passers)

mean_epa_data <- filtered_twopassers %>% group_by(week, passer) %>% 
  summarise(mean_epa = mean(epa, na.rm = TRUE))
`summarise()` has grouped output by 'week'. You can override using the
`.groups` argument.
ggplot(mean_epa_data, aes(x= week, y= mean_epa, color = passer))+
  geom_line()+
  scale_color_manual(values =c("J.Allen" ="blue", "P.Mahomes"="red"))+
  labs(x= "Week",
       y= "Mean value of expected points added (EPA)")

Comment: Patrick Mahomes generally had a higher mean value of epa. However, there were a few weeks that Josh Allen had a higher mean value of epa.

Q2f 🏈

Calculate the difference between the mean value of epa for “J.Allen” the mean value of epa for “P.Mahomes” for each value of week.

difference_epa <- mean_epa_data %>% 
  pivot_wider(names_from = passer, values_from = mean_epa)

difference_epa$epa_differnce <- difference_epa$'J.Allen' - difference_epa$'P.Mahomes'

print(difference_epa)
# A tibble: 22 × 4
# Groups:   week [22]
    week J.Allen P.Mahomes epa_differnce
   <int>   <dbl>     <dbl>         <dbl>
 1     1   0.530    0.698        -0.169 
 2     2   0.487    0.148         0.339 
 3     3   0.169    0.246        -0.0763
 4     4   0.191    0.271        -0.0803
 5     5   0.627    0.302         0.325 
 6     6   0.307    0.133         0.173 
 7     7  NA        0.701        NA     
 8     8   0.224   NA            NA     
 9     9  -0.208    0.0965       -0.304 
10    10   0.161    0.589        -0.429 
# ℹ 12 more rows

Q2g 🏈

  • Summarize the resulting data.frame in Q2d, with the following four variables:

    • posteam: String abbreviation for the team with possession.
    • passer: Name of the player who passed a ball to a receiver by initially taking a three-step drop, and backpedaling into the pocket to make a pass. (Mostly, they are quarterbacks.)
    • mean_epa: Mean value of epa in 2022 for each passer
    • n_pass: Number of observations for each passer
  • Then find the top 10 NFL passers in 2022 in terms of the mean value of epa, conditioning that n_pass must be greater than or equal to the third quantile level of n_pass.

summary_data <- NFL2022_stuffs_EPA %>%
  group_by(posteam, passer) %>%
  summarise(
    mean_epa = mean(epa, na.rm = TRUE),
    n_pass = n()
  )
`summarise()` has grouped output by 'posteam'. You can override using the
`.groups` argument.
quantile_threshold_passer <- quantile(summary_data$n_pass, 0.75)
filtered_summary_data <- summary_data %>%
  filter(n_pass >= quantile_threshold_passer)

top_10_passers <- filtered_summary_data %>%
  arrange(desc(mean_epa)) %>% 
  head(n=10)
  
top_10_passers
# A tibble: 10 × 4
# Groups:   posteam [10]
   posteam passer       mean_epa n_pass
   <chr>   <chr>           <dbl>  <int>
 1 KC      P.Mahomes      0.286     880
 2 MIA     T.Tagovailoa   0.234     453
 3 SF      J.Garoppolo    0.200     348
 4 BUF     J.Allen        0.172     785
 5 DET     J.Goff         0.171     661
 6 CIN     J.Burrow       0.153     854
 7 DAL     D.Prescott     0.147     529
 8 PHI     J.Hurts        0.138     672
 9 JAX     T.Lawrence     0.128     764
10 CLE     J.Brissett     0.0912    445