library(tidyverse)
nhl_shots <- read_csv("https://raw.githubusercontent.com/36-SURE/36-SURE.github.io/main/data/nhl_shots.csv")
EDA project: NHL shooting
Overview
This project will be released on Thursday, June 6 and conclude with an 8-minute presentation on Tuesday, June 18 during lecture time.
Students will be randomly placed into groups of three and each group will be randomly assigned a dataset.
The goal of this project is to practice understanding the structure of a dataset, and to practice generating and evaluating hypotheses using fundamental EDA and data visualization techniques.
Deliverables
Each group is expected to make slides to accompany the 8-minute presentation.
The presentation should feature the following:
Overview of the structure of your dataset
Three questions/hypotheses you are interested in exploring
Three data visualizations exploring the questions, at least two of which must be multivariate. Each visualization must be in a different format from the other two, and you must have at least one categorical and one continuous visualization
One clustering analysis
Conclusions for the hypotheses based on your EDA and data visualizations
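The deliverables leave the clustering method open. One possible starting point is k-means on shot locations, sketched below. Other methods, such as hierarchical clustering, would work just as well. The helper names `cluster_shot_locations()` and `plot_shot_clusters()`, the choice of variables, the seed, and `k` are illustrative assumptions, not project requirements. The coordinates are assumed to be in feet, consistent with the `shotDistance` definition in the Data section, so they are left unscaled.

```r
library(tidyverse)

# Hypothetical helper: k-means on arena-adjusted shot locations.
# Both coordinates are assumed to be in feet, so they are left unscaled.
cluster_shot_locations <- function(shots, k = 3, seed = 36) {
  shot_coords <- shots |>
    select(arenaAdjustedXCord, arenaAdjustedYCord) |>
    drop_na()

  set.seed(seed)  # k-means starts from random centers
  km <- kmeans(shot_coords, centers = k, nstart = 20)

  # Attach each shot's cluster label as a factor for plotting
  shot_coords |>
    mutate(cluster = factor(km$cluster))
}

# Hypothetical helper: scatterplot of shot locations colored by cluster
plot_shot_clusters <- function(clustered) {
  ggplot(clustered, aes(x = arenaAdjustedXCord, y = arenaAdjustedYCord,
                        color = cluster)) +
    geom_point(alpha = 0.3) +
    coord_fixed() +
    labs(x = "Arena-adjusted x (ft)", y = "Arena-adjusted y (ft)",
         color = "Cluster")
}
```

For example, `nhl_shots |> cluster_shot_locations(k = 3) |> plot_shot_clusters()` draws a clustered shot map. Choosing `k` is part of the analysis itself, and so is standardizing the variables if you cluster on features measured in different units. This sketch does not settle either choice.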
Timeline
There will be two submission deadlines:
Thursday, June 13 at 5pm ET - Each student will push their individual code for the project thus far to GitHub for review. We will then provide feedback.
Monday, June 17 at 5pm ET - Slides and full code must be completed and ready for presentation. Send your slides to Quang (quang@stat.cmu.edu).
All code must be written in R, but the slides may be created in any software. Take advantage of examples from lectures, but also feel free to explore online resources that may be relevant. (But be sure to always consult the R help documentation first before attempting to google around or ask ChatGPT.)
Data
This dataset contains all shot attempts from the 2023 NHL playoffs, courtesy of MoneyPuck.com.
Each row in the dataset corresponds to a shot attempt and the columns are:
shooterPlayerId: player id of the skater taking the shot
shooterName: first and last name of the player taking the shot
team: team taking the shot
shooterLeftRight: whether the shooter is a left or right shot
shooterTimeOnIce: playing time in seconds that has passed since the shooter started their shift
shooterTimeOnIceSinceFaceoff: minimum of the playing time in seconds since the last faceoff and the playing time that has passed since the shooter started their shift
event: whether the shot was a shot on goal (SHOT), a goal (GOAL), or missed the net (MISS)
location: the zone the shot took place in
shotType: type of shot
shotAngle: angle of the shot in degrees, positive if the shot is from the left side of the ice
shotAnglePlusRebound: difference in angle between the previous shot and this shot if this shot is a rebound; otherwise set to 0
shotDistance: distance of the shot from the net in feet, where the net is defined as being at the (89, 0) coordinates
shotOnEmptyNet: whether the shot was on an empty net
shotRebound: whether the shot is a rebound, i.e., if the last event was a shot and within 3 seconds of this shot
shotRush: whether the shot was on a rush, i.e., if the last event was in another zone and within 4 seconds
shotWasOnGoal: whether the shot was on net (either a goal or a goalie save)
shotGeneratedRebound: whether the shot generated a rebound shot within 3 seconds of this shot
shotGoalieFroze: whether the goalie froze the puck within 1 second of the shot
arenaAdjustedShotDistance: shot distance adjusted for arena recording bias, using the same methodology as War On Ice proposed by Schuckers and Curro
arenaAdjustedXCord: x coordinate of the arena-adjusted shot location, always a positive number
arenaAdjustedYCord: y coordinate of the arena-adjusted shot location
goalieIdForShot: player id of the goalie the shot is on
goalieNameForShot: first and last name of the goalie the shot is on
teamCode: team code of the shooting team
isHomeTeam: whether the shooting team is the home team
homeSkatersOnIce: number of skaters on ice for the home team (does not count the goalie)
awaySkatersOnIce: number of skaters on ice for the away team (does not count the goalie)
game_id: id of the game the shot took place in
homeTeamCode: home team in the game
awayTeamCode: away team in the game
homeTeamGoals: home team goals before the shot took place
awayTeamGoals: away team goals before the shot took place
time: seconds into the game at which the shot took place
period: period of the game
Note that a full glossary of the features available for NHL shot data can be found here.
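Several of these columns are defined relative to a fixed geometry (the net at (89, 0)) or split by home and away team. The sketch below assumes the column names listed above. It recomputes each shot's distance to the net from the arena-adjusted coordinates. It also re-expresses the score and the skater counts from the shooting team's perspective. The helper names and the labels for skater situations are hypothetical. It is also an assumption, worth checking rather than taking as documented, that `arenaAdjustedShotDistance` matches the recomputed distance.

```r
library(tidyverse)

# Hypothetical helper: Euclidean distance (ft) from (x, y) to the net,
# which the column definitions place at (89, 0)
distance_to_net <- function(x, y, net_x = 89, net_y = 0) {
  sqrt((net_x - x)^2 + (net_y - y)^2)
}

# Hypothetical helper: derived features from the shooting team's perspective
add_shot_context <- function(shots) {
  shots |>
    mutate(
      # Assumed to be close to arenaAdjustedShotDistance; check, don't trust
      recomputed_distance = distance_to_net(arenaAdjustedXCord,
                                            arenaAdjustedYCord),
      shooting_goals = if_else(isHomeTeam == 1, homeTeamGoals, awayTeamGoals),
      opponent_goals = if_else(isHomeTeam == 1, awayTeamGoals, homeTeamGoals),
      score_diff = shooting_goals - opponent_goals,  # > 0 means leading
      shooting_skaters = if_else(isHomeTeam == 1,
                                 homeSkatersOnIce, awaySkatersOnIce),
      opponent_skaters = if_else(isHomeTeam == 1,
                                 awaySkatersOnIce, homeSkatersOnIce),
      # Missing skater counts fall through to NA
      skater_state = case_when(
        shooting_skaters > opponent_skaters ~ "skater advantage",
        shooting_skaters < opponent_skaters ~ "skater disadvantage",
        shooting_skaters == opponent_skaters ~ "even skaters"
      )
    )
}
```

For example, `nhl_shots |> add_shot_context() |> count(skater_state, event)` tabulates shot outcomes by skater situation. The neutral labels avoid calling every advantage a "power play", because a pulled goalie also gives the shooting team an extra skater.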
Starter code
In case you’re curious, the code to build this dataset can be found below. (Note that the data were originally downloaded from the MoneyPuck site.)
library(tidyverse)

# download and unzip
# https://peter-tanner.com/moneypuck/downloads/shots_2022.zip
nhl_shots <- read_csv("shots_2022.csv") # might need to modify file path

# keep playoff shots only, and the columns described above
nhl_shots <- nhl_shots |>
  filter(isPlayoffGame == 1) |>
  select(
    # shooter info
    shooterPlayerId, shooterName, team, shooterLeftRight,
    shooterTimeOnIce, shooterTimeOnIceSinceFaceoff,
    # shot info
    event, location, shotType, shotAngle, shotAnglePlusRebound,
    shotDistance, shotOnEmptyNet, shotRebound, shotRush,
    shotWasOnGoal, shotGeneratedRebound, shotGoalieFroze,
    # arena-adjusted locations
    arenaAdjustedShotDistance, arenaAdjustedXCord, arenaAdjustedYCord,
    # goalie info
    goalieIdForShot, goalieNameForShot,
    # team context
    teamCode, isHomeTeam, homeSkatersOnIce, awaySkatersOnIce,
    # game context
    game_id, homeTeamCode, awayTeamCode, homeTeamGoals, awayTeamGoals,
    time, period
  )
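Once the data are built or downloaded, a quick structural check is a natural first step toward the dataset overview in the deliverables. The sketch below counts shots, games, and event types. It also tests one relationship implied by the glossary: `shotWasOnGoal` should be 1 exactly when `event` is SHOT or GOAL. The helper name `summarize_shot_structure()` is hypothetical. The relationship it checks is an inference from the column definitions, not a documented guarantee.

```r
library(tidyverse)

# Hypothetical helper: a few structural summaries of the shot data
summarize_shot_structure <- function(shots) {
  list(
    n_shots = nrow(shots),                # one row per shot attempt
    n_games = n_distinct(shots$game_id),  # distinct playoff games
    event_counts = count(shots, event),   # tallies per event type
    # Inferred from the definitions: shotWasOnGoal should be 1 exactly
    # when event is SHOT or GOAL; count rows where the two disagree
    on_goal_mismatches = sum(
      (shots$event %in% c("SHOT", "GOAL")) != (shots$shotWasOnGoal == 1),
      na.rm = TRUE
    )
  )
}
```

Running `glimpse(nhl_shots)` together with `summarize_shot_structure(nhl_shots)` shows the column types alongside these counts. A nonzero `on_goal_mismatches` would mean the inferred relationship does not hold exactly, which is worth investigating before building hypotheses on either column.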