library(tidyverse)
<- read_csv("https://raw.githubusercontent.com/36-SURE/2025/main/data/wwc_passes.csv") wwc_passes
EDA project: Women’s World Cup passing
Overview
This project will be released on Thursday, June 5 and conclude with a 6-minute presentation on Tuesday, June 17 during lab.
Students will be randomly placed into groups of 2–3 and each group will be randomly assigned a dataset.
The goal of this project is to practice understanding the structure of a dataset, and to practice generating and evaluating hypotheses using fundamental EDA and data visualization techniques.
Deliverables
Each group is expected to make slides to accompany the 6-minute presentation. All group members must be present for the presentation and participate. No notes/scripts are allowed during the presentation.
The presentation should feature the following:
Overview of the structure of your dataset
Two questions/hypotheses you are interested in exploring
Two data visualizations exploring the questions—both must be multivariate (i.e., involving 2+ variables) and in different formats
One clustering analysis
Conclusions for the hypotheses based on your EDA and data visualizations
Timeline
There will be two submission deadlines:
Thursday, June 12 at 5pm ET - Each student will push their individual code for the project thus far to GitHub for review.
Tuesday, June 17 at 1pm ET - Slides and full code must be completed and ready for presentation. Send your slides to Quang (quang@stat.cmu.edu
). All code must be written in R
; but the slides may be created in any software. Take advantage of examples from lectures, but also feel free to explore online resources that may be relevant. (But be sure to always consult the R
help documentation first before attempting to google around or ask ChatGPT.)
Data
This dataset contains all (completed) pass attempts from the 2023 FIFA Women’s World Cup, courtesy of StatsBomb and accessed via the StatsBombR
package.
Each row in the dataset corresponds to a pass attempt and the columns are:
period
: period of the game (1
: first half,2
: second half,3
: first extra time period,4
: second extra time period)
minute
&second
: minute & second of the gameduration
: duration of the pass (seconds)
under_pressure
: whether the pass is under pressure
counterpress
: whether the pass is under a counterpress event (defined as a pressing action occurring within five seconds of an open play turnover)
play_pattern.name
: play patternteam.name
: team nameplayer.name
: passer name
position.name
: passer positionpass.{switch, aerial_won, cross, through_ball, ..., no_touch, deflected, miscommunication}
: different pass binary indicators (should be self-explanatory,NA
meansFALSE
)pass.recipient.name
: pass recipient namepass.height.name
: type of pass heightpass.body_part.name
: body part used to make the passpass.type.name
: type of passpass.outcome.name
: outcome of the passpass.technique.name
: technique of the passlocation.{x, y}
: (x
,y
) location when pass takes place. (Note: all locations have been standardized so that the possession team is going from left to right. Higherx
values means closer to the opposing team’s goal, and highery
values means more toward the left wing)pass.end_location.{x, y}
: (x
,y
) coordinates of pass ending locationpass.length
: length of the passpass.angle
: angle of the passTimeInPoss
: time elapsed since the start of possession
TimeToPossEnd
: time remaining until end of possession
Starter code
In case you’re curious, the code to build this dataset can be found below.
library(tidyverse)
library(StatsBombR)
<- FreeCompetitions() |>
wwc filter(competition_id == "72" & season_id == "107") |>
FreeMatches() |>
free_allevents() |>
allclean()
<- wwc |>
wwc_passes filter(type.name == "Pass", !is.na(pass.recipient.name)) |>
::remove_empty() |>
janitorselect(period, minute, second, duration, under_pressure, counterpress,
play_pattern.name, team.name, player.name, position.name, :pass.through_ball, pass.shot_assist:pass.goal_assist,
pass.switch:pass.miscommunication, pass.recipient.name, pass.height.name,
pass.cut_back
pass.body_part.name, pass.type.name, pass.outcome.name, pass.technique.name,
location.x, location.y, pass.end_location.x, pass.end_location.y, pass.length, pass.angle, TimeInPoss, TimeToPossEnd)