library(tidyverse)
<- read_csv("https://raw.githubusercontent.com/36-SURE/2025/main/data/ufa_throws.csv") ufa_throws
EDA project: UFA throwing
Overview
This project will be released on Thursday, June 5 and conclude with a 6-minute presentation on Tuesday, June 17 during lab.
Students will be randomly placed into groups of 2–3 and each group will be randomly assigned a dataset.
The goal of this project is to practice understanding the structure of a dataset, and to practice generating and evaluating hypotheses using fundamental EDA and data visualization techniques.
Deliverables
Each group is expected to make slides to accompany the 6-minute presentation. All group members must be present for the presentation and participate. No notes/scripts are allowed during the presentation.
The presentation should feature the following:
Overview of the structure of your dataset
Two questions/hypotheses you are interested in exploring
Two data visualizations exploring the questions—both must be multivariate (i.e., involving 2+ variables) and in different formats
One clustering analysis
Conclusions for the hypotheses based on your EDA and data visualizations
Timeline
There will be two submission deadlines:
Thursday, June 12 at 5pm ET - Each student will push their individual code for the project thus far to GitHub for review.
Tuesday, June 17 at 1pm ET - Slides and full code must be completed and ready for presentation. Send your slides to Quang (quang@stat.cmu.edu
). All code must be written in R
; but the slides may be created in any software. Take advantage of examples from lectures, but also feel free to explore online resources that may be relevant. (But be sure to always consult the R
help documentation first before attempting to google around or ask ChatGPT.)
Data
This dataset contains throw attempts in the Ultimate Frisbee Association from 2021 to 2024. The Ultimate Frisbee Association (UFA), formerly known as the American Ultimate Disc League (AUDL), is the top professional ultimate league in North America. The data were used in the 2025 SSAC paper by Eberhard, Miller and Sandholz and made available on GitHub.
Each row in the dataset corresponds to a throw attempt and the columns are:
thrower
: thrower name identifierthrower_x
: (standardized) x coordinate of thrower (position along the sideline direction; 0 represents the middle of the field, positive values denote the left side (with respect to facing the target end zone), and negative values denote the right side)thrower_y
: y coordinate of thrower (position along the end zone direction; highery
means closer to the target end zone)receiver
: receiver name identifierreceiver_x
: (standardized) x coordinate of receiver (position along the sideline direction; 0 represents the middle of the field, positive values denote the left side (with respect to facing the target end zone), and negative values denote the right side)receiver_y
: y coordinate of receiver (position along the end zone direction; highery
means closer to the target end zone)receiver
: receiver name identifierturnover
: whether the throw results in a turnoverpossession_num
: possession number within a gamepossession_throw
: throw number within a possessiongame_quarter
: quarter within a gameis_home_team
: whether the offense is the home teamhome_team_score
: home team scoreaway_team_score
: away team scoregameID
: unique identifier for gamehome_teamID
: home team name identifieraway_teamID
: away team name identifiertimes
: game time remaining (seconds)home_team_win
: whether home team winsscore_diff
: score difference between home and away teamsgoal
: whether the throw results in a goalthrow_distance
: throw distance (yards)x_diff
: difference in x coordinates between thrower and receivery_diff
: difference in y coordinates between thrower and receiver
throw_angle
: throw angle (radians, 0 means forward, \(\pi\) means backward, positive values mean to the right, and negative values mean to the left of the thrower)
Starter code
In case you’re curious, the code to build this dataset can be found below.
<- read_csv("https://raw.githubusercontent.com/BradenEberhard/Expected-Throwing-Value/refs/heads/main/data/all_games_1024.csv")
ufa_throws
<- ufa_throws |>
ufa_throws mutate(
goal = ifelse(receiver_y > 100 & turnover == 0, 1, 0),
throw_distance = sqrt((receiver_x - thrower_x) ^ 2 + (receiver_y - thrower_y) ^ 2),
x_diff = receiver_x - thrower_x,
y_diff = receiver_y - thrower_y,
throw_angle = atan2(y_diff, x_diff)
|>
) select(-quarter_point, -total_points, -current_line)