EDA project: UFA throwing


Overview

This project will be released on Thursday, June 5 and conclude with a 6-minute presentation on Tuesday, June 17 during lab.

Students will be randomly placed into groups of 2–3 and each group will be randomly assigned a dataset.

The goal of this project is to practice understanding the structure of a dataset, and to practice generating and evaluating hypotheses using fundamental EDA and data visualization techniques.

Deliverables

Each group is expected to make slides to accompany the 6-minute presentation. All group members must be present for the presentation and participate. No notes/scripts are allowed during the presentation.

The presentation should feature the following:

  • Overview of the structure of your dataset

  • Two questions/hypotheses you are interested in exploring

  • Two data visualizations exploring the questions—both must be multivariate (i.e., involving 2+ variables) and in different formats

  • One clustering analysis

  • Conclusions for the hypotheses based on your EDA and data visualizations

Timeline

There will be two submission deadlines:

Thursday, June 12 at 5pm ET - Each student will push their individual code for the project thus far to GitHub for review.

Tuesday, June 17 at 1pm ET - Slides and full code must be completed and ready for presentation. Send your slides to Quang (quang@stat.cmu.edu). All code must be written in R; but the slides may be created in any software. Take advantage of examples from lectures, but also feel free to explore online resources that may be relevant. (But be sure to always consult the R help documentation first before attempting to google around or ask ChatGPT.)

Data

This dataset contains throw attempts in the Ultimate Frisbee Association from 2021 to 2024. The Ultimate Frisbee Association (UFA), formerly known as the American Ultimate Disc League (AUDL), is the top professional ultimate league in North America. The data were used in the 2025 SSAC paper by Eberhard, Miller and Sandholz and made available on GitHub.

Each row in the dataset corresponds to a throw attempt and the columns are:

  • thrower: thrower name identifier
  • thrower_x: (standardized) x coordinate of thrower (position along the sideline direction; 0 represents the middle of the field, positive values denote the left side (with respect to facing the target end zone), and negative values denote the right side)
  • thrower_y: y coordinate of thrower (position along the end zone direction; higher y means closer to the target end zone)
  • receiver: receiver name identifier
  • receiver_x: (standardized) x coordinate of receiver (position along the sideline direction; 0 represents the middle of the field, positive values denote the left side (with respect to facing the target end zone), and negative values denote the right side)
  • receiver_y: y coordinate of receiver (position along the end zone direction; higher y means closer to the target end zone)
  • receiver: receiver name identifier
  • turnover: whether the throw results in a turnover
  • possession_num: possession number within a game
  • possession_throw: throw number within a possession
  • game_quarter: quarter within a game
  • is_home_team: whether the offense is the home team
  • home_team_score: home team score
  • away_team_score: away team score
  • gameID: unique identifier for game
  • home_teamID: home team name identifier
  • away_teamID: away team name identifier
  • times: game time remaining (seconds)
  • home_team_win: whether home team wins
  • score_diff: score difference between home and away teams
  • goal: whether the throw results in a goal
  • throw_distance: throw distance (yards)
  • x_diff: difference in x coordinates between thrower and receiver
  • y_diff: difference in y coordinates between thrower and receiver
  • throw_angle: throw angle (radians, 0 means forward, \(\pi\) means backward, positive values mean to the right, and negative values mean to the left of the thrower)

Starter code

library(tidyverse)
ufa_throws <- read_csv("https://raw.githubusercontent.com/36-SURE/2025/main/data/ufa_throws.csv")

In case you’re curious, the code to build this dataset can be found below.

ufa_throws <- read_csv("https://raw.githubusercontent.com/BradenEberhard/Expected-Throwing-Value/refs/heads/main/data/all_games_1024.csv")

ufa_throws <- ufa_throws |> 
  mutate(
    goal = ifelse(receiver_y > 100 & turnover == 0, 1, 0),
    throw_distance = sqrt((receiver_x - thrower_x) ^ 2 + (receiver_y - thrower_y) ^ 2),
    x_diff = receiver_x - thrower_x,
    y_diff = receiver_y - thrower_y,
    throw_angle = atan2(y_diff, x_diff)
  ) |> 
  select(-quarter_point, -total_points, -current_line)