library(tidyverse)
covid_hospitalizations <- read_csv("https://raw.githubusercontent.com/36-SURE/36-SURE.github.io/main/data/covid_hospitalizations.csv")
EDA project: COVID hospitalizations in Pennsylvania
Overview
This project will be released on Thursday, June 6 and conclude with an 8-minute presentation on Tuesday, June 18 during lecture time.
Students will be randomly placed into groups of three and each group will be randomly assigned a dataset.
The goal of this project is to practice understanding the structure of a dataset, and to practice generating and evaluating hypotheses using fundamental EDA and data visualization techniques.
Deliverables
Each group is expected to make slides to accompany the 8-minute presentation.
The presentation should feature the following:
Overview of the structure of your dataset
Three questions/hypotheses you are interested in exploring
Three data visualizations exploring the questions, at least two of which must be multivariate. Each visualization must be in a different format from the other two, and you must have at least one categorical and one continuous visualization.
One clustering analysis
Conclusions for the hypotheses based on your EDA and data visualizations
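As a rough illustration of the clustering and multivariate visualization items above, here is a minimal sketch of k-means on county-level summaries. It runs on a small toy tibble with invented values rather than the real data, and the county labels, the choice of variables, and k = 2 are all assumptions made for illustration, not a recommended analysis.

```r
library(tidyverse)

# Toy stand-in for the real data: six hypothetical counties, values invented
set.seed(36)
toy <- tibble(
  county = rep(c("A", "B", "C", "D", "E", "F"), each = 10),
  covid_patients = rpois(60, lambda = rep(c(5, 6, 40, 45, 90, 100), each = 10)),
  icu_percent = runif(60, 0, 100)
)

# One row per county: average each variable over time, ignoring missing values
county_summary <- toy |>
  group_by(county) |>
  summarize(
    mean_patients = mean(covid_patients, na.rm = TRUE),
    mean_icu_percent = mean(icu_percent, na.rm = TRUE),
    .groups = "drop"
  )

# Standardize so that no single variable dominates the Euclidean distance
features <- county_summary |>
  select(mean_patients, mean_icu_percent) |>
  scale()

# k = 2 is arbitrary here; nstart > 1 reruns k-means from several random starts
km <- kmeans(features, centers = 2, nstart = 20)

county_clusters <- county_summary |>
  mutate(cluster = factor(km$cluster))

# Multivariate view: two continuous variables plus cluster membership as color
cluster_plot <- ggplot(county_clusters,
                       aes(x = mean_patients, y = mean_icu_percent, color = cluster)) +
  geom_point(size = 3) +
  labs(x = "Mean COVID-19 patients hospitalized",
       y = "Mean adult ICU beds, percent available",
       color = "Cluster")
```

With the real data, the same steps would start from covid_hospitalizations instead of toy, and the number of clusters would need to be justified rather than fixed in advance.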
Timeline
There will be two submission deadlines:
Thursday, June 13 at 5pm ET - Each student will push their individual code for the project thus far to GitHub for review. We will then provide feedback.
Monday, June 17 at 5pm ET - Slides and full code must be completed and ready for presentation. Send your slides to Quang (quang@stat.cmu.edu).
All code must be written in R, but the slides may be created in any software. Take advantage of examples from lectures, but also feel free to explore online resources that may be relevant. (But be sure to always consult the R help documentation first before attempting to google around or ask ChatGPT.)
Data
This dataset contains county-level hospitalization information related to COVID-19 patients in Pennsylvania, with dates ranging from April 1, 2020 to December 31, 2020. The data are available online at the Open Data Pennsylvania website. More information about the hospitalization data can be found here.
Each row in the dataset corresponds to a county in Pennsylvania on a given date (between April 1, 2020 and December 31, 2020 - note that missing data are present for some of the rows) and the columns are:
county: name of county
date: date
icu_avail: adult ICU beds available
icu_total: adult ICU beds total
med_avail: medical/surgical beds available
med_total: medical/surgical beds total
ped_avail: pediatric beds available
ped_total: pediatric beds total
pic_avail: pediatric ICU beds available
pic_total: pediatric ICU beds total
covid_patients: COVID-19 patients hospitalized
covid_vents: COVID-19 patients on ventilators
vents_use: total ventilators in use
vents: total ventilators
icu_avail_mean: adult ICU beds available, 14-day average
icu_total_mean: adult ICU beds total, 14-day average
med_avail_mean: medical/surgical beds available, 14-day average
med_total_mean: medical/surgical beds total, 14-day average
ped_avail_mean: pediatric beds available, 14-day average
ped_total_mean: pediatric beds total, 14-day average
pic_avail_mean: pediatric ICU beds available, 14-day average
pic_total_mean: pediatric ICU beds total, 14-day average
covid_patients_mean: COVID-19 patients hospitalized, 14-day average
covid_vents_mean: COVID-19 patients on ventilators, 14-day average
vents_use_mean: total ventilators in use, 14-day average
vents_mean: total ventilators, 14-day average
icu_percent: adult ICU beds, percent available
med_percent: medical/surgical beds, percent available
ped_percent: pediatric beds, percent available
pic_percent: pediatric ICU beds, percent available
covid_icu: COVID-19 patients in the intensive care unit (ICU)
covid_icu_mean: COVID-19 patients in the intensive care unit (ICU), 14-day average
county_fips: a county’s 5-digit code (read more here)
longitude: longitude of a generic point within the county
latitude: latitude of a generic point within the county
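Since each row is meant to be one county on one date and some rows have missing data, a first look at the structure might go something like the sketch below. It uses a tiny toy tibble that borrows a few of the column names listed above; the values are invented for illustration and do not come from the real dataset.

```r
library(tidyverse)

# Toy stand-in with a handful of the documented columns (values invented)
toy <- tibble(
  county = rep(c("Allegheny", "Philadelphia"), each = 3),
  date = rep(as.Date("2020-04-01") + 0:2, times = 2),
  icu_avail = c(10, NA, 12, 30, 28, NA),
  icu_total = c(50, 50, 50, 120, 120, 120),
  covid_patients = c(5, 7, NA, 40, 42, 45)
)

# Column names, types, and the first few values of each column
glimpse(toy)

# Number of missing values in each column
missing_counts <- toy |>
  summarize(across(everything(), ~ sum(is.na(.x))))

# Check that each county-date pair appears at most once
duplicate_rows <- toy |>
  count(county, date) |>
  filter(n > 1)
```

Running the same steps on covid_hospitalizations should show which columns are most affected by missing values before any of them are used in a plot or a clustering analysis.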