Project Guidelines
Your Task
Each student has been allocated into a project group (of up to 4 students). Each group has been assigned a specific project research topic. Your goal is to complete the required project deliverables and checkpoints, in accordance with the guidelines detailed in the remainder of this document.
Deliverables
This project has the following three key deliverables DUE FRIDAY, JULY 19 AT 6PM ET.
(NOTE: It is important to finish everything by the July 19 deadline, due to the Minneapolis trip the week of July 22.)
1. Report
[template] (right click and choose “Save Link As…” to download)
DUE FRIDAY, JULY 19 AT 6PM ET
Your report should be written using Quarto and submitted as a rendered .html
file. We recommend using an IDMRaD (Introduction, Data, Methods, Results and Discussion) report format, with details provided in the report template.
2. Poster
[template] (Google Slides link)
DUE FRIDAY, JULY 19 AT 6PM ET
Your poster should be submitted as a .pdf
file. We will then make a printed copy for the poster session on the final day (July 26).
3. Slides
DUE FRIDAY, JULY 19 AT 6PM ET
Each group will give an 8-minute presentation on the final day (July 26). The presentation should effectively have the same structure as your report with an introduction, followed by data description, an overview of methods, followed by results and discussion. Your slides may be created in any software, but we ony accept submissions in the form of a .pdf
file, a Google Slides link, or a Quarto presentation (self-contained .html
file or hosted online).
Checkpoints
Checkpoint 1: 5-minute presentation during lab on July 1
Note: It is perfectly fine if you don’t have any results at this point
No notes/scripts are allowed
Your first checkpoint presentation should be structured as follows.
Introduction (1 slide): Describe your project topic/question(s) and why it is important
Data (1 slide): Data description, any preliminary data pre-processing/cleaning steps
EDA (2 slides max): 1–2 EDA plots related to your question(s) of interest
- Design the slides using the assertion-evidence model
Methods (1 slide): Early thoughts on methods/modeling strategy. Justify why it might be appropriate to answer your question(s) of interest
- Plan of action (1 slide): List all the steps needed to complete your project (be specific). Highlight the completed steps. What are the next steps?
Checkpoint 2: 8-minute presentation during lab on July 12
No notes/scripts are allowed
Your second checkpoint presentation should be structured as follows.
Introduction (1 slide): Describe your project topic/question(s) and why it is important
Data: (1 slide) Data description and any major data pre-processing/cleaning steps (e.g., whether you consider specific observations, create any meaningful features, etc. - but don’t mention minor steps like column type conversion, filtering out unnecessary rows)
Plan of action (1 slide): List all the steps needed to complete your project (be specific). Highlight the completed steps.
Present the completed steps (5 slides max): methods, plots, findings, etc.
- Design the slides using the assertion-evidence model
Plan of action (1 slide, use the same one as before): what are the steps still to be completed?
Project Topics
(Note: All five topics below contribute to an overall theme of “the impact of race, social and demographic factors on health, survival, and mortality.”)
Project 1: Premature deaths (TA: Princess)
Do income inequality, unemployment and high school completion rates affect the number of premature deaths of certain racial groups at the county level?
Project 2: Preventable hospital stays (TA: Nick)
Do income inequality, unemployment and high school completion rates affect the number of preventable hospital stays of certain racial groups at the county level?
Project 3: Mental health (TA: Nick)
Do the number of mental health professionals per county affect the number of poor mental health days?
Project 4: Drug overdose and alcohol related deaths (TA: Princess)
Are there demographic and social factors that are predictors of drug overdose and alcohol-related incidents (e.g., driving accidents)?
Project 5: Influences on childhood outcomes (TA: Akshay)
How are juvenile healthcare outcomes impacted by adult health-related practices (e.g., smoking, drinking, diet)?
Analysis
Your analysis should focus on both:
Exploratory data analysis: Create visualizations to explore the underlying structure of the data and gain insights about distributions and relationships between variables. These should be ideally based on reasoned hypotheses.
Statistical modeling: Demonstrate the use of statistical and machine learning modeling techniques. This may involve justifications for your choice of model (e.g., comparison with model specifications such as using different predictors, or with other methods), and then any relevant interpretation of the model with regards to your project’s topic. Depending on your project, the model(s) you rely on may be used for either an inference (i.e., interpreting coefficients) or prediction task. The model you choose just needs to be motivated by your question of interest.
Data
Required: County Health Rankings Data
The County Health Rankings Data—collected by the University of Wisconsin Population Health Institute—ranks every county in each state on their Health Outcomes and Health Factors.
This dataset also contains the measurements used to calculate the rankings for each county. More information can be found here.
You must (at minimum) use the 2024 ranking measures which can provide more insight into the most recent health ranking outcomes priorities. This can be used to better shape your project topic and related hypotheses.
Your analysis should mainly be done at the entire United States scale (as feasible). However, you are welcome to focus on some specific counties/states to test more granular spatial hypotheses.
Optional: additional suggestions and data sources
Consider doing a temporal or trend analysis for your analyses, as the County Health Rankings Data are typically collected over time.
- For predictive modeling, consider adding time-varying features or forecasting an outcome, with suitable uncertainty quantification.
Consider merging the County Health Rankings Data with other publicly available datasets.
- Example sources include US Census data (accessed via the
tidycensus
R
package) and COVID-19 data (can be accssed via thecovidcast
R
package).
- Example sources include US Census data (accessed via the
(Note: All data used must be publicly available.)