Commit
1 parent
c05e657
commit 9516f20
Showing
13 changed files
with
3,928 additions
and
5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
--- | ||
title: "Homework 7" | ||
output: | ||
html_document: | ||
theme: | ||
version: 4 | ||
--- | ||
|
||
```{r global_options, include=FALSE} | ||
library(knitr) | ||
library(tidyverse) | ||
library(broom) | ||
opts_chunk$set(fig.align="center", fig.height=4.326, fig.width=7) | ||
``` | ||
|
||
**This homework is due on Apr 11, 2024 at 11:00pm. Please submit as a PDF file on Canvas.** | ||
|
||
For both problems in this homework, we will work with the `heart_disease_data` dataset, which is a simplified and recoded version of a dataset available from Kaggle. You can read about the original dataset here: https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease?resource=download | ||
|
||
The `heart_disease_data` dataset contains 9 variables: `HeartDisease` (whether or not the participant has heart disease), `BMI` (body mass index), `PhysicalHealth` (number of days per month on which their physical health was not good), `MentalHealth` (number of days per month on which their mental health was not good), `ApproximateAge` (participant's age), `SleepTime` (hours of sleep in a 24-hour period), `Smoking` (1 = smoker, 0 = nonsmoker), `AlcoholDrinking` (1 = drinks alcohol, 0 = does not drink), `PhysicalActivity` (1 = did physical activity or exercise during the past 30 days, 0 = hardly any physical activity). Compared to the original dataset, the columns `ApproximateAge`, `Smoking`, `AlcoholDrinking`, and `PhysicalActivity` have been converted into numeric columns so they can be included in a PCA. | ||
|
||
**Note:** This homework is about the contents of the plots. Don't worry about styling. It's OK to use the default theme and plot labeling. | ||
|
||
|
||
```{r message = FALSE} | ||
heart_data <- read_csv("https://wilkelab.org/SDS375/datasets/heart_disease_data.csv") | ||
``` | ||
|
||
**Problem 1: (10 pts)** | ||
|
||
Perform a PCA of the numerical columns of the `heart_disease_data` dataset. Then make two plots: a rotation plot of components 1 and 2, and a plot of the eigenvalues showing the amount of variance explained by each component. | ||
|
||
```{r} | ||
# your code here | ||
``` | ||
|
||
```{r} | ||
# your code here | ||
``` | ||
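The workflow that Problem 1 asks for can be sketched on a stand-in dataset. The example below uses R's built-in `iris` data rather than `heart_disease_data`, and it assumes the `prcomp()` plus `broom::tidy()` approach; the object names (`pca_fit`, `rotation`, `eigenvalues`) are illustrative only. Treat it as one possible outline, not the expected solution.

```{r}
library(tidyverse)
library(broom)

# Fit a PCA on the scaled numeric columns of iris
# (iris is a stand-in here, not the homework dataset).
pca_fit <- iris %>%
  select(where(is.numeric)) %>%
  scale() %>%
  prcomp()

# Rotation matrix in wide form: one row per original variable.
rotation <- pca_fit %>%
  tidy(matrix = "rotation") %>%
  pivot_wider(names_from = "PC", names_prefix = "PC", values_from = "value")

# Rotation plot of components 1 and 2: one arrow per variable,
# drawn from the origin to the variable's loadings.
rotation_plot <- ggplot(rotation, aes(PC1, PC2)) +
  geom_segment(
    xend = 0, yend = 0,
    arrow = arrow(ends = "first", length = unit(8, "pt"))
  ) +
  geom_text(aes(label = column), hjust = 1) +
  coord_fixed()

# Eigenvalue plot: fraction of variance explained by each component.
eigenvalues <- pca_fit %>% tidy(matrix = "eigenvalues")
variance_plot <- ggplot(eigenvalues, aes(PC, percent)) +
  geom_col()

rotation_plot
variance_plot
```

Scaling before `prcomp()` puts variables measured in different units on a comparable footing; whether that is appropriate for a given dataset is a judgment call.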
|
||
|
||
**Problem 2: (10 pts)** Make a scatter plot of PC 2 versus PC 1 and color by heart disease status. Then use the rotation plot from Problem 1 to describe the variables/factors by which we can separate the study participants with heart disease from the study participants without heart disease. | ||
|
||
|
||
```{r} | ||
# your code here | ||
``` |
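A scatter plot of PC 2 versus PC 1 colored by a grouping variable, as described in Problem 2, can be sketched the same way. This example again uses the built-in `iris` data as a stand-in, with `Species` playing the role of the grouping column, and assumes `broom::augment()` is used to attach the PC scores to the original rows. The names `pca_fit`, `scores`, and `score_plot` are illustrative.

```{r}
library(tidyverse)
library(broom)

# Fit the PCA on scaled numeric columns (iris is a stand-in dataset).
pca_fit <- iris %>%
  select(where(is.numeric)) %>%
  scale() %>%
  prcomp()

# Attach the PC coordinates to the original data;
# augment() adds columns named .fittedPC1, .fittedPC2, and so on.
scores <- pca_fit %>% augment(iris)

# PC 2 versus PC 1, colored by the grouping variable.
score_plot <- ggplot(scores, aes(.fittedPC1, .fittedPC2, color = Species)) +
  geom_point()

score_plot
```

Reading this plot next to a rotation plot shows which original variables point in the direction that separates the groups.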
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
--- | ||
title: "Project 3" | ||
output: | ||
html_document: | ||
theme: | ||
version: 4 | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
library(tidyverse) | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
|
||
In this project, you will be working with a dataset of your own choosing. **Important:** The dataset needs to be picked from the [TidyTuesday project](https://github.com/rfordatascience/tidytuesday/tree/master/data/2023), and it needs to be one that was released between May 30, 2023 and December 26, 2023 (both dates inclusive). | ||
|
||
**Hints:** | ||
|
||
- Read in your data with `readr::read_csv()`, as we have done in prior projects. **Do not use the tidytuesdayR package.** The TidyTuesday site explains for each dataset how it can be read with `readr::read_csv()`, under "Get the data here", part "Or read in the data manually". | ||
|
||
- Make sure your question is actually a question, and not a veiled instruction to perform a particular analysis. | ||
|
||
- Adjust `fig.width` and `fig.height` in the chunk headers to customize figure sizing and figure aspect ratios. These numbers are measured in inches and will usually fall between 4 and 10. | ||
|
||
You can delete these instructions from your project. Please also delete text such as *Your approach here* or `# Code for figure 1 here`. | ||
|
||
**Introduction:** *Your introduction here.* | ||
|
||
**Question:** *Your question here.* | ||
|
||
**Approach:** *Your approach here.* | ||
|
||
**Analysis:** | ||
|
||
```{r} | ||
# Data loading/wrangling/analysis code here | ||
``` | ||
|
||
```{r fig.width = 5, fig.height = 5} | ||
# Code for figure 1 here | ||
``` | ||
|
||
```{r fig.width = 5, fig.height = 5} | ||
# Code for figure 2 here | ||
``` | ||
|
||
**Discussion:** *Your discussion of results here.* |