HW 7 and Project 3
clauswilke committed Apr 1, 2024
1 parent c05e657 commit 9516f20
Showing 13 changed files with 3,928 additions and 5 deletions.
47 changes: 47 additions & 0 deletions assignments/HW7.Rmd
@@ -0,0 +1,47 @@
---
title: "Homework 7"
output:
  html_document:
    theme:
      version: 4
---

```{r global_options, include=FALSE}
library(knitr)
library(tidyverse)
library(broom)
opts_chunk$set(fig.align="center", fig.height=4.326, fig.width=7)
```

**This homework is due on Apr 11, 2024 at 11:00pm. Please submit as a pdf file on Canvas.**

For both problems in this homework, we will work with the `heart_disease_data` dataset, which is a simplified and recoded version of a dataset available from Kaggle. You can read about the original dataset here: https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease?resource=download

The `heart_disease_data` dataset contains 9 variables: `HeartDisease` (whether or not the participant has heart disease), `BMI` (body mass index), `PhysicalHealth` (number of days per month on which the participant's physical health was not good), `MentalHealth` (number of days per month on which the participant's mental health was not good), `ApproximateAge` (the participant's age), `SleepTime` (hours of sleep the participant gets in a 24-hour period), `Smoking` (1 = smoker, 0 = nonsmoker), `AlcoholDrinking` (1 = drinks alcohol, 0 = does not drink), and `PhysicalActivity` (1 = did physical activity or exercise during the past 30 days, 0 = hardly any physical activity). Compared to the original dataset, the columns `ApproximateAge`, `Smoking`, `AlcoholDrinking`, and `PhysicalActivity` have been converted into numeric columns so they can be included in a PCA.

**Note:** This homework is about the contents of the plots. Don't worry about styling. It's OK to use the default theme and plot labeling.


```{r message = FALSE}
heart_data <- read_csv("https://wilkelab.org/SDS375/datasets/heart_disease_data.csv")
```

**Problem 1: (10 pts)**

Perform a PCA of the numerical columns of the `heart_disease_data` dataset. Then make two plots: a rotation plot of components 1 and 2, and a plot of the eigenvalues showing the amount of variance explained by each component.

```{r}
# your code here
```

```{r}
# your code here
```


**Problem 2: (10 pts)** Make a scatter plot of PC 2 versus PC 1 and color by heart disease status. Then use the rotation plot from Problem 1 to describe the variables/factors by which we can separate the study participants with heart disease from the study participants without heart disease.


```{r}
# your code here
```
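
The general workflow behind both problems can be sketched as follows. This is only an illustration under stated assumptions: it uses R's built-in `iris` data rather than the homework dataset, the names `pca_fit` and `rotation` are made up for this example, and it relies on the `tidy()` and `augment()` methods that broom provides for `prcomp` objects. It is not the expected solution.

```r
library(tidyverse)
library(broom)

# Illustration only: built-in `iris` data, not the homework dataset.
pca_fit <- iris |>
  select(where(is.numeric)) |>  # keep only the numeric columns
  scale() |>                    # standardize to zero mean, unit variance
  prcomp()

# Rotation matrix in wide form: one row per variable, one column per PC
rotation <- pca_fit |>
  tidy(matrix = "rotation") |>
  pivot_wider(names_from = "PC", names_prefix = "PC", values_from = "value")

# Rotation plot of components 1 and 2
ggplot(rotation, aes(PC1, PC2)) +
  geom_segment(xend = 0, yend = 0) +
  geom_text(aes(label = column), hjust = 1) +
  coord_fixed()

# Share of variance explained by each component
pca_fit |>
  tidy(matrix = "eigenvalues") |>
  ggplot(aes(PC, percent)) +
  geom_col()

# Scatter plot of PC 2 versus PC 1, colored by a grouping variable
pca_fit |>
  augment(iris) |>
  ggplot(aes(.fittedPC1, .fittedPC2, color = Species)) +
  geom_point()
```

Reading the scatter plot next to the rotation plot shows which original variables point in the direction along which the groups separate.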
682 changes: 682 additions & 0 deletions assignments/HW7.html

Large diffs are not rendered by default.

47 changes: 47 additions & 0 deletions assignments/Project_3.Rmd
@@ -0,0 +1,47 @@
---
title: "Project 3"
output:
  html_document:
    theme:
      version: 4
---

```{r setup, include=FALSE}
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
```


In this project, you will be working with a dataset of your own choosing. **Important:** The dataset needs to be picked from the [TidyTuesday project](https://github.com/rfordatascience/tidytuesday/tree/master/data/2023), and it needs to be one that was released between May 30, 2023 and December 26, 2023 (both dates inclusive).

**Hints:**

- Read in your data with `readr::read_csv()`, as we have done in prior projects. **Do not use the tidytuesdayR package.** The TidyTuesday site explains for each dataset how it can be read with `readr::read_csv()`, under "Get the data here", part "Or read in the data manually".

- Make sure your question is actually a question, and not a veiled instruction to perform a particular analysis.

- Adjust `fig.width` and `fig.height` in the chunk headers to customize figure sizing and figure aspect ratios. These numbers are measured in inches and will usually fall between 4 and 10.
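
As a rough sketch of the first hint, the call below reads a small table with `readr::read_csv()`. The inline CSV text and the name `example_data` are hypothetical stand-ins so that the example runs without a network connection. In your project, pass the URL given on your dataset's TidyTuesday page instead.

```r
library(readr)

# Hypothetical inline CSV standing in for a TidyTuesday file.
# In the project, replace I("...") with the URL from the dataset's page.
example_data <- read_csv(
  I("week,value\n1,10\n2,12\n3,9"),
  show_col_types = FALSE  # suppress the column specification message
)

example_data
```

For the figure-size hint, a chunk header such as `{r fig.width = 6, fig.height = 4}` would produce a figure that is 6 inches wide and 4 inches tall.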

You can delete these instructions from your project. Please also delete text such as *Your approach here* or `# Code for figure 1 here`.

**Introduction:** *Your introduction here.*

**Question:** *Your question here.*

**Approach:** *Your approach here.*

**Analysis:**

```{r}
# Data loading/wrangling/analysis code here
```

```{r fig.width = 5, fig.height = 5}
# Code for figure 1 here
```

```{r fig.width = 5, fig.height = 5}
# Code for figure 2 here
```

**Discussion:** *Your discussion of results here.*
676 changes: 676 additions & 0 deletions assignments/Project_3.html

Large diffs are not rendered by default.

481 changes: 481 additions & 0 deletions assignments/Project_3_instructions.html

Large diffs are not rendered by default.

47 changes: 47 additions & 0 deletions docs/assignments/HW7.Rmd
@@ -0,0 +1,47 @@
---
title: "Homework 7"
output:
  html_document:
    theme:
      version: 4
---

```{r global_options, include=FALSE}
library(knitr)
library(tidyverse)
library(broom)
opts_chunk$set(fig.align="center", fig.height=4.326, fig.width=7)
```

**This homework is due on Apr 11, 2024 at 11:00pm. Please submit as a pdf file on Canvas.**

For both problems in this homework, we will work with the `heart_disease_data` dataset, which is a simplified and recoded version of a dataset available from Kaggle. You can read about the original dataset here: https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease?resource=download

The `heart_disease_data` dataset contains 9 variables: `HeartDisease` (whether or not the participant has heart disease), `BMI` (body mass index), `PhysicalHealth` (number of days per month on which the participant's physical health was not good), `MentalHealth` (number of days per month on which the participant's mental health was not good), `ApproximateAge` (the participant's age), `SleepTime` (hours of sleep the participant gets in a 24-hour period), `Smoking` (1 = smoker, 0 = nonsmoker), `AlcoholDrinking` (1 = drinks alcohol, 0 = does not drink), and `PhysicalActivity` (1 = did physical activity or exercise during the past 30 days, 0 = hardly any physical activity). Compared to the original dataset, the columns `ApproximateAge`, `Smoking`, `AlcoholDrinking`, and `PhysicalActivity` have been converted into numeric columns so they can be included in a PCA.

**Note:** This homework is about the contents of the plots. Don't worry about styling. It's OK to use the default theme and plot labeling.


```{r message = FALSE}
heart_data <- read_csv("https://wilkelab.org/SDS375/datasets/heart_disease_data.csv")
```

**Problem 1: (10 pts)**

Perform a PCA of the numerical columns of the `heart_disease_data` dataset. Then make two plots: a rotation plot of components 1 and 2, and a plot of the eigenvalues showing the amount of variance explained by each component.

```{r}
# your code here
```

```{r}
# your code here
```


**Problem 2: (10 pts)** Make a scatter plot of PC 2 versus PC 1 and color by heart disease status. Then use the rotation plot from Problem 1 to describe the variables/factors by which we can separate the study participants with heart disease from the study participants without heart disease.


```{r}
# your code here
```
682 changes: 682 additions & 0 deletions docs/assignments/HW7.html

Large diffs are not rendered by default.

47 changes: 47 additions & 0 deletions docs/assignments/Project_3.Rmd
@@ -0,0 +1,47 @@
---
title: "Project 3"
output:
  html_document:
    theme:
      version: 4
---

```{r setup, include=FALSE}
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
```


In this project, you will be working with a dataset of your own choosing. **Important:** The dataset needs to be picked from the [TidyTuesday project](https://github.com/rfordatascience/tidytuesday/tree/master/data/2023), and it needs to be one that was released between May 30, 2023 and December 26, 2023 (both dates inclusive).

**Hints:**

- Read in your data with `readr::read_csv()`, as we have done in prior projects. **Do not use the tidytuesdayR package.** The TidyTuesday site explains for each dataset how it can be read with `readr::read_csv()`, under "Get the data here", part "Or read in the data manually".

- Make sure your question is actually a question, and not a veiled instruction to perform a particular analysis.

- Adjust `fig.width` and `fig.height` in the chunk headers to customize figure sizing and figure aspect ratios. These numbers are measured in inches and will usually fall between 4 and 10.

You can delete these instructions from your project. Please also delete text such as *Your approach here* or `# Code for figure 1 here`.

**Introduction:** *Your introduction here.*

**Question:** *Your question here.*

**Approach:** *Your approach here.*

**Analysis:**

```{r}
# Data loading/wrangling/analysis code here
```

```{r fig.width = 5, fig.height = 5}
# Code for figure 1 here
```

```{r fig.width = 5, fig.height = 5}
# Code for figure 2 here
```

**Discussion:** *Your discussion of results here.*
676 changes: 676 additions & 0 deletions docs/assignments/Project_3.html

Large diffs are not rendered by default.

481 changes: 481 additions & 0 deletions docs/assignments/Project_3_instructions.html

Large diffs are not rendered by default.

33 changes: 33 additions & 0 deletions docs/schedule.html
@@ -2741,6 +2741,24 @@ <h3 id="mar-28-2024dimension-reduction-2">20. Mar 28, 2024—Dimension reduction
</li>
<li><a href="worksheets/dimension-reduction-2.Rmd">Worksheet</a></li>
</ul>
<h3 id="apr-2-2024clustering">21. Apr 2, 2024—Clustering</h3>
<p class="nospace">
Materials:
</p>
<ul>
<li><a href="slides/clustering.html">Slides</a><br />
</li>
<li><a href="worksheets/clustering.Rmd">Worksheet</a></li>
</ul>
<h3 id="apr-4-2024hierarchical-clustering">22. Apr 4, 2024—Hierarchical clustering</h3>
<p class="nospace">
Materials:
</p>
<ul>
<li><a href="slides/hierarchical-clustering.html">Slides</a><br />
</li>
<li><a href="worksheets/hierarchical-clustering.Rmd">Worksheet</a></li>
</ul>
<h2 id="homeworks">Homeworks</h2>
<p>All homeworks are due by 11:00pm on the day they are due. Homeworks need to be submitted as pdf files on Canvas.</p>
<h3 id="homework-1-due-jan-25-2024">Homework 1 (due Jan 25, 2024)</h3>
@@ -2792,6 +2810,13 @@ <h3 id="homework-6-due-apr-4-2024">Homework 6 (due Apr 4, 2024)</h3>
<li><a href="assignments/HW6.html">HTML</a></li>
</ul>
<h3 id="homework-7-due-apr-11-2024">Homework 7 (due Apr 11, 2024)</h3>
<p class="nospace">
Materials:
</p>
<ul>
<li><a href="assignments/HW7.Rmd">R Markdown template</a></li>
<li><a href="assignments/HW7.html">HTML</a></li>
</ul>
<h2 id="projects">Projects</h2>
<p>All projects are due by 11:00pm on the day they are due. Projects need to be submitted on Canvas. Please carefully read the submission instructions for each project.</p>
<h3 id="project-1-due-feb-15-2024">Project 1 (due Feb 15, 2024)</h3>
@@ -2817,6 +2842,14 @@ <h3 id="project-2-due-mar-21-2024">Project 2 (due Mar 21, 2024)</h3>
</ul>
<p>Please use the example and the solutions from Project 1 as examples for Project 2.</p>
<h3 id="project-3-due-apr-18-2024">Project 3 (due Apr 18, 2024)</h3>
<p class="nospace">
Materials:
</p>
<ul>
<li><a href="assignments/Project_3_instructions.html">Instructions</a></li>
<li><a href="assignments/Project_3.Rmd">Project Template (Rmd)</a></li>
<li><a href="assignments/Project_3.html">Project Template (HTML)</a></li>
</ul>
<h2 class="appendix" id="reuse">Reuse</h2>
<p>Text and figures are licensed under Creative Commons Attribution <a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>. Any computer code (R, HTML, CSS, etc.) in slides and worksheets, including in slide and worksheet sources, is also licensed under <a href="https://github.com/wilkelab/SDS375/LICENSE.md">MIT</a>. Note that figures in slides may be pulled in from external sources and may be licensed under different terms. For such images, image credits are available in the slide notes, accessible via pressing the letter ‘p’.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r distill-force-highlighting-css"><code class="sourceCode r"></code></pre></div>