Skip to content

Commit

Permalink
Add files via upload
Browse files Browse the repository at this point in the history
Uploaded answer key (script + graphics outputs).
  • Loading branch information
cxli233 authored Oct 31, 2024
1 parent a760333 commit 35e4cfb
Show file tree
Hide file tree
Showing 17 changed files with 1,499 additions and 0 deletions.
87 changes: 87 additions & 0 deletions Answer_key/01_Intro_to_R_answer_key.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
---
title: "Very basics of R coding ANSWER KEY"
author: "Chenxin Li"
date: "01/06/2023"
output:
html_notebook:
number_sections: yes
toc: yes
toc_float: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Packages
```{r}
library(dplyr)
```

# Excerice

Today you have learned some basic syntax of R.
Now it's time for you to practice.

## Q1:

1. Insert a new code chunk
2. Make this matrix

| 1 | 1 | 2 | 2 |
| 2 | 2 | 1 | 2 |
| 2 | 3 | 3 | 4 |
| 1 | 2 | 3 | 4 |

and save it as an item called `my_mat2`.
```{r}
my_mat2 <- rbind(
c(1, 1, 2, 2),
c(2, 2, 1, 2),
c(2, 3, 3, 4),
c(1, 2, 3, 4)
)
my_mat2
```


3. Select the 1st and 3rd rows and the 1st, 2nd and 4th columns, and save it as an item.
```{r}
item <- my_mat2[c(1,3), c(1,2,4)]
item
```

4. Take the square root for each member of my_mat2, then take log2(), and lastly find the maximum value.
Use the pipe syntax. The command for maximum is `max()`.

```{r}
my_mat2 %>%
sqrt() %>%
log2() %>%
max()
```

## Q2:

1. Use the following info to make a data frame and save it as an item called "grade".
Adel got 85 on the exam, Bren got 83, and Cecil got 93.
Their letter grades are B, B, and A, respectively.
(Hint: How many columns do you have to have?)

```{r}
grade <- data.frame(
name = c("Adel", "Bren", "Cecil"),
score = c(85, 83, 93),
letters = c("B", "B", "A")
)
grade
```

2. Pull out the column with the scores.
Use the `$` syntax.
```{r}
grade$score
```

157 changes: 157 additions & 0 deletions Answer_key/02_Intro_to_tidy_data_answer_key.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
---
title: "Data Arrangement ANSWER KEY"
author: "Chenxin Li"
date: "01/06/2023"
output:
html_notebook:
number_sections: yes
toc: yes
toc_float: yes

---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Load packages
```{r}
library(tidyverse)
library(readxl)
```
## Data from lecture
```{r}
child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols())
babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols())
```

These are two datasets downloaded from the [Gapminder foundation](https://www.gapminder.org/data/).
The Gapminder foundation has datasets on life expectancy, economy, education, and population across countries and years.
The goal is to remind us not only the "gaps" between developed and developing worlds, but also the amazing continuous improvements of quality of life through time.

1. Child mortality (0 - 5 year old) dying per 1000 born.
2. Births per woman.

These were recorded from year 1800 and projected all the way to 2100.

Let's look at them.

```{r}
head(child_mortality)
head(babies_per_woman)
```


```{r}
babies_per_woman_tidy <- babies_per_woman %>%
pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302))
head(babies_per_woman_tidy)
child_mortality_tidy <- child_mortality %>%
pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302))
head(child_mortality_tidy)
```

```{r}
birth_and_mortality <- babies_per_woman_tidy %>%
inner_join(child_mortality_tidy, by = c("country", "year"))
head(birth_and_mortality)
```

# Exercise

You have learned data arrangement! Let's do an exercise to practice what
you have learned today.
As the example, this time we will use income per person dataset from Gapminder foundation.

```{r}
income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols())
head(income)
```

## Tidy data
Is this a tidy data frame?
NO!

Make it a tidy data frame using this code chunk.
```{r}
income_tidy <- income %>%
pivot_longer(names_to = "year", values_to = "income", cols = !country)
head(income_tidy)
```

## Joining data

Combine the income data with birth per woman and child mortality data using this code chunk.
Name the new data frame "birth_and_mortality_and_income".

```{r}
birth_and_mortality_and_income <- income_tidy %>%
inner_join(babies_per_woman_tidy, by = c("country", "year")) %>%
inner_join(child_mortality_tidy, by = c("country", "year"))
head(birth_and_mortality_and_income)
```


## Filtering data

Filter out the data for Bangladesh and Sweden, in years 1945 (when WWII ended) and 2010.
Name the new data frame BS_1945_2010.
How has income, birth per woman and child mortality rate changed during this 55-year period?

```{r}
BS_1945_2010 <- birth_and_mortality_and_income %>%
filter(country == "Bangladesh" |
country == "Sweden") %>%
filter(year == 1945 |
year == 2010)
head(BS_1945_2010)
```


## Mutate data

Let's say for countries with income between 1000 to 10,000 dollars per year, they are called "fed".
For countries with income above 10,000 dollars per year, they are called "wealthy".
Below 1000, they are called "poor".

Using this info to make a new column called "status".
Hint: you will have to use case_when() and the "&" logic somewhere in this chunk.

```{r}
birth_and_mortality_and_income <- birth_and_mortality_and_income %>%
mutate(status = case_when(
income >= 1000 & income <= 10000 ~ "fed",
income > 10000 ~ "wealthy",
income < 1000 ~ "poor"
))
head(birth_and_mortality_and_income)
```

## Summarise the data

Let's look at the average child mortality and its sd in year 2010.
across countries across different status that we just defined.
Name the new data frame "child_mortality_summmary_2010".

```{r}
child_mortality_summary_2010 <- birth_and_mortality_and_income %>%
filter(year == 2010) %>%
group_by(status) %>%
summarize(
avg = mean(death_per_1000_born),
sd = sd(death_per_1000_born))
head(child_mortality_summary_2010)
```

How does child mortality compare across income group in year 2010?
Child mortality is higher for lower income groups.
83 changes: 83 additions & 0 deletions Answer_key/03_Intro_to_data_vis_answer_key.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
---
title: "Intro_to_data_vis ANSWER KEY"
author: "Chenxin Li"
date: "2023-01-06"
output:
html_notebook:
number_sections: yes
toc: yes
toc_float: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


# Required packages
```{r}
library(tidyverse)
library(RColorBrewer)
```

# Data from lecture
```{r}
child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols())
babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols())
income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols())
```

Re-shape data into tidy format.
```{r}
babies_per_woman_tidy <- babies_per_woman %>%
pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302))
child_mortality_tidy <- child_mortality %>%
pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302))
income_tidy <- income %>%
pivot_longer(names_to = "year", values_to = "income", cols = c(2:242))
```

Join them together.
```{r}
example2_data <- babies_per_woman_tidy %>%
inner_join(child_mortality_tidy, by = c("country", "year")) %>%
inner_join(income_tidy, by = c("country", "year"))
head(example2_data)
```

# Exercise
Graph income (in log10 scale) on x axis, child mortality on y axis, and color with children/woman in year 2010.
Were the trend similar to year 1945?
Save the graph using `ggsave()`.


```{r}
example2_data %>%
filter(year == 2010) %>%
ggplot(aes(x = log10(income), y = death_per_1000_born)) +
geom_point(aes(color = birth)) +
scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) +
labs(x = "log10 income",
y = "death per 1000 born",
title = "2010") +
theme_classic()
ggsave("Lesson3_answer1.png", width = 3, height = 3)
```
```{r}
example2_data %>%
filter(year == 1945) %>%
ggplot(aes(x = log10(income), y = death_per_1000_born)) +
geom_point(aes(color = birth)) +
scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) +
labs(x = "log10 income",
y = "death per 1000 born",
title = "1945") +
theme_classic()
ggsave("Lesson3_answer2.png", width = 3, height = 3)
```

Loading

0 comments on commit 35e4cfb

Please sign in to comment.