-
Notifications
You must be signed in to change notification settings - Fork 8
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Uploaded answer key (script + graphics outputs).
- Loading branch information
Showing
17 changed files
with
1,499 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
--- | ||
title: "Very basics of R coding ANSWER KEY" | ||
author: "Chenxin Li" | ||
date: "01/06/2023" | ||
output: | ||
html_notebook: | ||
number_sections: yes | ||
toc: yes | ||
toc_float: yes | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
# Packages | ||
```{r} | ||
library(dplyr) | ||
``` | ||
|
||
# Excerice | ||
|
||
Today you have learned some basic syntax of R. | ||
Now it's time for you to practice. | ||
|
||
## Q1: | ||
|
||
1. Insert a new code chunk | ||
2. Make this matrix | ||
|
||
| 1 | 1 | 2 | 2 | | ||
| 2 | 2 | 1 | 2 | | ||
| 2 | 3 | 3 | 4 | | ||
| 1 | 2 | 3 | 4 | | ||
|
||
and save it as an item called `my_mat2`. | ||
```{r} | ||
my_mat2 <- rbind( | ||
c(1, 1, 2, 2), | ||
c(2, 2, 1, 2), | ||
c(2, 3, 3, 4), | ||
c(1, 2, 3, 4) | ||
) | ||
my_mat2 | ||
``` | ||
|
||
|
||
3. Select the 1st and 3rd rows and the 1st, 2nd and 4th columns, and save it as an item. | ||
```{r} | ||
item <- my_mat2[c(1,3), c(1,2,4)] | ||
item | ||
``` | ||
|
||
4. Take the square root for each member of my_mat2, then take log2(), and lastly find the maximum value. | ||
Use the pipe syntax. The command for maximum is `max()`. | ||
|
||
```{r} | ||
my_mat2 %>% | ||
sqrt() %>% | ||
log2() %>% | ||
max() | ||
``` | ||
|
||
## Q2: | ||
|
||
1. Use the following info to make a data frame and save it as an item called "grade". | ||
Adel got 85 on the exam, Bren got 83, and Cecil got 93. | ||
Their letter grades are B, B, and A, respectively. | ||
(Hint: How many columns do you have to have?) | ||
|
||
```{r} | ||
grade <- data.frame( | ||
name = c("Adel", "Bren", "Cecil"), | ||
score = c(85, 83, 93), | ||
letters = c("B", "B", "A") | ||
) | ||
grade | ||
``` | ||
|
||
2. Pull out the column with the scores. | ||
Use the `$` syntax. | ||
```{r} | ||
grade$score | ||
``` | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
--- | ||
title: "Data Arrangement ANSWER KEY" | ||
author: "Chenxin Li" | ||
date: "01/06/2023" | ||
output: | ||
html_notebook: | ||
number_sections: yes | ||
toc: yes | ||
toc_float: yes | ||
|
||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
# Load packages | ||
```{r} | ||
library(tidyverse) | ||
library(readxl) | ||
``` | ||
## Data from lecture | ||
```{r} | ||
child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols()) | ||
babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols()) | ||
``` | ||
|
||
These are two datasets downloaded from the [Gapminder foundation](https://www.gapminder.org/data/). | ||
The Gapminder foundation has datasets on life expectancy, economy, education, and population across countries and years. | ||
The goal is to remind us not only the "gaps" between developed and developing worlds, but also the amazing continuous improvements of quality of life through time. | ||
|
||
1. Child mortality (0 - 5 year old) dying per 1000 born. | ||
2. Births per woman. | ||
|
||
These were recorded from year 1800 and projected all the way to 2100. | ||
|
||
Let's look at them. | ||
|
||
```{r} | ||
head(child_mortality) | ||
head(babies_per_woman) | ||
``` | ||
|
||
|
||
```{r} | ||
babies_per_woman_tidy <- babies_per_woman %>% | ||
pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302)) | ||
head(babies_per_woman_tidy) | ||
child_mortality_tidy <- child_mortality %>% | ||
pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302)) | ||
head(child_mortality_tidy) | ||
``` | ||
|
||
```{r} | ||
birth_and_mortality <- babies_per_woman_tidy %>% | ||
inner_join(child_mortality_tidy, by = c("country", "year")) | ||
head(birth_and_mortality) | ||
``` | ||
|
||
# Exercise | ||
|
||
You have learned data arrangement! Let's do an exercise to practice what | ||
you have learned today. | ||
As the example, this time we will use income per person dataset from Gapminder foundation. | ||
|
||
```{r} | ||
income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols()) | ||
head(income) | ||
``` | ||
|
||
## Tidy data | ||
Is this a tidy data frame? | ||
NO! | ||
|
||
Make it a tidy data frame using this code chunk. | ||
```{r} | ||
income_tidy <- income %>% | ||
pivot_longer(names_to = "year", values_to = "income", cols = !country) | ||
head(income_tidy) | ||
``` | ||
|
||
## Joining data | ||
|
||
Combine the income data with birth per woman and child mortality data using this code chunk. | ||
Name the new data frame "birth_and_mortality_and_income". | ||
|
||
```{r} | ||
birth_and_mortality_and_income <- income_tidy %>% | ||
inner_join(babies_per_woman_tidy, by = c("country", "year")) %>% | ||
inner_join(child_mortality_tidy, by = c("country", "year")) | ||
head(birth_and_mortality_and_income) | ||
``` | ||
|
||
|
||
## Filtering data | ||
|
||
Filter out the data for Bangladesh and Sweden, in years 1945 (when WWII ended) and 2010. | ||
Name the new data frame BS_1945_2010. | ||
How has income, birth per woman and child mortality rate changed during this 55-year period? | ||
|
||
```{r} | ||
BS_1945_2010 <- birth_and_mortality_and_income %>% | ||
filter(country == "Bangladesh" | | ||
country == "Sweden") %>% | ||
filter(year == 1945 | | ||
year == 2010) | ||
head(BS_1945_2010) | ||
``` | ||
|
||
|
||
## Mutate data | ||
|
||
Let's say for countries with income between 1000 to 10,000 dollars per year, they are called "fed". | ||
For countries with income above 10,000 dollars per year, they are called "wealthy". | ||
Below 1000, they are called "poor". | ||
|
||
Using this info to make a new column called "status". | ||
Hint: you will have to use case_when() and the "&" logic somewhere in this chunk. | ||
|
||
```{r} | ||
birth_and_mortality_and_income <- birth_and_mortality_and_income %>% | ||
mutate(status = case_when( | ||
income >= 1000 & income <= 10000 ~ "fed", | ||
income > 10000 ~ "wealthy", | ||
income < 1000 ~ "poor" | ||
)) | ||
head(birth_and_mortality_and_income) | ||
``` | ||
|
||
## Summarise the data | ||
|
||
Let's look at the average child mortality and its sd in year 2010. | ||
across countries across different status that we just defined. | ||
Name the new data frame "child_mortality_summmary_2010". | ||
|
||
```{r} | ||
child_mortality_summary_2010 <- birth_and_mortality_and_income %>% | ||
filter(year == 2010) %>% | ||
group_by(status) %>% | ||
summarize( | ||
avg = mean(death_per_1000_born), | ||
sd = sd(death_per_1000_born)) | ||
head(child_mortality_summary_2010) | ||
``` | ||
|
||
How does child mortality compare across income group in year 2010? | ||
Child mortality is higher for lower income groups. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
--- | ||
title: "Intro_to_data_vis ANSWER KEY" | ||
author: "Chenxin Li" | ||
date: "2023-01-06" | ||
output: | ||
html_notebook: | ||
number_sections: yes | ||
toc: yes | ||
toc_float: yes | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
|
||
# Required packages | ||
```{r} | ||
library(tidyverse) | ||
library(RColorBrewer) | ||
``` | ||
|
||
# Data from lecture | ||
```{r} | ||
child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols()) | ||
babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols()) | ||
income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols()) | ||
``` | ||
|
||
Re-shape data into tidy format. | ||
```{r} | ||
babies_per_woman_tidy <- babies_per_woman %>% | ||
pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302)) | ||
child_mortality_tidy <- child_mortality %>% | ||
pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302)) | ||
income_tidy <- income %>% | ||
pivot_longer(names_to = "year", values_to = "income", cols = c(2:242)) | ||
``` | ||
|
||
Join them together. | ||
```{r} | ||
example2_data <- babies_per_woman_tidy %>% | ||
inner_join(child_mortality_tidy, by = c("country", "year")) %>% | ||
inner_join(income_tidy, by = c("country", "year")) | ||
head(example2_data) | ||
``` | ||
|
||
# Exercise | ||
Graph income (in log10 scale) on x axis, child mortality on y axis, and color with children/woman in year 2010. | ||
Were the trend similar to year 1945? | ||
Save the graph using `ggsave()`. | ||
|
||
|
||
```{r} | ||
example2_data %>% | ||
filter(year == 2010) %>% | ||
ggplot(aes(x = log10(income), y = death_per_1000_born)) + | ||
geom_point(aes(color = birth)) + | ||
scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) + | ||
labs(x = "log10 income", | ||
y = "death per 1000 born", | ||
title = "2010") + | ||
theme_classic() | ||
ggsave("Lesson3_answer1.png", width = 3, height = 3) | ||
``` | ||
```{r} | ||
example2_data %>% | ||
filter(year == 1945) %>% | ||
ggplot(aes(x = log10(income), y = death_per_1000_born)) + | ||
geom_point(aes(color = birth)) + | ||
scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) + | ||
labs(x = "log10 income", | ||
y = "death per 1000 born", | ||
title = "1945") + | ||
theme_classic() | ||
ggsave("Lesson3_answer2.png", width = 3, height = 3) | ||
``` | ||
|
Oops, something went wrong.