-
Notifications
You must be signed in to change notification settings - Fork 10
/
class2_instructor_guide.qmd
148 lines (106 loc) · 7.05 KB
/
class2_instructor_guide.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
---
title: "Instructor Guide for Class 2: Coding Basics"
author: "Matthew Sutcliffe, Madeline Gillman, JP Flores"
format:
html:
toc: true
---
::: {.callout-note appearance="default"}
## Instructor Note
Today's class will be mainly students working independently or in groups. Teachers should not spend a ton of time presenting information or live-coding. The goal is give students practice consulting other code (mainly from class 1) and applying it to a new problem/dataset.
Start off today's class introducing the objectives and the datasets that will be used. Remind students of the help page and using the question mark to access help pages for functions/built-in datasets, then set them loose!
When students come to you with an issue, encourage them to try to look it up on their own first. If it's something really minor (e.g., wrong order for variable assignment like `4 -> my_favorite_number`) it's ok to give them the correction plus explanation. Either way, don't just correct the student's code without providing some sort of explanation.
The solutions to these questions are all very straightforward, so we don't have an answer key. There will be one for future classes though as the questions become more complex!
:::
## Objectives
- Be able to apply the objectives covered in Class 1 to a new dataset
- Identify and fix a bug in a code example
## Your datasets
This class we will be working with the `mtcars` dataset. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973--74 models).
The other dataset we will be working with is the Palmer Penguins dataset. This is not a built-in dataset, so you will need to install it. You will only need to install the package once.
```{r}
#| eval: false
install.packages("palmerpenguins")
```
Once it is installed, you will need to load the package into your R environment. You will need to do this anytime you want to use a package.
```{r}
library(palmerpenguins)
```
You will also need to load the penguins dataset into your R environment:
```{r}
data(package = "palmerpenguins")
```
## Today's class
### Cars dataset
1. The dataset is stored 'under the hood' in an object called `mtcars`. View the dataset. Use `head()` to view the first 5, 10, and 20 rows.
2. Assign `mtcars` to a new variable of your choice.
3. What is the data type of each column in the dataset?
4. How many rows are in the dataset? How many columns? You may need to look up how to do this! Try searching "how to get number of rows in data frame in R" in Google.
5. Run `str(mtcars)` . What is this output telling you? How does it compare to what you found in #3 and #4?
6. For each column, find the mean, range, and median values. Are you able to do this for all columns? Why or why not?
7. What value is in the 6th row and 10th column?
8. Print every row of the 4th column.
9. Print every column of only rows 28 to 31.
### Penguins dataset
1. The dataset is stored 'under the hood' in an object called `penguins`. View the dataset. Use `head()` to view the first 5, 10, and 20 rows.
2. Assign `penguins` to a new variable of your choice.
3. What is the data type of each column in the dataset?
4. How many rows and columns?
5. For each column, if possible, find the mean, range, and median values.
6. For columns that you cannot find the mean/range/median of, try using the `table()` function, e.g. `table(penguis$species)` . What is this telling you?
7. Currently, the `bill_length` and `bill_depth` columns are in millimeters. Create a new column with those values converted to centimeters. (HINT: look at what you did at the end of the "Accessing parts of a list" section in Class 1)
8. Add two new columns to the data frame of your choice.
9. The penguins dataset is not perfect--it has some missing values. Check the missing values in the column sex by running two functions: `is.na(penguins$sex)` and `sum(is.na(penguins$sex))` .
a. What is the difference between the two outputs?
b. Compare to the result in #6.
c. Use the help page for the `table()` function and see if you can get the output to include NAs.
### Code debugging
Your former lab mate Weird Barbie graduated a few years ago. Before she left, she was working on some interesting analyses of the frequencies of Kens.
![photo credit: Warner Bros.](data/class2_files/weirdbarbie.jpeg){fig-align="center"}
Here's the data below, which you will not (and should not) need to change:
```{r}
#| eval: false
# The data -- DO NOT EDIT
ken_data <- data.frame(
"ken_name" = c("Ken1", "Ken2", "Ken3", "Ken4", "Ken5", "Ken6", "Ken7", "Allan"),
"hair_color" = c("Blonde", "Brown", "Black", "Red", "Blonde", "Brown", "Black", "Black"),
"cowboy_hats_owned" = c(2, 0, 1, 3, 0, 1, 2, 0),
"favorite_outfit" = c("Casual", "Formal", "Sporty", "Beachwear", "Formal", "Casual", "Sporty", "Casual"),
"age" = c(25, 27, 26, 28, 29, 30, 26, 27),
"height_cm" = c(180, 175, 182, 178, 180, 183, 177, 175),
"weight_kg" = c(75, 70, 80, 77, 76, 78, 79, 70),
"favorite_hobby" = c("Surfing", "Reading", "Soccer", "Volleyball", "Painting", "Cooking", "Dancing", "Guitar"),
"favorite_color" = c("Blue", "Green", "Red", "Yellow", "Purple", "Orange", "Pink", "Blue"),
"shoe_size" = c(10, 9, 11, 10, NA, 11, 10, 9),
"best_friend" = c("Barbie", "Barbie", "Barbie", "Barbie", "Barbie", "Barbie", "Barbie", NA),
"is_ken" = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)
```
However, as is typical of Weird Barbie, her code is...weird. In almost all other aspects of life, that's OK! But when it comes to you, three years later, trying to figure out what she did...not ideal. Here's her code below. As it's written, there are many bugs (code errors that either return an error or return an unexpected/incorrect result), the style is inconsistent, and there is no documentation. Using what you have learned so far, fix Weird Barbie's code: find the bugs, smash the bugs (get the code to run), change to a consistent style, and add helpful comments. You may need to consult the style guide mentioned in Class 0, help pages, and Google.
```{r}
#| eval: false
str(ken_data)
haed(ken_data)
mean(ken_data$cowboy_hats_owned)
hist(ken_data$cowboy_hats_owned)
ken_data$more.than.1_cowboyHat <- ken_data$cowboy_hats_owned > 1
print(paste(sum(ken_data$more.than.1_cowboyHat), "Kens have more than 1 cowboy hat"))
range(ken$age)
range(ken_data$shoe_size)
correlation <- cor(ken_data$height_cm, ken_data$weight_kg)
print(paste("The correlation between height and weight is", correlation))
plot(ken_data$height_cm, ken_data$weight_kg)
table(ken_data$best_friend)
# looks like everyone's bff is barbie!
# outfits
table(ken_data$favorte_outfit)
# no allan
range(ken_data[1:7,5])
noAllan <- ken_data[1:7,]
also_noAllan <- noAllan <- ken_data[ken_data$is_ken == TRUE,]
range(noAllan$shoe_size)
# Are the sporty Kens taller than the other Kens?
sporty_kens <- mean( ken_data [ken_data$favorite_outfit == "Sporty", "height_cm"])
other_kens_mean <- mean(ken_data[ken_data$favorite_outfit != "Sporty", "height_cm"] )
sporty_kens > other_kens_mean
```