-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathclass2.qmd
146 lines (108 loc) · 6.42 KB
/
class2.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
title: "Applying Coding Basics"
subtitle: "Coding Basics, Day 2"
author: "Matthew Sutcliffe, Madeline Gillman, JP Flores"
format:
html:
toc: true
---
## Objectives of Coding Basics: Class 2
- Be able to apply the objectives covered in Coding Basics: Class 1 to a new dataset
- Identify and fix a bug in a code example
## Your datasets
This class we will be working with the `mtcars` dataset. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973--74 models).
The other dataset we will be working with is the Palmer Penguins dataset. This is not a built-in dataset, so you will need to install it. You will only need to install the package once.
```{r}
# this code is making sure that the correct files are installed during the project rendering
# Students, don't worry too much about this code. It is here to make sure that our curriculum
# book runs correcrtly, but if you are curious, feel free to ask teachers for more info.
if(!require("palmerpenguins")){
install.packages("palmerpenguins",repos = 'http://cran.us.r-project.org')
}
```
```{r}
#| eval: false
install.packages("palmerpenguins")
```
Once it is installed, you will need to load the package into your R environment. You will need to do this anytime you want to use a package.
```{r}
library(palmerpenguins)
```
You will also need to load the penguins dataset into your R environment:
```{r}
data(package = "palmerpenguins")
```
## Today's class
### Cars dataset
1. The dataset is stored 'under the hood' in an object called `mtcars`. View the dataset. Use `head()` to view the first 5, 10, and 20 rows.
2. Assign `mtcars` to a new variable of your choice.
3. What is the data type of each column in the dataset?
4. How many rows are in the dataset? How many columns? You may need to look up how to do this! Try searching "how to get number of rows in data frame in R" in Google.
5. Run `str(mtcars)` . What is this output telling you? How does it compare to what you found in #3 and #4?
6. For each column, find the mean, range, and median values. Are you able to do this for all columns? Why or why not?
7. What value is in the 6th row and 10th column?
8. Print every row of the 4th column.
9. Print every column of only rows 28 to 31.
### Penguins dataset
1. The dataset is stored 'under the hood' in an object called `penguins`. View the dataset. Use `head()` to view the first 5, 10, and 20 rows.
2. Assign `penguins` to a new variable of your choice.
3. What is the data type of each column in the dataset?
4. How many rows and columns?
5. For each column, if possible, find the mean, range, and median values.
6. For columns that you cannot find the mean/range/median of, try using the `table()` function, e.g. `table(penguis$species)` . What is this telling you?
7. Currently, the `bill_length` and `bill_depth` columns are in millimeters. Create a new column with those values converted to centimeters. (HINT: look at what you did at the end of the "Accessing parts of a list" section in Class 1)
8. Add two new columns to the data frame of your choice.
9. The penguins dataset is not perfect--it has some missing values. Check the missing values in the column sex by running two functions: `is.na(penguins$sex)` and `sum(is.na(penguins$sex))` .
a. What is the difference between the two outputs?
b. Compare to the result in #6.
c. Use the help page for the `table()` function and see if you can get the output to include NAs.
### Code debugging
Your former lab mate Weird Barbie graduated a few years ago. Before she left, she was working on some interesting analyses of the frequencies of Kens.
![photo credit: Warner Bros.](data/class2_files/weirdbarbie.jpeg){fig-align="center"}
Here's the data below, which you will not (and should not) need to change:
```{r}
#| eval: false
# The data -- DO NOT EDIT
ken_data <- data.frame(
"ken_name" = c("Ken1", "Ken2", "Ken3", "Ken4", "Ken5", "Ken6", "Ken7", "Allan"),
"hair_color" = c("Blonde", "Brown", "Black", "Red", "Blonde", "Brown", "Black", "Black"),
"cowboy_hats_owned" = c(2, 0, 1, 3, 0, 1, 2, 0),
"favorite_outfit" = c("Casual", "Formal", "Sporty", "Beachwear", "Formal", "Casual", "Sporty", "Casual"),
"age" = c(25, 27, 26, 28, 29, 30, 26, 27),
"height_cm" = c(180, 175, 182, 178, 180, 183, 177, 175),
"weight_kg" = c(75, 70, 80, 77, 76, 78, 79, 70),
"favorite_hobby" = c("Surfing", "Reading", "Soccer", "Volleyball", "Painting", "Cooking", "Dancing", "Guitar"),
"favorite_color" = c("Blue", "Green", "Red", "Yellow", "Purple", "Orange", "Pink", "Blue"),
"shoe_size" = c(10, 9, 11, 10, NA, 11, 10, 9),
"best_friend" = c("Barbie", "Barbie", "Barbie", "Barbie", "Barbie", "Barbie", "Barbie", NA),
"is_ken" = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)
```
However, as is typical of Weird Barbie, her code is...weird. In almost all other aspects of life, that's OK! But when it comes to you, three years later, trying to figure out what she did...not ideal. Here's her code below. As it's written, there are many bugs (code errors that either return an error or return an unexpected/incorrect result), the style is inconsistent, and there is no documentation. Using what you have learned so far, fix Weird Barbie's code: find the bugs, smash the bugs (get the code to run), change to a consistent style, and add helpful comments. You may need to consult the style guide mentioned in Class 0, help pages, and Google.
```{r}
#| eval: false
str(ken_data)
haed(ken_data)
mean(ken_data$cowboy_hats_owned)
hist(ken_data$cowboy_hats_owned)
ken_data$more.than.1_cowboyHat <- ken_data$cowboy_hats_owned > 1
print(paste(sum(ken_data$more.than.1_cowboyHat), "Kens have more than 1 cowboy hat"))
range(ken$age)
range(ken_data$shoe_size)
correlation <- cor(ken_data$height_cm, ken_data$weight_kg)
print(paste("The correlation between height and weight is", correlation))
plot(ken_data$height_cm, ken_data$weight_kg)
table(ken_data$best_friend)
# looks like everyone's bff is barbie!
# outfits
table(ken_data$favorte_outfit)
# no allan
range(ken_data[1:7,5])
noAllan <- ken_data[1:7,]
also_noAllan <- noAllan <- ken_data[ken_data$is_ken == TRUE,]
range(noAllan$shoe_size)
# Are the sporty Kens taller than the other Kens?
sporty_kens <- mean( ken_data [ken_data$favorite_outfit == "Sporty", "height_cm"])
other_kens_mean <- mean(ken_data[ken_data$favorite_outfit != "Sporty", "height_cm"] )
sporty_kens > other_kens_mean
```