# Project
Either:
- come up with some new functions you want to create and share, and build them into a new R package, or
- explore, visualize, and analyze some data (see below).
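For the package option, each new function usually lives in a file under `R/` and is documented with roxygen2 comments. The sketch below is only an illustration: `rescale01()` is a hypothetical helper, not something from this course. `usethis::create_package()` and `usethis::use_r()` are one common way to set up the package skeleton around such a function.

```r
# Hypothetical helper that could live in R/rescale01.R of a new package.
# The roxygen2 comments (#') are turned into help pages by devtools::document().

#' Rescale a numeric vector to the [0, 1] range
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be ignored when computing the range?
#' @return A numeric vector of the same length as `x`.
#' @export
rescale01 <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescaled <- rescale01(c(2, 4, 6))
rescaled
# 0.0 0.5 1.0
```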
## What is Tidy Tuesday?
```{r, fig.align="center", out.width="60%", echo=FALSE}
knitr::include_graphics("images/tt_logo.jpg")
```
- A weekly data project from [the R4DS community](https://www.rfordatasci.com/).
- The aim is to understand how to summarize and arrange data to make meaningful charts with `ggplot2`, `tidyr`, `dplyr`, and other tools in the `tidyverse` ecosystem.
- A safe and supportive space for individuals to practice their **wrangling and data visualization** skills independent of drawing conclusions.
How it works:
1. The dataset comes from a *source article* and is made available on Mondays.
2. People then play with the dataset using the tidyverse, explore things they find interesting, and/or try to recreate the plots or results of the source article.
3. They share their own version on Twitter (with [the hashtag #TidyTuesday](https://twitter.com/search?q=%23tidytuesday&src=typed_query)).
## Tidy Thursday
Our own mini-version of Tidy Tuesday!
1. Join in teams of 2 (or 3).
1. Choose a dataset from the ones provided.
1. Come up with an interesting topic to visualize (or recreate an existing one).
1. Apply things you have learned in this course, and try to learn new things with our help and Google's help.
1. Share the resulting plot(s) and the code with the rest of the group on Friday, in a short presentation covering the challenges you met and the conclusions you draw from the plot.
### Choose a dataset
Install the {tidytuesdayR} package:
```{r eval=FALSE, tidy=FALSE}
install.packages("tidytuesdayR")
```
Some interesting datasets:
- [Olympic medals](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md)
```{r eval=FALSE, tidy=FALSE}
data <- tidytuesdayR::tt_load('2021-07-27')[["olympics"]]
```
- [Netflix titles](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-04-20/readme.md)
```{r eval=FALSE, tidy=FALSE}
data <- tidytuesdayR::tt_load('2021-04-20')[["netflix_titles"]]
```
- [Spotify songs](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-01-21/readme.md)
```{r eval=FALSE, tidy=FALSE}
data <- tidytuesdayR::tt_load('2020-01-21')[["spotify_songs"]]
```
If you do not like any of these, you can also choose from all the previous Tidy Tuesday datasets [here](https://github.com/rfordatascience/tidytuesday#datasets).
### Visualize the data
Come up with a question, or an interesting thing to learn from the data.
You can get inspired by looking at other participants on Twitter:
- [TidyTuesdayRocks](https://nsgrantham.shinyapps.io/tidytuesdayrocks/), a Shiny app that compiles tweets ordered by likes.
- Search for #TidyTuesday + dataset_name on Twitter.
Remember that the code is usually available on the author's GitHub page.
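A quick look at the structure and a few counts often helps when settling on a question. The sketch below uses a toy tibble with made-up values; the column names follow the Netflix example used in this document. The same calls should work on a real dataset once it is loaded.

```r
library(dplyr)

# Toy stand-in for a Tidy Tuesday dataset (hypothetical values).
titles <- tibble(
  type         = c("Movie", "Movie", "TV Show", "Movie"),
  release_year = c(2018, 2019, 2019, 2019),
  duration     = c("90 min", "120 min", "2 Seasons", "100 min")
)

# Structure: column names, types, and the first values.
glimpse(titles)

# How many rows per category, most frequent first?
type_counts <- titles %>% count(type, sort = TRUE)
type_counts

# How many titles per year and type? A table like this often suggests a plot.
year_counts <- titles %>% count(release_year, type)
year_counts
```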
### Compilation of tweets
```{r, echo=FALSE}
knitr::include_app("https://nsgrantham.shinyapps.io/tidytuesdayrocks/")
```
### Example
Someone tried to recreate a plot using the **Netflix dataset**:
```{r, echo=FALSE}
tweetrmd::tweet_embed("https://twitter.com/marieke_k_jones/status/1384560097821614081")
```
<br>
Clara's (quick) version:
```{r message=FALSE, warning=FALSE}
library(tidyverse)
netflix_titles <- tidytuesdayR::tt_load('2021-04-20')[["netflix_titles"]]
```
```{r, message=FALSE, warning=FALSE, fig.align='center', fig.width=10}
netflix_titles %>%
  # Keep movies only (TV shows report their duration in seasons)
  filter(type == "Movie") %>%
  # Extract the duration in minutes and keep the first listed genre
  mutate(duration = as.numeric(str_extract(duration, "\\d+")),
         cat = map_chr(listed_in, ~str_split(.x, ", ")[[1]][1]),
         cat = case_when(
           cat %in% c("Horror Movies", "Thrillers") ~ "Horror Movies & Thrillers",
           TRUE ~ cat)) %>%
  # Average duration per release year and genre
  group_by(release_year, cat) %>%
  summarise(mean_duration = mean(duration)) %>%
  filter(cat %in% c("Action & Adventure", "Children & Family Movies",
                    "Comedies", "Documentaries", "Dramas",
                    "Horror Movies & Thrillers")) %>%
  # One panel per genre
  ggplot(aes(x = release_year, y = mean_duration, color = cat)) +
  geom_line() +
  scale_x_continuous(limits = c(1980, 2020)) +
  scale_y_continuous(limits = c(50, 160), breaks = seq(50, 150, 25)) +
  facet_wrap(~cat) +
  theme_minimal() +
  theme(text = element_text(size = 14),
        legend.position = "none",
        axis.title.x = element_blank()) +
  labs(title = "Children's movies and Dramas decrease in duration over time",
       y = "Average movie duration (min)")
```
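The two string transformations in the `mutate()` call above are easier to follow on a tiny example. The tibble below is a toy with made-up rows; only the column names come from the Netflix dataset.

```r
library(dplyr)
library(stringr)
library(purrr)

# Two hypothetical rows shaped like the Netflix data.
toy <- tibble(
  duration  = c("90 min", "124 min"),
  listed_in = c("Dramas, International Movies", "Horror Movies")
)

toy_clean <- toy %>%
  mutate(
    # "90 min" -> 90: keep the first run of digits and convert it to a number
    duration = as.numeric(str_extract(duration, "\\d+")),
    # "Dramas, International Movies" -> "Dramas": split on ", ", keep the first piece
    cat = map_chr(listed_in, ~ str_split(.x, ", ")[[1]][1])
  )

toy_clean$duration
# 90 124
toy_clean$cat
# "Dramas" "Horror Movies"
```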
### Presentation
Make a short (5-10 min) presentation; you can simply knit an HTML file from R Markdown and push it to GitHub, and we can preview it on <http://htmlpreview.github.io/>.
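As a rough sketch of the HTML step, `rmarkdown::render()` can produce the file from the console, which is equivalent to clicking "Knit" in RStudio. The file name and contents below are hypothetical placeholders, and the call assumes that the `rmarkdown` package and pandoc are installed.

```r
# Write a minimal, hypothetical R Markdown file to a temporary directory.
rmd_file <- file.path(tempdir(), "presentation.Rmd")
writeLines(c(
  "---",
  "title: \"Tidy Thursday\"",
  "output: html_document",
  "---",
  "",
  "## Our dataset",
  "",
  "The plot and its code would go here."
), rmd_file)

# Render it to HTML; render() returns the path of the output file.
html_file <- rmarkdown::render(rmd_file, output_format = "html_document",
                               quiet = TRUE)
file.exists(html_file)
```

The resulting `.html` file is the one to commit and push.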
What you can present:
- Introduce the data set and the variables you have chosen to visualize.
- Show the resulting plot and discuss the trends/patterns in the data.
- If you chose to recreate a visualization from Twitter, comment on the quality of the plot or potential problems.
- Show the code that generates the plot and comment on the steps and transformations you have applied to the data.
- Did you find a set of `tidyverse` functions particularly useful? Did you discover another useful package?
Some examples from students in 2022-2023:
- [alcohol consumption](presentations/alcohol-consumption.html)
- [lego sets](presentations/lego.html)
- [F1 winners](presentations/f1.html)
- [baby names](presentations/baby-names.html)
- [UFO sightings](presentations/UFO_Sightings.html)
- [tornados](presentations/tornados.html)
- [wealth inequalities in the USA](presentations/wealth-inequalities.html)
- [Eurovision](presentations/TidyTuesday_Eurovision.html)
- [horror movies](presentations/Horror-movies.docx)
- [a package for clinical data](presentations/package-stRoke.html)
- [a package to read BAM files](presentations/package_banalyzer.html)