-
Notifications
You must be signed in to change notification settings - Fork 10
/
syllabus.Rmd
142 lines (87 loc) · 4.79 KB
/
syllabus.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
title: "Syllabus"
description: ""
output:
distill::distill_article:
toc: true
toc_depth: 2
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
## Course title and instructor
**Title:** DSC 385 Data Exploration, Visualization, and Foundations of Unsupervised Learning
**Instructor:** Claus O. Wilke<br>
**GitHub:** [clauswilke](https://github.com/clauswilke) <br>
## Purpose and contents of the class
In this class, students will learn how to visualize data sets and how to reason about and communicate with data visualizations. Students will also learn how to assess data quality and provenance, how to compile analyses and visualizations into reports, and how to make the reports reproducible. A substantial component of this class will be dedicated to learning how to program in R.
What you will learn:
- Data visualization
- R programming
- Reproducibility
- Data quality and relevance
- Data ethics and provenance
- Dimension reduction
- Clustering
## Prerequisites
Students are expected to have basic knowledge of statistics. Prior experience with the programming language R is beneficial but not strictly required.
## Textbook
This class draws heavily from materials presented in the following book:
- Claus O. Wilke. [Fundamentals of Data Visualization.](https://clauswilke.com/dataviz) O'Reilly Media, 2019.
Additionally, we will also make use of the following books:
- Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen. [ggplot2: Elegant Graphics for Data Analysis, 3rd ed.](https://ggplot2-book.org/) Springer, to appear.
- Kieran Healy. [Data Visualization: A Practical Introduction.](https://socviz.co/) Princeton University Press, 2018.
All these books are freely available online and you do not need to purchase a physical copy of either book to succeed in this class.
## Topics covered
::: l-page
------------------------------------------------------------------
Class Topic Coding concepts covered
------- --------------------------- ------------------------------
1. Introduction, reproducible RStudio setup online, R Markdown
workflows
2. Aesthetic mappings **ggplot2** quickstart
3. Telling a story
4. Visualizing amounts `geom_col()`, `geom_point()`,
position adjustments
5. Coordinate systems and coords and position scales
axes
6. Visualizing distributions stats, `geom_density()`,
1 `geom_histogram()`
7. Visualizing distributions violin plots, sina plots, ridgeline plots
2
8. Color scales color and fill scales
9. Data wrangling 1 `mutate()`, `filter()`, `arrange()`
10. Data wrangling 2 `group_by()`, `summarize()`, `count()`
11. Visualizing proportions bar charts, pie charts
12. Getting to know your data
1: Data provenance
13. Getting to know your data handling missing data, `is.na()`, `case_when()`
2: Data quality and
relevance
14. Getting things into the `fct_reorder()`, `fct_lump()`
right order
15. Figure design ggplot themes
16. Color spaces, color vision **colorspace** package
deficiency
17. Functions and functional `map()`, `nest()`, **purrr** package
programming
18. Visualizing trends `geom_smooth()`
19. Working with models `lm`, `cor.test`, **broom** package
20. Visualizing uncertainty frequency framing, error bars, **ggdist** package
21. Dimension reduction 1 PCA
22. Dimension reduction 2 kernel PCA, t-SNE, UMAP
23. Clustering 1 k-means clustering
24. Clustering 2 hierarchical clustering
25. Data ethics
26. Visualizing geospatial `geom_sf()`, `coord_sf()`
data
27. Redundant coding, text **ggrepel** package
annotations
28. Interactive plots **ggiraph** package
29. Over-plotting jittering, 2d histograms,
contour plots
30. Compound figures **patchwork** package
----------------------------------------------------------------
:::
## Reuse {.appendix}
Text and figures are licensed under Creative Commons Attribution [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Any computer code (R, HTML, CSS, etc.) in slides and worksheets, including in slide and worksheet sources, is also licensed under [MIT](https://github.com/wilkelab/SDS375/LICENSE.md). Note that figures in slides may be pulled in from external sources and may be licensed under different terms. For such images, image credits are available in the slide notes, accessible via pressing the letter 'p'.