Skip to content

Commit

Permalink
next worksheet
Browse files Browse the repository at this point in the history
  • Loading branch information
clauswilke committed Dec 26, 2024
1 parent a41f1a6 commit 6ecc8f4
Show file tree
Hide file tree
Showing 44 changed files with 57,610 additions and 40 deletions.
2 changes: 2 additions & 0 deletions _quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@ website:
menu:
- text: "Class 2: Aesthetic mappings"
href: worksheets/aesthetic-mappings.html
- text: "Class 3: Visualizing amounts"
href: worksheets/visualizing-amounts.html

format:
html:
Expand Down
4 changes: 4 additions & 0 deletions _site/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,10 @@
<li>
<a class="dropdown-item" href="./worksheets/aesthetic-mappings.html">
<span class="dropdown-text">Class 2: Aesthetic mappings</span></a>
</li>
<li>
<a class="dropdown-item" href="./worksheets/visualizing-amounts.html">
<span class="dropdown-text">Class 3: Visualizing amounts</span></a>
</li>
</ul>
</li>
Expand Down
4 changes: 4 additions & 0 deletions _site/schedule.html
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,10 @@
<li>
<a class="dropdown-item" href="./worksheets/aesthetic-mappings.html">
<span class="dropdown-text">Class 2: Aesthetic mappings</span></a>
</li>
<li>
<a class="dropdown-item" href="./worksheets/visualizing-amounts.html">
<span class="dropdown-text">Class 3: Visualizing amounts</span></a>
</li>
</ul>
</li>
Expand Down
4 changes: 4 additions & 0 deletions _site/syllabus.html
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,10 @@
<li>
<a class="dropdown-item" href="./worksheets/aesthetic-mappings.html">
<span class="dropdown-text">Class 2: Aesthetic mappings</span></a>
</li>
<li>
<a class="dropdown-item" href="./worksheets/visualizing-amounts.html">
<span class="dropdown-text">Class 3: Visualizing amounts</span></a>
</li>
</ul>
</li>
Expand Down
34 changes: 17 additions & 17 deletions _site/worksheets/aesthetic-mappings.html

Large diffs are not rendered by default.

7 changes: 4 additions & 3 deletions _site/worksheets/aesthetic-mappings.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ title: "Aesthetic mappings"
author: "Claus O. Wilke"
format: live-html
engine: knitr
webr:
render-df: gt-interactive
---

{{< include ./_extensions/r-wasm/live/_knitr.qmd >}}
Expand All @@ -17,7 +19,6 @@ First we need to load the required R packages. Please wait a moment until the li
#| warning: false
#| edit: false
library(tidyverse)
library(DT)
```

Next we set up the data.
Expand All @@ -40,7 +41,7 @@ We will first work with the dataset `temps_houston` which contains the average t
```{webr}
#| edit: false
#| autorun: true
datatable(temps_houston)
temps_houston
```

## Basic use of ggplot
Expand Down Expand Up @@ -161,7 +162,7 @@ Next we will be working with the dataset `temperatures`, which is similar to `te
```{webr}
#| edit: false
#| autorun: true
datatable(temperatures)
temperatures
```

Make a line plot of `temperature` against `day_of_year`, using the `color` aesthetic to color the lines by location.
Expand Down
959 changes: 959 additions & 0 deletions _site/worksheets/visualizing-amounts.html

Large diffs are not rendered by default.

265 changes: 265 additions & 0 deletions _site/worksheets/visualizing-amounts.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,265 @@
---
title: "Visualizing amounts"
author: "Claus O. Wilke"
format: live-html
engine: knitr
webr:
render-df: gt-interactive
---

{{< include ./_extensions/r-wasm/live/_knitr.qmd >}}

## Introduction

In this worksheet, we will discuss a core concept of ggplot, the mapping of data values onto aesthetics.

First we need to load the required R packages. Please wait a moment until the live R session is fully set up and all packages are loaded.

```{webr}
#| warning: false
#| edit: false
library(tidyverse)
library(palmerpenguins)
```

Next we set up the data.

```{webr}
#| edit: false
#| warning: false
# Data from Box Office Mojo for Dec. 22-24, 2017.
boxoffice <- tibble(
rank = 1:5,
title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"),
amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
)
penguins2 <- na.omit(penguins) # remove all rows with any missing values
```

We will be working with two datasets. First, box-office gross results for Dec. 22-24, 2017:
```{webr}
#| edit: false
boxoffice
```

Second, data on individual penguins on Antarctica. Note that missing values have been removed:
```r
penguins2
```

::: {.column-page}
```{webr}
#| echo: false
penguins2
```
:::

## Drawing numerical values as bars

For the `boxoffice` dataset, we want to draw the amount (Weekend gross, in million USD) for each movie as a bar. Somewhat confusingly, the ggplot geom that does this is called `geom_col()`. (There is also a `geom_bar()`, but it works differently. We'll get to that later in this tutorial.) Make a bar plot of `amount` versus `title`. This means `amount` goes on the y axis and `title` on the x axis.

```{webr}
#| exercise: geom-col
ggplot(boxoffice, aes(x = ___, y = ___)) +
___()
```

::: { .hint exercise="geom-col" }
::: { .callout-tip title="Hint" collapse="false"}
```r
ggplot(boxoffice, aes(x = ___, y = ___)) +
geom_col()
```
:::
:::

::: { .solution exercise="geom-col" }
::: { .callout-tip title="Solution" collapse="false"}
```r
ggplot(boxoffice, aes(x = title, y = amount)) +
geom_col()
```
:::
:::

Now flip which column you map onto x and which onto y.

```{webr}
#| exercise: geom-col2
ggplot(boxoffice, aes(x = ___, y = ___)) +
geom_col()
```


::: { .solution exercise="geom-col2" }
::: { .callout-tip title="Solution" collapse="false"}
```r
ggplot(boxoffice, aes(x = amount, y = title)) +
geom_col()
```
:::
:::

The x-axis label should specify that the amount is in million USD, and the y axis doesn't need the word "title". Use `xlab()` and `ylab()` to make these changes to the plot.


```{webr}
#| exercise: geom-col3
ggplot(boxoffice, aes(x = amount, y = title)) +
geom_col() +
___() +
___()
```

::: { .hint exercise="geom-col3" }
::: { .callout-tip title="Hint" collapse="false"}
```r
ggplot(boxoffice, aes(x = amount, y = title)) +
geom_col() +
xlab(___) +
ylab(___)
```
:::
:::

::: { .solution exercise="geom-col3" }
::: { .callout-tip title="Solution" collapse="false"}
```r
ggplot(boxoffice, aes(x = amount, y = title)) +
geom_col() +
xlab("weekend gross (million USD)") +
ylab(NULL) # NULL means nothing, don't show a y label
```
:::
:::

## Getting bars into the right order

Whenever we are making bar plots, we need to think about the correct order of the bars. By default, ggplot uses alphabetic ordering, but that is rarely appropriate. If there is no inherent ordering (such as, for example, a temporal progression), then it is usually best to order by the magnitude of the values, i.e., sort the bars by length.

We can do this with the `fct_reorder()` function, which takes two arguments: The categorical variable we want to re-order, and the values by which we want to order. Here, the categorical variable is the column `title` and the values are in the column `amount`. We can apply the `fct_reorder()` function right inside the `aes()` statement.

```{webr}
#| exercise: geom-col-sorted
ggplot(boxoffice, aes(x = amount, y = ___)) +
geom_col() +
xlab("weekend gross (million USD)") +
ylab(NULL)
```

::: { .hint exercise="geom-col-sorted" }
::: { .callout-tip title="Hint" collapse="false"}
```r
ggplot(boxoffice, aes(x = amount, y = fct_reorder(___, ___))) +
geom_col() +
xlab("weekend gross (million USD)") +
ylab(NULL)
```
:::
:::

::: { .solution exercise="geom-col-sorted" }
::: { .callout-tip title="Solution" collapse="false"}
```r
ggplot(boxoffice, aes(x = amount, y = fct_reorder(title, amount))) +
geom_col() +
xlab("weekend gross (million USD)") +
ylab(NULL)
```
:::
:::

Try the following additional experiments in the above code:

- What happens when you run the above code without the `ylab(NULL)` statement?
- Can you make the bars blue?
- Can you color the bars by `amount` or by `title`?


## Drawing bars based on a count

The `boxoffice` dataset contains individual values, the dollar amounts, that we wanted to visualize with bars. Often, however, we encounter a slightly different scenario: A dataset doesn't contain the numeric amounts directly, but instead contains observations we want to count. This is the case in the `penguins2` dataset (see above).

It contains one row per penguin. If we want to make a bar plot of the number of penguins of each species (Adelie, Chinstrap, Gentoo), we cannot use `geom_col()` as before, because the dataset doesn't have a column that contains these counts.

The solution here is to use `geom_bar()`, which performs a count and then displays the result of that count. Because `geom_bar()` counts automatically, you only have to provide it with a single aesthetic, which specifies the data column in which you are counting.

Try this out. Make a bar plot of the number of penguins per species. Map the penguin species onto the x axis.

```{webr}
#| exercise: geom-bar
ggplot(penguins2, aes(___)) +
geom_bar()
```

::: { .hint exercise="geom-bar" }
::: { .callout-tip title="Hint" collapse="false"}
```r
ggplot(penguins2, aes(x = ___)) +
geom_bar()
```
:::
:::

::: { .solution exercise="geom-bar" }
::: { .callout-tip title="Solution" collapse="false"}
```r
ggplot(penguins2, aes(x = species)) +
geom_bar()
```
:::
:::

Try the following additional modifications in the above code:

- Map penguin species onto the y axis.
- Remove the axis label that says "species".
- Change the order of the bars manually, using `fct_relevel()` (see slides).

## Counting subgroups

`geom_bar()` automatically counts how many cases there are in each unique combination of different categorical aesthetics. In the previous example, we had only one categorical aesthetic, `species`. But we can add a second one, for example `sex`. Then `geom_bar()` counts the number of cases in each unique combination of species and sex and draws separate bars for each. Try this out by mapping the `sex` column onto the `fill` aesthetic.

```{webr}
#| exercise: geom-bar2
ggplot(penguins2, aes(x = species, fill = ___)) +
geom_bar()
```

::: { .solution exercise="geom-bar2" }
::: { .callout-tip title="Solution" collapse="false"}
```r
ggplot(penguins2, aes(x = species, fill = sex)) +
geom_bar()
```
:::
:::

By default, the bars for different `fill` values but identical `x` values will be drawn on top of one-another. But there are other possibilities, which are controled by the `position` argument to `geom_bar()`. For example, try to set the position to `"dodge"`.

```{webr}
#| exercise: geom-bar-position
ggplot(penguins2, aes(x = species, fill = ___)) +
geom_bar(___)
```

::: { .hint exercise="geom-bar-position" }
::: { .callout-tip title="Hint" collapse="false"}
```r
ggplot(penguins2, aes(x = species, fill = sex)) +
geom_bar(position = ___)
```
:::
:::

::: { .solution exercise="geom-bar-position" }
::: { .callout-tip title="Solution" collapse="false"}
```r
ggplot(penguins2, aes(x = species, fill = sex)) +
geom_bar(position = "dodge")
```
:::
:::

In the above code, also try positions `"stack"` and `"fill"`.
Loading

0 comments on commit 6ecc8f4

Please sign in to comment.