started porting next worksheet

wilkelab · Dec 26, 2024 · b6afb45
1 parent 6ecc8f4
commit b6afb45
Showing 18 changed files with 28,725 additions and 0 deletions.
diff --git a/worksheets/coordinate-systems-axes.html b/worksheets/coordinate-systems-axes.html
diff --git a/worksheets/coordinate-systems-axes.qmd b/worksheets/coordinate-systems-axes.qmd
@@ -0,0 +1,269 @@
+---
+title: "Coordinate systems and axes"
+author: "Claus O. Wilke"
+format: live-html
+engine: knitr
+webr:
+  render-df: gt-interactive
+---
+
+{{< include ./_extensions/r-wasm/live/_knitr.qmd >}}
+
+## Introduction
+
+In this worksheet, we will discuss how to change and customize scales and coordinate systems.
+
+First we need to load the required R packages. Please wait a moment until the live R session is fully set up and all packages are loaded.
+
+```{webr}
+#| warning: false
+#| edit: false
+library(tidyverse)
+library(palmerpenguins)
+```
+
+Next we set up the data.
+
+```{webr}
+#| edit: false
+#| warning: false
+boxoffice <- tibble(
+  rank = 1:5,
+  title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"),
+  amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
+)
+
+temperatures <- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") |>
+  mutate(
+    location = factor(
+      location, levels = c("Death Valley", "Houston", "San Diego", "Chicago")
+    )
+  ) |>
+  select(location, day_of_year, month, temperature)
+
+temps_wide <- temperatures |>
+  pivot_wider(names_from = location, values_from = temperature)
+
+US_census <- read_csv("https://wilkelab.org/SDS375/datasets/US_census.csv")
+tx_counties <- US_census |> 
+  filter(state == "Texas") |>
+  select(name, pop2010) |>
+  extract(name, "county", regex = "(.+) County") |>
+  mutate(popratio = pop2010/median(pop2010)) |>
+  arrange(desc(popratio)) |>
+  mutate(index = 1:n())
+```
+
+We will be working with three different datasets: `boxoffice`, `temperatures`, and `tx_counties`. You have seen the first two before.
+
+The `boxoffice` dataset contains box-office gross results for Dec. 22-24, 2017.
+```{webr}
+#| edit: false
+boxoffice
+```
+
+The `temperatures` dataset contains the average temperature for each day of the year for four different locations.
+```{webr}
+#| edit: false
+temperatures
+```
+
+The `tx_counties` dataset holds information about how many people lived in Texas counties in 2010. The column `popratio` is the ratio of the number of inhabitants to the median across all counties, and the column `index` simply counts the counties from most populous to least populous.
+```{webr}
+#| edit: false
+tx_counties
+```
+
+## Scale customizations
+
+We can modify the appearance of the x and y axes with scale functions. All scale functions have names of the form `scale_`*`aesthetic`*`_`*`type`*`()`, where *`aesthetic`* stands for an aesthetic to which we're mapping data (e.g., `x`, `y`, `color`, `fill`, etc.), and *`type`* stands for the specific type of the scale. Which scale types are available depends on both the aesthetic and the data.
+
+Here, we only consider position scales, which are scales for the `x` and `y` aesthetics. The most commonly used scale types for position scales are `continuous` for continuous data and `discrete` for discrete data, yielding the scale functions `scale_x_continuous()`, `scale_y_continuous()`, `scale_x_discrete()`, and `scale_y_discrete()`. But there are others, such as `date`, `time`, or `binned`. You can look them up here: [https://ggplot2.tidyverse.org/reference/index.html#section-scales](https://ggplot2.tidyverse.org/reference/index.html#section-scales)
+
+Position scale functions are used to modify both the appearance of the axis (axis title, axis labels, number and location of breaks, etc.) and the mapping from data to position (including the range of data values considered, i.e., axis limits, and whether the data should be transformed, as is the case in log scales).
+
+Let's start with this plot of the `boxoffice` data:
+
+```{webr}
+#| edit: false
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col()
+```
+
+We can use scale functions to modify the axis titles, by setting the `name` argument. For example, `scale_x_continuous(name = "the x value")` would set the axis title to "the x value" in a continuous scale along the x axis.
+
+Use the appropriate scale functions to modify both axis titles in the above plot. Think about which axes (if any) are continuous and which are discrete.
+
+```{webr} 
+#| exercise: boxoffice-axis-title
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x____() +
+  scale_y____()
+```
+
+::: { .hint exercise="boxoffice-axis-title" }
+::: { .callout-tip title="Hint" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(___) +
+  scale_y_discrete(___)
+```
+:::
+:::
+
+::: { .solution exercise="boxoffice-axis-title" }
+::: { .callout-tip title="Solution" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(name = "weekend gross (million USD)") +
+  scale_y_discrete(name = NULL)
+```
+:::
+:::
+
+We can also use scale functions to set axis limits, via the `limits` argument. For continuous scales, the `limits` argument takes a vector of two numbers representing the lower and upper limit. For example, `limits = c(0, 80)` would indicate an axis that runs from 0 to 80. For discrete scales, the `limits` argument takes a vector of all the categories that should be shown, in the order in which they should be shown.
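+
+The discrete case is not covered by the exercise below, so here is a minimal sketch of it. It rebuilds the `boxoffice` values as a plain data frame so that it runs on its own; the object name `p_limits` and the choice of three titles are only for illustration.
+
+```r
+library(ggplot2)
+
+# Same values as the worksheet's `boxoffice` tibble, rebuilt here so the
+# sketch is self-contained
+boxoffice <- data.frame(
+  title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"),
+  amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
+)
+
+# Show only three titles, in a hand-picked order
+# (the first entry of `limits` ends up at the bottom of the y axis)
+p_limits <- ggplot(boxoffice) +
+  aes(amount, title) +
+  geom_col() +
+  scale_y_discrete(
+    name = NULL,
+    limits = c("Ferdinand", "Jumanji", "Star Wars")
+  )
+p_limits
+```
+
+Titles that are not listed in `limits` are left out of the plot, and ggplot2 should warn about the removed rows.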
+
+Try this out by setting a limit from 0 to 80 on the x axis.
+
+```{webr} 
+#| exercise: boxoffice-xlims
+
+```
+
+::: { .hint exercise="boxoffice-xlims" }
+::: { .callout-tip title="Hint" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(
+    name = "weekend gross (million USD)",
+    limits = ___
+  ) +
+  scale_y_discrete(name = NULL)
+```
+:::
+:::
+
+::: { .solution exercise="boxoffice-xlims" }
+::: { .callout-tip title="Solution" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(
+    name = "weekend gross (million USD)",
+    limits = c(0, 80)
+  ) +
+  scale_y_discrete(name = NULL)
+```
+:::
+:::
+
+What happens if you set the axis limits such that not all data points can be shown, for example an upper limit of 65 rather than 80? Do you understand why?
+
+(Hint: Scale limits are applied before the plot is drawn, and data points outside the scale limits are discarded. If this is not what you want, there's an alternative way of setting limits. See the very end of this worksheet under "Coords".)
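+
+The discarding described in the hint can be reproduced outside of a plot. This is a minimal sketch, assuming the default out-of-bounds handling of continuous scales, which in recent ggplot2 versions is censoring via `scales::oob_censor()`. The variable names are only for illustration.
+
+```r
+library(scales)
+
+amount <- c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD, as in `boxoffice`
+
+# Values outside the range c(0, 65) are replaced by NA;
+# values inside the range are returned unchanged
+censored <- oob_censor(amount, range = c(0, 65))
+censored
+```
+
+The first value, 71.57, lies above the upper limit and becomes `NA`, which is why the corresponding bar cannot be drawn.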
+
+Next, we can use the `breaks` and `labels` arguments to customize which axis ticks are shown and how they are labeled. In general, you need exactly as many breaks as labels. If you define breaks but no labels, the labels are generated automatically from the breaks.
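+
+As a minimal sketch of the breaks-only case, the following plot sets `breaks` and leaves `labels` at its default. The break values are arbitrary and differ from the ones used in the exercise; base R's `reorder()` stands in for `fct_reorder()` only so that the sketch needs no package other than ggplot2.
+
+```r
+library(ggplot2)
+
+# Same values as the worksheet's `boxoffice` tibble, rebuilt here so the
+# sketch is self-contained
+boxoffice <- data.frame(
+  title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"),
+  amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
+)
+
+p_breaks <- ggplot(boxoffice) +
+  aes(amount, reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(
+    name = "weekend gross (million USD)",
+    breaks = c(0, 20, 40, 60) # no `labels` given, so they are derived from the breaks
+  ) +
+  scale_y_discrete(name = NULL)
+p_breaks
+```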
+
+Building on the code from the previous example, set breaks at 0, 25, 50, and 75, and format the labels such that they can be read as currency. For example, write $25M instead of just 25.
+
+```{webr} 
+#| exercise: boxoffice-breaks
+
+```
+
+::: { .hint exercise="boxoffice-breaks" }
+::: { .callout-tip title="Hint" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(
+    name = "weekend gross",
+    limits = c(0, 80),
+    breaks = ___,
+    labels = ___
+  ) +
+  scale_y_discrete(name = NULL)
+```
+:::
+:::
+
+::: { .solution exercise="boxoffice-breaks" }
+::: { .callout-tip title="Solution" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(
+    name = "weekend gross",
+    limits = c(0, 80),
+    breaks = c(0, 25, 50, 75),
+    labels = c("0", "$25M", "$50M", "$75M")
+  ) +
+  scale_y_discrete(name = NULL)
+```
+:::
+:::
+
+When looking at the resulting plot, you may notice that the x axis extends beyond the limits you have set. This happens because by default ggplot scales expand the axis range by a small amount. You can set the axis expansion via the `expand` parameter. Setting the expansion can be a bit tricky, because we can set expansion at either end of a scale and we can define both additive and multiplicative expansion. (Additive expansion adds a fixed value, whereas multiplicative expansion adds a multiple of the scale range. ggplot uses additive expansion for discrete scales and multiplicative expansion for continuous scales, but you can use either for either scale.)
+
+The simplest way to define expansions is with the `expansion()` function, which takes arguments `mult` for multiplicative expansion and `add` for additive expansion. Either takes a vector of two values, indicating expansion at the lower and upper end, respectively. Thus, `expansion(mult = c(0, 0.1))` indicates multiplicative expansion of 0% at the lower end and 10% at the upper end, whereas `expansion(add = c(2, 2))` indicates additive expansion of 2 units at either end of the scale.
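+
+To make the arithmetic concrete, here is a minimal sketch that computes the expanded axis range by hand for a continuous scale. The helper `expanded_range()` is hypothetical and not part of ggplot2; it only mirrors the description above, where multiplicative expansion is a fraction of the scale range and additive expansion is a fixed number of units.
+
+```r
+# Hypothetical helper (not part of ggplot2): applies multiplicative and
+# additive expansion to a pair of continuous limits
+expanded_range <- function(limits, mult = c(0, 0), add = c(0, 0)) {
+  width <- diff(limits) # the scale range that `mult` is relative to
+  c(
+    limits[1] - mult[1] * width - add[1],
+    limits[2] + mult[2] * width + add[2]
+  )
+}
+
+# 0% at the lower end, 10% of the range (8 units) at the upper end
+expanded_range(c(0, 80), mult = c(0, 0.1)) # 0 88
+
+# 2 units at either end
+expanded_range(c(0, 80), add = c(2, 2)) # -2 82
+```
+
+With limits of 0 to 80, a multiplicative expansion of 0.1 and an additive expansion of 8 give the same result. The two differ only once the limits change.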
+
+Try this yourself. Use the `expand` argument to remove the gap to the left of 0 on the x axis.
+
+```{webr} 
+#| exercise: boxoffice-expansion
+
+```
+
+::: { .hint exercise="boxoffice-expansion" }
+::: { .callout-tip title="Hint" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(
+    name = "weekend gross",
+    limits = c(0, 80),
+    breaks = c(0, 25, 50, 75),
+    labels = c("0", "$25M", "$50M", "$75M"),
+    expand = expansion(___)
+  ) +
+  scale_y_discrete(name = NULL)
+```
+:::
+:::
+
+::: { .solution exercise="boxoffice-expansion" }
+::: { .callout-tip title="Solution" collapse="false"}
+```r
+ggplot(boxoffice) +
+  aes(amount, fct_reorder(title, amount)) +
+  geom_col() +
+  scale_x_continuous(
+    name = "weekend gross",
+    limits = c(0, 80),
+    breaks = c(0, 25, 50, 75),
+    labels = c("0", "$25M", "$50M", "$75M"),
+    expand = expansion(mult = c(0, 0.06))
+  ) +
+  scale_y_discrete(name = NULL)
+```
+:::
+:::
+
+Try different settings for the `expand` argument. Try both multiplicative and additive expansions. Apply different expansions to the y axis as well.
+
+
+## Logarithmic scales
+
+Scales can also transform the data before plotting. For example, log scales such as `scale_x_log10()` and `scale_y_log10()` log-transform the data. To try this out, we'll be working with the `tx_counties` dataset: