---
title: "RWTH Cookbook for R"
author: "André Calero Valdez"
date: "Last updated: `r format(Sys.Date())`"
output:
  html_document:
    toc: true
    toc_depth: 4
    toc_float:
      collapsed: true
      smooth_scroll: true
    number_sections: true
    theme: cosmo
    dev: png
    df_print: paged
    #code_folding: show
fig.asp: 0.681
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# requires the rwththeme package: install it from GitHub if it is missing
if (!requireNamespace("rwththeme", quietly = TRUE)) {
  devtools::install_github("sumidu/rwththeme")
}
```
# What is this cookbook?
This cookbook serves as a place to exchange frequently needed plot recipes using standardized data sets. To understand how the recipes work, it helps to first familiarize yourself with the data sets used. We also rely on a set of conventions to keep the code in this document readable.
This document is a work in progress and may contain mistakes.
## Code conventions
The data we typically work with comes in the so-called *wide format*: each measurement has its own column, and each row is one observation (in our case, one participant).
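To make the distinction concrete, here is a small sketch with hypothetical data (the column names `item_a` and `item_b` are invented for illustration). It builds a wide table and reshapes it into the *long format* with `tidyr::pivot_longer()`, where each row is one participant-item pair.

```r
library(tibble)
library(tidyr)

# Hypothetical wide-format data: one row per participant,
# one column per measured item (names are illustrative only)
wide <- tibble(
  id     = 1:3,
  item_a = c(4, 5, 2),
  item_b = c(3, 6, 1)
)

# The same data in long format: one row per participant-item pair
long <- pivot_longer(
  wide,
  cols      = c(item_a, item_b),  # the measurement columns to stack
  names_to  = "item",             # former column names go here
  values_to = "rating"            # former cell values go here
)
long
```

Many ggplot2 recipes (e.g., one facet or color per item) are easier to write on the long version, so a reshaping step like this is a typical part of the data wrangling included in an example.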
We use the `tidyverse` packages for data wrangling. If a visualization requires data wrangling, the wrangling code should be included in the example.
1. Include data wrangling, if needed for visualization.
2. Select the variables needed in the ggplot call.
3. Format the code using RStudio's *Reformat Code* tool. Then indent parameter assignments inside function calls by one additional tab.
4. End all ggplot pipes in a `+ NULL` line. This allows commenting out single lines (including the last line) without breaking the code.
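The following sketch applies these conventions to the built-in `mtcars` data set, which stands in for the cookbook data purely for illustration. Because every layer line ends in `+` and the pipe is closed by `NULL`, any single line, including the last real layer, can be commented out without leaving a dangling `+`.

```r
library(dplyr)
library(ggplot2)

# Illustration only: the built-in mtcars data stands in for the cookbook data
p_conv <- mtcars %>%
  select(wt, mpg) %>%            # convention 2: select only the plotted variables
  ggplot() +
  aes(x = wt, y = mpg) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  # geom_smooth(method = "lm") +  # a commented-out layer leaves the pipe intact
  NULL                            # convention 4: adding NULL leaves the plot unchanged
p_conv
```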
## Required Libraries
This cookbook uses several packages. The `rwththeme` package must be installed from GitHub; see https://github.com/Sumidu/rwththeme for instructions. The hidden setup chunk of this document installs it automatically if it is missing.
```{r libraries, message=FALSE, warning=FALSE}
library(tidyverse)
library(rwththeme)
df <- read_rds("data.rds")
```
```{r child="010_data_familiarization.Rmd"}
```
```{r child = 'data_manipulation.Rmd'}
```
```{r child="020_basic_statistics.Rmd"}
```
# Basic Plotting Recipes
## Data Overview
### Creating a basic plot. Example: Histogram
Each plot has labels. The `title` should be a short sentence summarizing what the plot shows. The `subtitle` should state what is actually shown (e.g., plot type, variables). The `caption` should give any additional information needed to read the plot unambiguously; sources can also be added here.
Histogram bars become easier to tell apart when their border color is set to `"white"`, as in this example, because the white outline visually separates adjacent bins.
```{r histogram-1}
df %>%
  dplyr::select(age) %>%
  ggplot() +
  aes(age) +
  geom_histogram(bins = 30, color = "white") +
  labs(
    title = "The sample is of a very young age",
    subtitle = "Histogram of the age variable",
    x = "Age",
    y = "Frequency (absolute)",
    caption = "Histogram using 30 bins. Source: hcictools"
  ) +
  NULL
```
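Instead of fixing the number of bins, you can fix their width, which often gives more interpretable bins for variables such as age. The sketch below assumes simulated ages (not the cookbook's `df`) and uses `binwidth = 5` together with `boundary = 0`, so that bin edges fall on multiples of 5 years.

```r
library(ggplot2)

# Simulated ages stand in for df$age (assumption for this sketch)
set.seed(1)
ages <- data.frame(age = round(rnorm(200, mean = 25, sd = 5)))

p_hist <- ggplot(ages) +
  aes(age) +
  # bins are 5 years wide; boundary = 0 places edges at multiples of 5
  geom_histogram(binwidth = 5, boundary = 0, color = "white") +
  labs(
    x = "Age",
    y = "Frequency (absolute)",
    caption = "Histogram using a bin width of 5 years"
  ) +
  NULL
p_hist
```

If you switch to `binwidth`, remember to update the caption so that it still describes the binning correctly.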
### Creating a chart to show basic summary statistics (SE Version)
When exploring a whole set of variables, it can help to visualize their summary statistics. This example plots the mean of each `robo` item with error bars of ±1 standard error.
```{r describe_plot}
df %>%
  select(starts_with("robo")) %>% # pick all variables that start with robo
  psych::describe() %>%           # get summary statistics
  as.data.frame() %>%             # convert the result to a data.frame
  rownames_to_column() %>%        # convert the non-tidy row names to a column
  ggplot() +
  aes(y = mean, x = rowname, ymin = mean - se, ymax = mean + se) +
  geom_point() +
  geom_errorbar(width = 0.5) +
  scale_y_continuous(breaks = 1:6, limits = c(1, 6)) +
  coord_flip() +
  labs(
    y = "Mean of the variable",
    x = "Variable",
    title = "Acceptance for robo_bed is highest",
    subtitle = "Means of different scale items",
    caption = "Error bars denote ±1 standard error"
  ) +
  NULL
```
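If you want to see what the error bars are built from, or prefer not to depend on `psych`, the same statistics can be computed with dplyr. This is a sketch assuming that the `se` column reported by `psych::describe()` is the standard error of the mean, sd / sqrt(n). The item names and the simulated ratings on a 1 to 6 scale are hypothetical stand-ins for the `robo` variables in `data.rds`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the robo_* items (simulated ratings 1-6)
set.seed(42)
items <- tibble(
  robo_bed  = sample(1:6, 50, replace = TRUE),
  robo_care = sample(1:6, 50, replace = TRUE)
)

item_stats <- items %>%
  pivot_longer(everything(), names_to = "item", values_to = "value") %>%
  group_by(item) %>%
  summarise(
    mean = mean(value, na.rm = TRUE),
    n    = sum(!is.na(value)),             # number of valid answers
    se   = sd(value, na.rm = TRUE) / sqrt(n) # standard error of the mean
  )

p_se <- item_stats %>%
  ggplot() +
  aes(y = mean, x = item, ymin = mean - se, ymax = mean + se) +
  geom_point() +
  geom_errorbar(width = 0.5) +
  coord_flip() +
  NULL
p_se
```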
### Creating a chart to show basic summary statistics (CI Version)
This variant widens the error bars to approximate 95% confidence intervals by multiplying the standard error by 1.96, the 97.5% quantile of the standard normal distribution.
```{r describe_plot_ci}
df %>%
  select(starts_with("robo")) %>% # pick all variables that start with robo
  psych::describe() %>%           # get summary statistics
  as.data.frame() %>%             # convert the result to a data.frame
  rownames_to_column() %>%        # convert the non-tidy row names to a column
  ggplot() +
  # 1.96 = qnorm(0.975): normal approximation of a two-sided 95% interval
  aes(y = mean, x = rowname, ymin = mean - se * 1.96, ymax = mean + se * 1.96) +
  geom_point() +
  geom_errorbar(width = 0.5) +
  scale_y_continuous(breaks = 1:6, limits = c(1, 6)) +
  coord_flip() +
  labs(
    y = "Mean of the variable",
    x = "Variable",
    title = "Acceptance for robo_bed is highest",
    subtitle = "Means of different scale items",
    caption = "Error bars denote 95% confidence intervals"
  ) +
  NULL
```
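The multiplier for a two-sided 95% interval is the 97.5% quantile of the reference distribution. The normal approximation gives about 1.96; for small samples the t distribution with n - 1 degrees of freedom gives a somewhat larger value. The sketch below compares both for a hypothetical sample size of n = 30.

```r
# Two-sided 95% interval: 2.5% of the probability mass in each tail
n_hypothetical <- 30

z_crit <- qnorm(0.975)                          # normal approximation, about 1.96
t_crit <- qt(0.975, df = n_hypothetical - 1)    # t quantile with n - 1 df

round(c(z = z_crit, t = t_crit), 3)
```

Since `psych::describe()` also returns an `n` column, the t-based variant could be written as `ymin = mean - se * qt(0.975, n - 1)` inside `aes()`; treat this as a suggestion rather than the convention used in this cookbook.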
# Advanced Plotting Recipes
## Radarplot
# Tips
## Assigning a plot to a variable while still displaying it
Wrap the assignment in parentheses. R then prints the assigned value, which for a ggplot object means the plot is drawn.
```{r tip1}
(p <-
  df %>%
  ggplot() +
  aes(x = age) +
  geom_histogram(bins = 30)
)
```
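The trick works because an assignment returns its value *invisibly*, so nothing is auto-printed; wrapping it in parentheses makes the same value visible again. Base R's `withVisible()` shows this directly:

```r
# A plain assignment returns its value invisibly, so nothing is printed
v_plain <- withVisible(x <- 42)$visible
# Parentheses turn the same value into a visible result, so it is printed
v_paren <- withVisible((x <- 42))$visible

c(plain = v_plain, parentheses = v_paren)
```

For a ggplot object, being printed is what triggers drawing the plot, which is why `(p <- ...)` both stores and shows it.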
```{r dummy, eval=FALSE, include=FALSE}
# scratch area for testing; finalise_plot() comes from the bbplot package
bbplot::finalise_plot(
  plot_name = p,
  source = "Source: ONS",
  save_filepath = "filename_that_my_plot_should_be_saved_to-nc.png",
  logo_image_path = "logo.png"
)
```