forked from 314a/rr-r-publication
-
Notifications
You must be signed in to change notification settings - Fork 0
/
publication.Rmd
149 lines (105 loc) · 9.19 KB
/
publication.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
---
title: "About Writing Dynamic Documents with R"
author:
- Author Name^1^
- ^1^Department of Geography, University of Zurich, Winterthurerstrasse 190, Zurich
bibliography: bibliography.bib
output:
pdf_document:
keep_tex: yes
csl: taylor-and-francis-harvard-x.csl
abstract: "This is the abstract of the template document used to show how to write
publications in R with R Markdown and the help of some packages. Based on a concrete
use case, this document exemplifies some of the caveats that may occur when writing
such a document and publishing it online on a Git repository. It also presents typical
use cases in Markdown usage and presents some tricks."
---
```{r setup,comment=FALSE, message = FALSE, echo=FALSE,warning=FALSE}
rm(list=ls()) # Clean the environment
options(scipen=6) # display digits proberly!! not the scientific version
options(digits.secs=6) # use milliseconds in Date/Time data types
options(warning=FALSE) # don't show warnings
library(knitr) # set global knitr options of the document
# Here we set the R code invisible to not show when the document is rendered
opts_chunk$set(comment="", message = FALSE, echo=FALSE, error=FALSE, warning=FALSE)
```
```{r directorySetup,comment=FALSE, message = FALSE, echo=FALSE,warning=FALSE}
dataFolder <- file.path("data") # Set path to the data and figure folder
figureFolder <- file.path("figures")
RFolder <- file.path("R") # Set the path to the RScript folder
# Load the R libraries
l<-lapply(list("plyr","reshape2","ggmap","ggplot2","scales","rgdal","maptools","tools"), require, character.only = TRUE)
```
## Introduction
This example publication serves as a motivation on how to create reproducible documents in R and aims to promote reproducible research in general.
## State of the Art
Various authors in qualitative and quantitive research argue that as many parts of the research workflow as possible should be reproducible. @Brunsdon2015 state "Reproducible quantitative research is research that has been documented sufficiently rigorously that a third party can replicate any quantitative results that arise".
To further motivate you, read [@Healy2016;@Leveque2012;@Baker2016;@Nature2016;@Pebesma2012;@Vandewalle2012;@Nuest2011;@Buckheit1995;@Healy2011] or the short and to the point editorial from @Nature2016.
## Case Study: Parc Adula
This case study presents a small subset of data from a current study conducted at the Department of Geography at the University of Zurich. The study investigates social negotiations revolving around a national park project in Switzerland – _Parc Adula_ – and aims for a better understanding of how people reason in a public environmental debate.
### Exploratory topic analysis
For this case study, 16 interviews were carried out. Each of these semi-structured interviews was analyzed resorting to Mayring's qualitative content analysis (@Mayring2010) – resulting in a code system, which was derived through mainly inductive category development. The following plots display sample output from the MXAQDA software for qualitative data analysis.
Overview of the interviews and representatives:
* _Cantonal Goverment_ (n: 4): Representatives from four different departments
* _Environmental Organisation_ (n: 1): Representative from a specific interest group with advisory function.
* _Federal Goverment_ (n: 2): Must ensure the park follows the laws and decrees
* _Local_ (n: 5): Local representatives of the park region
* _Parc Team_ (n: 2): Team members involved in the park planning process
* _Tourism_ (n: 2): Local tourism representatives
The following plot presents the frequency of occurence of a select list of topics that occured in the interviews. While there seems to be more focus on the _Pro Argument_ against the _Contra Argument_ during the interviews, topics on _Tourismus_ seem to have far more weight than those on _Biodiversität_.
```{r read}
d <- read.csv(file.path(dataFolder,"interview_data.csv"),sep=";",stringsAsFactors = FALSE)
names(d) <- gsub("[.]"," ", names(d))
d <- cbind(d[,1:4],d[ , order(names(d[,5:ncol(d)]),decreasing = TRUE)+4]) # column order hack
d <- ddply(d, .(ParentCode), numcolwise(sum)) # summarise variable
d.m <- melt(d)
```
```{r tableTopics}
ds <-data.frame(Code=d$ParentCode,Mention=rowSums(d[,2:ncol(d)]))
knitr::kable(ds, caption = "Topic mentions.")
```
The next figure presents the frequency matrix of the topic occurences across the different interviews. It provides an overview of where topics are mentioned and by whom.
```{r plotfreq,fig.height=4,fig.width=6,fig.cap="Frequency matrix of a selected list of topics across the various representants"}
p <- ggplot(d.m, aes(x=ParentCode,variable)) + geom_tile(aes(fill = value), colour = "white") + scale_fill_gradient(low = "white", high = "steelblue")
p <- p + theme_bw() + labs(x = "Topic",y = "Participants", title="Frequency of topic occurences") + theme(axis.text.x = element_text(angle = 45, hjust = 0.5, vjust = 0.5, colour = "grey50"))+ scale_x_discrete(expand = c(0, 0))+ guides(fill=guide_legend(title="Topic \nfrequencies"))
p
```
**Notes on reproducibility:** Depending on the data to analyze, privacy may play a role. While for the analysis itself the data is being anonymised, storing the raw or preprocessed data on a public repository may pose privacy issues or even constitute a violation of contract.
### Google query timeline
```{r readGoogleTrend}
# https://www.google.com/trends/explore?date=all&q=parc%20adula
d <- read.csv(file.path(dataFolder,"google_trend.csv"),sep=",",stringsAsFactors = FALSE,skip = 2)
names(d)<- c("Month","Count")
d$Month <- as.Date(paste(d$Month,"-01",sep=""))
# keep only the queries after March 2007, seems that before the counting is weird..
d<-d[d$Month>as.Date("2007-03-01"),]
```
Overview of the Google trend evolution of the search query: _Parc Adula_ ([url](https://www.google.com/trends/explore?date=all&q=parc%20adula), provides a CSV file). The timeline shows overall a small amount of queries for this word combination, with a spike on `r d[which.max(d$Count),"Month"]`. This was retrieved on August 11, 2016.
**Notes on reproducibility:** Web APIs are subject to license restrictions, can get altered by the service provider, or can simply cease to exist, so consider them carefully before using them in a scientific project. Consider instead using software which you can store locally and can better control the parameters and settings. If collecting data from an API, ensure to note down as much as possible about the data collection: the date range, all the query parameters, including the service limits at the time, any interruptions in service, and so on. It's also wise to back up the data thus obtained, if at all possible!
```{r googleTrend,fig.height=2,fig.width=5,fig.cap="Timeline of queries for Parc Adula set in the Google search engine"}
ggplot( d, aes(Month, Count)) + geom_line() +labs(list(x = "", y = "Monthly Queries", title ="Google Trend queries for 'Parc Adula'"))+theme_bw()
```
### Case study area
The proposed Parc Adula national park candidate is situated in Switzerland in the border region of the cantons Ticino and Grisons (Graubünden). The map below presents the current outer perimiter of the planned national park.
**Notes on reproducibility:** Due to license restrictions of the open geodata, it is not possible to store the data on a public Git repository. The included script `R/loadMapData.r` downloads the data directly from the link provided in the geodata catalog infobox of http://maps.geo.admin.ch
```{r map,fig.width=4, out.width = "60%",fig.cap="Planned perimeter of Parc Adula, Switzerland, Data source: Swisstopo"}
# Download and extract Shapefiles from geo.admin.ch
# Due to license restrictions, we can't store the data on a public Git repository.
# https://map.geo.admin.ch/?topic=ech&lang=de&bgLayer=ch.swisstopo.pixelkarte-farbe&catalogNodes=458,532,639,653&layers=ch.swisstopo.swissboundaries3d-land-flaeche.fill,ch.swisstopo.swissboundaries3d-kanton-flaeche.fill,ch.bafu.schutzgebiete-paerke_nationaler_bedeutung&X=190000.00&Y=662000.00&zoom=1&layers_opacity=1,1,0.85
# Create the map if the map images does not exist
mapPng <- file.path(figureFolder,"map.png")
if(!file.exists(mapPng)){
# the map generation code is stored in an external R script to increase the readibility of this Rmd file
source(file.path(RFolder,"loadMapData.r"))
}else{knitr::include_graphics(mapPng)}
```
## Concluding discussion
This template is based on data from an ongoing research project and presents some typical examples of material that could be used in a publication written in RMarkdown. It shows how to include data and analyses, features plots, tables, literature, and various markdown elements. The generated files in _PDF_, _Word_ or _HTML_ often still need some fine-tuning afterwards (particularly in Latex). It is nevertheless a great way of documenting the research process, generating initial drafts, and sharing workflows with collaborators or a wider audience.
# Acknowledgements
The Reproducible Research workshop was supported by the InnoPool fund of the Department of Geography, University of Zurich.
```{r session_info, results='markup'}
# Session info (include it for your own reproducibility)
# devtools::session_info()
```
# References