---
params:
  lesson: "Lesson 7"
  title: "Import diverse data files and structures"
  bookchapter_name: "Cheat sheet for the `readr` package"
  bookchapter_section: "https://readr.tidyverse.org/"
  functions: "`read_csv`, `read_delim`, `write_csv`"
  packages: "`readr`"
# end inputs ---------------------------------------------------------------
header-includes: \usepackage{float}
always_allow_html: yes
output:
  html_document:
    code_folding: show
---
```{r, setup, echo = FALSE, cache = FALSE, include = FALSE}
options(width=100)
knitr::opts_chunk$set(
  eval = TRUE, # run all code
  echo = TRUE, # show code chunks in output
  tidy = TRUE, # tidy the code shown in output
  message = FALSE, # hide all messages
  warning = FALSE, # hide all warnings
  comment = "",
  tidy.opts = list(width.cutoff = 100), # set width of code chunks in output
  size = "small" # set code chunk size
)
```
\
<!-- install packages -->
```{r, load packages, eval=T, include=T, cache=F, message=F, warning=F, results='hide',echo=F}
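packages <- c("ggplot2","ggthemes","dplyr","tidyverse","zoo","RColorBrewer","viridis","plyr")
# install any packages that are not yet installed
missing_packages <- setdiff(packages, rownames(installed.packages()))
if (length(missing_packages) > 0) {
  install.packages(missing_packages, dependencies = TRUE)
}
# install tvthemes from GitHub if it is missing (needs the devtools package)
if (!requireNamespace("tvthemes", quietly = TRUE)) {
  devtools::install_github("Ryo-N7/tvthemes")
}
# load all packages
lapply(packages, library, character.only = TRUE)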
```
<!-- ____________________________________________________________________________ -->
<!-- ____________________________________________________________________________ -->
<!-- ____________________________________________________________________________ -->
<!-- start body -->
# `r paste0(params$lesson,": ",params$title)`
\
Functions for `r params$lesson`
`r params$functions`
\
Packages for `r params$lesson`
`r params$packages`
\
# Agenda
Use the `readr` package to easily read in different data file types.
[`r params$bookchapter_name`](`r params$bookchapter_section`).
\
<!-- ----------------------- image --------------------------- -->
<div align="center">
<img src="img/readr.png" style=width:50%>
</div>
<!-- ----------------------- image --------------------------- -->
\
<!-- end yaml template------------------------------------------------------- -->
# Do First
Using the smaller NYC Airbnb dataset, recreate the PDF below in `RMarkdown` so that it meets the conditions listed after the code chunk. [Download the final PDF file here](https://raw.githubusercontent.com/darwinanddavis/EmoRyCodingClub/master/Lesson7_dofirst.pdf).
```{r}
# smaller csv file (16 cols)
url <- "http://data.insideairbnb.com/united-states/ny/new-york-city/2021-04-07/data/listings.csv.gz"
nyc <- readr::read_csv(url)
nyc <- nyc[nyc$id < 1000000,] # get a smaller subset of the data
```
* Include only accommodation that costs less than $200 per night, for stays between 5 and 15 nights, and exclude Staten Island.
* Show only the structure of the subsetted data as code output. There is no need to show the code you used to subset the data (see the PDF).
* Show the plotting code along with the plot.
* Use the `yaml` below for your `Rmd` file.
```{yaml, echo=T, eval=F,message=F}
---
title: "Dissecting property availability in New York City using Airbnb open data"
author: "<your name here>"
urlcolor: blue
params:
  source: "http://insideairbnb.com/new-york-city/"
output:
  pdf_document:
    toc: yes
    toc_depth: 2
---
```
* Add the code below at the beginning of your `Rmd` file to load the required packages and suppress the package loading messages. Use a `{r, echo=T, eval=T, message=F}` header in the code chunk.
```{r, echo=T, eval=T,message=F}
pacman::p_load(ggthemes,ggplot2,readr,dplyr)
```
<!-- ----------------------- image --------------------------- -->
![](img/lesson7_dofirst1.jpg) ![](img/lesson7_dofirst2.jpg)
<!-- ----------------------- image --------------------------- -->
\
*****
# Reading in different data file types
```{r, eval=F}
# create small example files to read back in
# (`file` replaces the deprecated `path` argument in readr >= 1.4.0)
write_file(x = "a,b,c\n1,2,3\n4,5,NA", file = "file.csv")
write_file(x = "a;b;c\n1;2;3\n4;5;NA", file = "file2.csv")
write_file(x = "a|b|c\n1|2|3\n4|5|NA", file = "file.txt")
write_file(x = "a b c\n1 2 3\n4 5 NA", file = "file.fwf")
write_file(x = "a\tb\tc\n1\t2\t3\n4\t5\tNA", file = "file.tsv")

read_csv("file.csv") # read comma-separated data
read_csv2("file2.csv") # read data with ';' as the separator and ',' as the decimal mark
read_delim("file.txt", delim = "|") # read flat text files with a specified delimiter
read_tsv("file.tsv") # read tab-separated data
read_table("file.fwf") # read data separated by white space, i.e. a table
```
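
The readers above differ mainly in the delimiter they expect. As a minimal sketch of that idea, the chunk below writes the same small table to two temporary files with different separators and reads each one back. The temporary file names and the `col_types = "iii"` specification (three integer columns) are assumptions added for illustration, not part of the lesson files.

```{r}
library(readr)

# temporary files so the example does not touch the working directory
csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")

# the same table, once comma-separated and once pipe-separated
write_lines(c("a,b,c", "1,2,3", "4,5,NA"), csv_file)
write_lines(c("a|b|c", "1|2|3", "4|5|NA"), txt_file)

# declare the column types up front instead of letting readr guess them
from_csv <- read_csv(csv_file, col_types = "iii")
from_txt <- read_delim(txt_file, delim = "|", col_types = "iii")

# compare the two results column by column
mapply(identical, from_csv, from_txt)

# the text "NA" is parsed as a missing value
is.na(from_csv$c)
```

Declaring `col_types` is optional, but it keeps the result predictable when a column could otherwise be guessed as a different type.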
\
# Saving your data from `R`
Save `x`, an `R` object, to `file`, a file path. In `readr` 1.4.0 and later, the `file` argument replaces the deprecated `path` argument.
```{r,eval=F}
# Comma delimited file
write_csv(x, file, na = "NA", append = FALSE, col_names = !append)
# File with arbitrary delimiter
write_delim(x, file, delim = " ", na = "NA", append = FALSE, col_names = !append)
# CSV for Excel
write_excel_csv(x, file, na = "NA", append = FALSE, col_names = !append)
# String to file
write_file(x, file, append = FALSE)
# String vector to file, one element per line
write_lines(x, file, na = "NA", append = FALSE)
# Object to RDS file
write_rds(x, file, compress = c("none", "gz", "bz2", "xz"), ...)
# Tab delimited files
write_tsv(x, file, na = "NA", append = FALSE, col_names = !append)
```
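
As a minimal sketch of a save-and-reload round trip, the chunk below writes one data frame as both a CSV and an RDS file and reads each back. The data frame `scores`, its columns, and the temporary file paths are hypothetical and chosen only for illustration. It suggests why `write_rds()` can be the better choice when column types such as factors need to survive, since a CSV file stores only text.

```{r}
library(readr)

# hypothetical data with a factor column and a missing value
scores <- data.frame(
  id = 1:3,
  group = factor(c("low", "high", "low"), levels = c("low", "high")),
  score = c(2.5, NA, 4.1)
)

# temporary files so the example does not touch the working directory
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# missing values are written as the string given in `na`
write_csv(scores, csv_file, na = "NA")
write_rds(scores, rds_file)

# a csv stores text only, so the factor comes back as a character column
from_csv <- read_csv(csv_file, col_types = cols())
class(from_csv$group)

# an rds file stores the R object itself, so the factor and its levels are kept
from_rds <- read_rds(rds_file)
class(from_rds$group)
levels(from_rds$group)
```

Here `col_types = cols()` asks `readr` to guess the column types without printing the column specification message.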