forked from clayford/DataWranglingInR
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdwr_00_intro_essentials.Rmd
256 lines (139 loc) · 8.65 KB
/
dwr_00_intro_essentials.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
---
title: "R and RStudio Essentials"
author: "Clay Ford"
date: "Spring 2016"
output: beamer_presentation
---
## Introducing R
- R is a language and environment for statistical computing and graphics
- Freely available and maintained by volunteers
- R is extensible; can be expanded by installing "packages" or by creating your own "functions"
- Frequently used with RStudio, a free IDE (integrated development environment) for R
## How to get R and RStudio
**R**: http://cran.r-project.org/ (or better yet, Google "Download R")
**RStudio**: http://www.rstudio.com/
Both are available for Windows, Mac, and Linux and free to install (no catches).
If you install R and RStudio, then you only need to run RStudio.
## Using R
- command-line driven (*very* little point-and-click)
- use "functions" to work with data
- can work interactively (one function at a time) or write a script to submit multiple functions at once
- requires time and effort to learn, but RStudio makes the process easier
## R basics - functions
R uses *functions* to perform operations.
Functions take *arguments* to specify how operations are performed.
Example: `read.csv(file="scores.csv")`
`read.csv` is a function to import a CSV file; `file` is an argument.
R has **many** functions. Fortunately about 20% of them do about 80% of the work.
## R basics - assignment
We often need to save a function's result or output. For this we use the assignment operator: ` <- `
For example, when you import a CSV file you need to give it a name:
`scoreData <- read.csv(file="scores.csv")`
Now we can use `scoreData` as an argument to other functions. For example: `summary(scoreData)` which computes summaries for each column in the data.
**Note**: Use `Alt + -` in RStudio to quickly insert ` <- `. You can also use `=` for assignment.
## R basics - objects
R creates and manipulates *objects*. Examples of objects are vectors, data frames, matrices and lists.
You can create as many objects as memory allows.
_create vector of character strings_:
`gender <- c("Male","Female","Male")`
_create vector of numbers_:
`x <- seq(0, 10, by=2)`
_create data frame_:
`df <- read.csv("scores.csv")`
## R basics - packages
All functions belong to *packages*. R comes with about 25 packages, but as of January 2016, there are *over 7700* user-contributed packages!
To use functions in a package, the package must be installed and loaded.
You only install a package once. You load a package whenever you want to use its functions.
Examples of packages that come with R:
- The `stats` package contains statistical functions
- The `foreign` package contains functions for reading data from other statistical programs
- The `graphics` package contains functions for creating graphics
## R in a Nutshell
**Use functions in various packages to create and manipulate objects**.
## Running R
When you have R and RStudio installed, you only need to run RStudio.
RStudio displays four panels:
1. Script
2. Console
3. Environment (what's in memory)
4. Help/Plots
These can be rearranged. (Tools...Global Options...Pane Layout)
## Creating an R Script
You can use R interactively, entering one line at a time in the console. But you'll eventually want to save that code and be able to run it again, or at least review it.
An R script is just a text file with a .R extension.
`Ctrl + Shift + N` will start a new script on a PC.
`Command + Shift + N` will start a new script on a Mac.
Be sure you know how to start and save an R script. I will ask you to turn in R scripts for assignments.
## Submitting code from a script
To submit code from a script to the console, place your cursor in the line of code you want to submit and hit `Ctrl + Enter`, or `Command + Enter` on a Mac.
You can also highlight multiple lines of code and use the same keyboard shortcut to submit the code. Tip: `Shift + down arrow` will highlight lines
To switch from script to console, type `Ctrl + 2`
To switch from console to script, type `Ctrl + 1`
## Comment your code
To add a comment to R code, enter a # followed by your comment. For example:
```{r}
# create vector of proportions
pr <- c(0.2,0.3,0.4,0.1)
```
RStudio displays comments in green font.
To quickly comment/uncomment a block of text or code, highlight it and hit `Ctrl+Shift+C`, or `Command+Shift+C` on a Mac.
To re-flow comments (ie, automatically add line breaks), hit `Ctrl+Shift+/`, or `Command+Shift+/` on a Mac
## Your working directory
It's imperative you understand the concept of a working directory.
Your working directory is where R looks for input files, and where it sends output files (unless you specify a path in your code). For example, the code:
`dat <- read.csv("file.csv")`
looks in your working directory for `file.csv`. For the code to work, you need to set your working directory to be the directory where that file is located.
## Setting your working directory - R code
The function to see your working directory: `getwd()`
The function to set your working directory is `setwd`. For example:
`setwd("~/_clients/Allen/")`
sets my working directory to the `Allen` folder, in the `_clients` folder, in _my computer's_ home directory.
The `~` means my computer's home directory. On Windows, this is usually `My Documents`.
## Setting your working directory - Completion
RStudio allows you to use Tab completion when working with directories and typing R code. Take advantage of this!
When setting your working directory:
- Type `setwd()` (notice RStudio automatically adds the closing parens)
- Type `""` in the parens (notice RStudio automatically adds the closing double-quote)
- Type `~/` in the double quotes and hit Tab; you should see a selection of available directories in your computer's home directory. (typing `/` and hitting Tab will list contents of the root directory.)
- arrow up and down to find the desired directory and hit Tab.
## Setting your working directory - RStudio
You can set your working directory via point-and-click in RStudio by hitting `Ctrl + Shift + H`.
This is fine for an exploratory session or following along in class. But when you write a script, you'll want to be able to have a line of code that sets your working directory for you.
In this class I will provide you data sets for lectures and assignments.
To follow along with lectures, you **must** know how to download the files and set your working directory to where you downloaded the files.
My suggestion is to create a folder for this course and then create a subfolder called `data`.
## RStudio function completion
As I mentioned earlier, RStudio can auto-complete your typing.
Start typing a command and hit `Tab`. All functions in currently loaded packages that begin with the letters you typed will appear in a pick-list.
After you typed your command and opened a parens, hit `Tab` again. In most cases (not all) you will get a pick-list of arguments for that function.
## Using Help
All R users occasionally need help. Every last one. Know how to use R Help.
`help(function)`
`?function`
A help page for a function has the name of the function at the top left next to the package it's in. For example:
`abline {graphics}`
The function `abline` is part of the _graphics_ package.
Use the `example()` function to run the examples in a help page: `example(abline)`
## Vignettes
Some packages come with _vignettes_, which are basically tutorials to help you get up and running with a package.
These will be available on the package's index (or home) page through the link _User guides, package vignettes and other documentation._
If no vignette is available, check the package _DESCRIPTION file._ It will sometimes have a URL that serves as the home page for the package.
## Other keyboard shortcuts
Insert section header: `Ctrl + Shift + R`
Run the current section: `Ctrl + Alt + T`
Clear the console: `Ctrl + L`
Quickly create an R Notebook (PDF file) of your R script and its output: `Ctrl + Shift + K`
More RStudio Keyboard Shortcuts:
`Alt + Shift + K`
## Books of Note
- _R in Action_, Robert I. Kabacoff
- _Data Manipulation using R_, Phil Spector
- _R for Everyone_, Jared Lander
- _R Cookbook_, Paul Teetor
## Web Sites to explore
- Contributed Documentation: http://cran.r-project.org/other-docs.html
- Quick-R: http://www.statmethods.net/
- Cookbook for R: http://www.cookbook-r.com/
- R-statistics blog: http://www.r-statistics.com/
- The R Journal: http://journal.r-project.org/
- UCLA R Starter Kit: http://www.ats.ucla.edu/stat/r/sk/