-
Notifications
You must be signed in to change notification settings - Fork 19
/
index.Rmd
596 lines (389 loc) · 21 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
---
title: "Home"
layout: default
---
```{r echo=FALSE}
knitr::opts_chunk$set(
comment = "#>"
)
```
```{r echo=FALSE}
# install pkgs if not installed already
invisible(lapply(c('magrittr', 'httr', 'rvest', 'cowsay'), function(x) {
if (!requireNamespace(x, quietly = TRUE)) install.packages(x, quiet = TRUE)
}))
```
## An intro to R for new programmers
This is an introduction to R. I promise this will be fun. Since you have never used a programming language before, or any language for that matter, you won't be tainted by other programming languages with different ways of doing things. This is good - we can teach you the R way of doing things.
<img src="/assets/img/programmer.png" width="300">
## jsforcats?
Yep, this is a total rip off of [JSforcats.com](http://jsforcats.com) - hopefully [Max](http://maxogden.com/) doesn't mind.
## List of things
* [Using the R console - let's dig our claws in](#rconsole)
* [vectors - the basic R data structure](#vectors)
* [data frames - weird but familiar](#dataframes)
* [lists](#lists)
* [indexing](#indexing)
* [functions](#functions)
* [pipes](#pipes)
* [Using packages](#packages)
* [No no's for cats using R](#nonos)
* [Do do's for cats - or things to do](#dodos)
* [Open data from the web! Cat's love open data](#data)
* [Reading](#reading)
* [Does this site suck?](#makeitbetter)
## <a href="#rconsole" name="rconsole">#</a> R console
Writing code is fun. Since you're a cat, not having opposable thumbs may be a bit of an issue, but surely you're clever enough to find a way around that.
So open up R, and you'll see something like this:
<img src="/assets/img/console.png" width="550" border="1">
You can do math:
```{r, background='#dfe3e3', collapse=TRUE}
1 + 1
```
Strings are always fun to start with, type a set of letters together within quotes and the console will print it back to you
```{r, collapse=TRUE}
"Hello Mr Tickles"
"This is a string"
```
Double quotes and single quotes are more or less interchangable, but is better practice to stick with double quotes.
Another thing you'll want to do as a cat using R is assign things to a name so that you can use it later. Think of this as being if you were a chipmunk and you buried a nut in the ground to dig up later. You can assign anything in R to a name, then use it later (in the current R session of course :)).
Assign the number 5 to the name `mynumber`
```{r, collapse=TRUE}
mynumber <- 5
```
Later you can use `mynumber`, like adding it to another number
```{r, collapse=TRUE}
mynumber + 1
```
Sweet!
## <a href="#vectors" name="vectors">#</a> Vectors
Vectors are one of the simplest and common objects in R. Think of a vector like a cat's tail. Some are short. Some are long. But they are are pretty much the same width - that is, they can only contain a single data type. So a vector can only have all `numeric`, all `character`, all `factor`, etc.
But wait, how do we make a vector? The easiest way is to use a function called `c`. So `c(5,6,7)` will create a vector of numbers 5, 6, and 7. Let's try to put a `character` and a `numeric` together.
```{r, collapse=TRUE}
c("hello", 5)
```
Notice how the output of the above converted the 5 to a character type with quotes around the `5` to make it `"5"`, i.e., or an object of type _character_.
But we can happily make a vector of the same type of information, like
```{r, collapse=TRUE}
c(5, 8, 200, 1, 1.5, 0.9)
```
Vectors are handy because they can be combined to make other R objects, such as lists (see [lists](#lists) below), and [data frames](#dataframes).
In addition, you can do something to each part of the vector. Let's say you have a vector of three types of animals:
```{r, collapse=TRUE}
animals <- c("birds", "squirrels", "fish")
```
You can add something to each of them like
```{r, collapse=TRUE}
paste(animals, "are silly")
```
## <a href="#dataframes" name="dataframes">#</a> Data frames
A `data.frame` is one of the most commonly used objects in R. Just think of a `data.frame` like a table, or a spreadsheet, with rows and columns and numbers, text, etc. in the cells. A very special thing about the `data.frame` in R is that it can handle multiple types of data - that is, each column can have a different type. Like in the below table the first column is of `numeric` type, the second a `factor`, and the third `character`.
```{r tidy=FALSE, collapse=TRUE}
df <- data.frame(hey = c(5, 6, 7),
there = as.factor(c("a", "b", "c")),
fella = c("blue", "brown", "green"))
```
```{r collapse=TRUE}
df
```
Notice that the first _column_ of numbers are actually row names, and are not part of the `data.frame` _per se_, though are part of the _metadata_ for the `data.frame`.
We can quickly get a sense for the type of data in the `df` object by using the function `str`, which gives information on the types of data in each column.
```{r, collapse=TRUE}
str(df)
```
__Matrices__
Think of a matrix in R like a `data.frame` with all the same type of data, only numeric, only character, etc. A matrix is technically a special case of a two-dimensional array.
```{r tidy=FALSE, collapse=TRUE}
mat <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2, ncol = 3)
```
```{r collapse=TRUE}
mat
```
## <a href="#lists" name="lists">#</a> Lists
Lists are sorta crazy. They are kinda like vectors, but kinda not. Using our cat tail analogy again, lists are like cat tails in that they can be short or long, but they can also vary in width. That is, they can hold any type of object. Whereas vectors can only hold one type of object (only `character` for example), lists can hold for example, a `data.frame` and a `numeric`, or a `data.frame` and another `list`! The way we make a list is via the function `list`
```{r list1, collapse=TRUE}
list(1, "a")
```
A nested list
```{r list2, collapse=TRUE}
mylist <- list(1, list("a", "b", "c"))
mylist
```
Just like vectors, you can do operations on each element of the list. However, since lists can be nested you have to worry about what level of nesting you want to manipulate.
For example, if we take the `mylist` list from above, and perform the following:
```{r list3, collapse=TRUE}
length(mylist[1])
length(mylist[2])
```
This gives a length of 1 for each element of the list. But wait, aren't there three things in the second slot of the list ("a", "b", "c")? Indeed there are
```{r list4, collapse=TRUE}
length(mylist[2][[1]])
```
## <a href="#indexing" name="indexing">#</a> Indexing
Okay, so let's say you have made a `vector`, `list`, or `data.frame`. How do you get to the things in them? Its slightly different for each one.
There is a general way to index objects in R that can be used across `vectors`, `lists`, and `data.frame`. That is the square bracket: `[]`. For some objects you can index by the sequence number (e.g., `5`) of the thing you want, while with others you can do that, but also index by the character name of the thing (e.g., `kitty`).
**vectors**
Vectors only have one dimension, as we said above. So with `[]` there is only one number to give here. For example, let's say we have the vector
```{r, collapse=TRUE}
bb <- c(5, 6, 7)
```
We can index to each of those 3 numbers by the sequence of its place in the vector. Get the 6 by doing
```{r, collapse=TRUE}
bb[2]
```
You can also have a named vector. What's that? A named vector is like `bb` above, but each of the three elements has a name.
```{r, collapse=TRUE}
bb <- c(5, 6, 7)
names(bb) <- c("hey", "hello", "wadup")
bb
names(bb)
```
With a named vector we can get to each element in the vector using its name with a single set, or double set of brackets to get the value, or the value and name, respectively.
```{r, collapse=TRUE}
bb["hello"]
```
```{r, collapse=TRUE}
bb[["hello"]]
```
Fun.
**lists**
Indexing on lists is similar to vectors. A huge difference though is that lists can be nested. So there could be infinite things within each slot of a list.
For example, let's say we have the nested list from above `mylist`
```{r, collapse=TRUE}
mylist <- list(foo = 1, bar = list("a", "b", "c"))
```
We can index to the first item in the list, including its name, by
```{r, collapse=TRUE}
mylist[1]
```
Or equivalently
```{r, collapse=TRUE}
mylist["foo"]
```
And get just the value by using two `[`
```{r, collapse=TRUE}
mylist[[1]]
```
Or equivalently
```{r, collapse=TRUE}
mylist[["foo"]]
```
And get the second item in `mylist` by
```{r, collapse=TRUE}
mylist[2] # or mylist["bar"]
```
```{r, collapse=TRUE}
mylist[[2]] # or mylist[["bar"]]
```
And get to the individual elements within `bar` by
```{r, collapse=TRUE}
mylist[[2]][1]
```
And so on to get to what you need.
There are a number of convenience functions to make working with lists easier, but you can learn about those later.
**data.frame and matrix**
Indexing on a `data.frame` and `matrix` is similar. Both have two things to index on: rows and columns. Within `[,]`, the part before the comma is for rows, and the part after the comma for columns. So if you have a data frame `iris` in R,
```{r, collapse=TRUE}
head(iris)
```
You can index to the third row and second column by doing
```{r, collapse=TRUE}
iris[3, 2]
```
You can also use names to index if you have named rows or columns. For example,
```{r, collapse=TRUE}
iris[2, "Species"]
```
You can select the entire second row like this
```{r, collapse=TRUE}
iris[2, ]
```
If now your data frame is `mtcars`, you can get its first column by doing
```{r, collapse=TRUE}
mtcars[, 1]
```
You can also use the `$` symbol to index to a column, like
```{r, collapse=TRUE}
mtcars$mpg
```
## <a href="#functions" name="functions">#</a> Functions
Cats are the type of feline to love functions. Functions make your life easier by allowing you to generalize many lines of code, and avoiding repeating yourself. Functions make your work tidier - just like cats like it.
Functions are written like this
```{r}
foo <- function(){
writeLines("I strongly dislike dogs")
}
```
After defining this function we can then call it later like this
```{r fxn1}
foo()
```
Yay! Dumb dogs.
The `foo` function was pretty simple. We can also pass in parameters to the function.
```{r fxn2}
foo <- function(mess){
writeLines(mess)
}
```
```{r fxn23, collapse=TRUE}
foo("I strongly dislike dogs")
```
And set parameters to default values.
```{r fxn3}
foo <- function(mess = "I strongly dislike dogs"){
writeLines(mess)
}
```
```{r fxn4, collapse=TRUE}
foo()
foo("Well, I dislike most dogs, but I like the one in my house")
```
Generally, if you are writing more than 3 lines of code to do any particular task you may as well write a function to do that task, making it reusable and (hopefully) more general. For more justification for this Google the _DRY principle_.
## <a href="#packages" name="packages">#</a> Using packages
A lot of the functionality in R is in extensions to the language, called packages. Since you're a cat, you can think of packages like boxes that you put a bunch of code in. Since you are putting code in this box you probably don't want to sit in it :). These boxes generally hold a similar set of functions (see [functions](#functions) above). A package allows you and others to
* Easily install and load the code
* Incorporate documentation
* Lessen conflicts with functions in other packages (don't worry about why for now, but if you want to know [go here](http://adv-r.had.co.nz/Namespaces.html))
Most people that make R packages share them on a site on the interwebs called [CRAN](https://cran.rstudio.com/) (don't worry about what it stands for).
The humans behind CRAN have done a good job making sure that in most cases packages you install from CRAN will work on your computer, whether Linux, Windows, or OS X.
Installation is super easy. Do `install.packages("package_name")`, where `package_name` is the name of the package you want to install. Remember that the package name is case sensitive! Or if you're using RStudio you can go to the _Packages_ pane.
Once the package is installed you have to load the package in to your R session. That's easy too! Do `library('package_name')`, or if you're in RStudio go to the _Packages_ pane.
_Note: Package creation is out of scope for this site, but Hadley has made it much easier with [devtools](https://github.com/hadley/devtools)._
## <a href="#pipes" name="pipes">#</a> Pipes
<img src="/assets/img/pipe_gif.gif" width="300">
Pipes have only been recently introduced in base R, but are a concept you should know about as a cat.
The R package [magrittr](https://cran.rstudio.com/web/packages/magrittr) introduced pipe operators. There are many, but we'll just focus on `%>%`. This allows you to pipe a value forward into an expression or function call, like `x %>% f`, rather than `f(x)`, if `f()` is a function.
First, let's install `magrittr`:
```{r eval=FALSE}
install.packages("magrittr")
```
Then load the package so we can use it in the R session:
```{r}
library("magrittr")
```
Why should you care about this? This concept can make writing code and understanding it simpler. Take for example this case:
```{r}
mtcars %>% subset(cyl < 6) %>% head()
```
If you have a version of R recent enough you can use the base pipe `|>` like so:
```{r}
mtcars |> subset(cyl < 6) |> head()
```
Alternatively, we could avoid pipes:
```{r}
head( subset(mtcars, cyl < 6) )
```
Both solutions above give the same result, but the first with pipes is more intuitive. That is, it starts with the thing we're going to manipulate (the `mtcars` data.frame), then subsets the data.frame to take rows with the `cyl` column having values less than 6, then take the head, or first six rows. In the second solution, we have to read from the inside out to find the thing that we are manipulating (`mtcars`), which is then subsetted, then the first six rows are given. This is a simple example - as complexity grows, using pipes can make understanding and iterating on code much easier.
Pipe are increasingly being integrated into various R packages. One of the more popular ones is [dplyr](https://cran.rstudio.com/web/packages/dplyr), which allows you to manipulate data frames.
## <a href="#nonos" name="nonos">#</a> No no's for cats using R
There are a few R gotchas to avoid cat friends.
* `attach()`: This attaches an object to the search path. Like fleas, these can be hard to live with and harder to get rid of.
* When doing `library()` or `require()` use the package name in quotes as package name without quotes is easier, but can cause problems.
* Make sure to keep your claws trimmed so they don't get stuck in spaces between the keys.
## <a href="#dodos" name="dodos">#</a> Do do's for cats using R
* Do combine code and text with `Markdown` or `LaTeX` to have reproducible documents, using `knitr`.
* When googling for R help, use `cran`, not `r`
* When asking questions on Twitter/Appdotnet/G+/etc. use `#rstats`
* Do ask lots of questions on StackOverflow (use the `[r]` tag), Twitter (does this need saying), etc. But make sure to do your research before asking, and [include a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)!
* `.rda`, `.rds`, `.RData` compressed files are useful, but for reproducibility purposes, use non-R specific file formats like `.csv`, `.xml`, `.json`, etc.
## <a href="#data" name="data">#</a> Data from the web: pictures
This is sort of an advanced R topic, but hey, how else do we get cute cat pictures? That's the point of the internet after all - to serve cat pictures.
Install `cowsay`
```{r data1, eval=FALSE}
install.packages("cowsay")
```
Now let's get a cat fact!
```{r data2, messages=FALSE, warnings=FALSE}
library("cowsay", quietly = TRUE, warn.conflicts = FALSE)
say("catfact", "cat")
```
A little explanation is in order me thinks. There are a few things going on in the last thing we just did. The `say` function looks like sorta like this:
```r
say <- function(what, by, type){
<== some ascii art ==>
url <- "http://catfacts-api.appspot.com/api/facts?number=1"
what <- fromJSON(url)$facts
message(sprintf(by, what))
}
```
The first line is a bunch of ascii characters to make a cat. The second line defines the url for the cat facts API. The third line retrieves one cat fact from the cat facts API. And the fourth line prints the messages with a cat.
> But what is an API? I'm a cat, I only drink water or milk (preferably milk) - but at least I've heard of an IPA. What the rat's ass (yum) is an API?
Okay, here goes. An API stands for Application Programming Interface. It's just a set of instructions for two computers to talk to each other. It's sorta like if you run into another cat and if you both knew beforehand a lot about each other, you would have a sense for how to behave - if you don't know each other, then best of luck to you Mr. Tickles.
**Another example**
Above, we learned about writing functions. Let's create a function to generate a markdown image link for a cute cat in various sizes (by pixels) from the website [placekitten.com](http://placekitten.com/). This is a awesome, but simple web service. You just use ask for the image size by adding two numbers at the end. For example, `http://placekitten.com/g/300/300` gives an image that is 300 by 300 pixels.
```{r, tidy=FALSE}
getcutecat <- function(x){
writeLines(
sprintf("![](http://placekitten.com/g/%s/%s)", x, x)
)
}
```
150 pixels
```{r, collapse=TRUE}
getcutecat(150)
```
<img src="/assets/img/150.jpeg" width="150">
250 pixels
```{r, collapse=TRUE}
getcutecat(250)
```
<img src="/assets/img/250.jpeg" width="250">
## <a href="#doge" name="doge">#</a> ***Hey! R is not just for cats!*** - dogs
R and web APIs are open to all sorts of animals. For instance, the [dogr.io](http://dogr.io) service will create a "doge" style meme for you when provided with text. Let's write another function to call this API:
```{r dodge1, tidy=FALSE}
getdoge <- function(x){
x = paste(x, collapse="/")
writeLines(
sprintf("![](http://dogr.io/%s.png)", x)
)
}
```
```{r dodge2, collapse=TRUE}
getdoge(c("wow", "suchhostility", "bemoreinclusive"))
```
<img src="/assets/img/bemoreinclusive.png" width="400"/>
***Now, back to your regular, cat-dominated tutorial.***
## <a href="#dataNames" name="dataNames">#</a> Data from the web: names
Besides pictures, R is also really good at scraping data from websites! Here, we'll look at the names of famous cats that are listed on Wikipedia.
First, we'll have to load some helpful packages.
```{r message=FALSE}
library("httr")
library("rvest")
```
Each of these packages will help us do different things. The httr package has really helpful functions for grabbing the data from websites, and the `XML` package can translate those webpages into useful objects in our environment.
Here, we'll use the `GET()` function from the `httr` package to save the source code of the page in an object.
```{r getPage}
wikiCatNamePage <- GET("https://en.wikipedia.org/wiki/List_of_fictional_felines")
```
Cool! We have the website in an object!
But...looking at `wikiCatNamePage`, it doesn't seem like R will understand what we want to do! This is where we use the `content()` function from the httr package--it's really good at translating the page so R can use it. That's what we'll do in this next command:
```{r htmlParse}
parsedWikiCatNamePage <- httr::content(wikiCatNamePage)
```
Alright! Now we've translated the website code so that a different function, `html_table()`, will be able to pull the tables that we want.
```{r tableParse}
wikiCatNameTables <- html_table(parsedWikiCatNamePage, fill = TRUE)
```
It looks like the function stored 13 different things in the object, `wikiCatNameTables`. Looking at the webpage, I like the table from the subsection called _In animation_ (there is even a full page about [fictional cats in animation](https://en.wikipedia.org/wiki/List_of_fictional_cats_in_animation)). My favorite is Tom, from Tom and Jerry!
If we use `str(wikiCatNameTables)`, we'll see that `wikiCatNameTables` is a list with six items. It looks like the sixth item is the one with our data--we're almost done! The `str()` function shows too much information because of how `wikiCatNameTables` is organized, so the output isn't shown here. You should try it, though!
Let's put the sixth item in `wikiCatNameTables` into an object and see what we get:
```{r makeFamous, echo = FALSE}
famousCats <- as.data.frame(wikiCatNameTables[[6]])
head(famousCats)
```
```r
famousCats <- wikiCatNameTables[[6]]
head(famousCats)
```
Wow! We did it! And the data looks great! Give yourself a pat on the back!
If you're interested in learning more about scraping webpages, you should check out tutorials for `rvest` and `xml2` packages. They are designed to be as cat-friendly as possible!
## <a href="#reading" name="reading">#</a> Reading
After this basic intro you'll want to do more, such as:
* What do you think should go here? say so [here](https://github.com/sckott/rforcats/issues)
And for even more advanced R:
* [Advanced R, by Hadley Wickham](http://adv-r.had.co.nz/)
## <a href="#makeitbetter" name="makeitbetter">#</a> Make it better
Contribute by following [these instructions](/Contribute).
## <a href="#catslover" name="catslover">#</a> Cat's love R
<img src="/assets/img/leo_giffed.gif" width="300">
## <a href="#license" name="license">#</a> License