-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathsection1.Rmd
274 lines (235 loc) · 9.13 KB
/
section1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
---
title: 'Introduction to R^[`r library(r2symbols) ; format(paste(symbol("copyright") , " - Wim R.M. Cardoen, 2022 - The content can neither be copied nor distributed without the **explicit** permission of the author."))`]'
subtitle: 'Part 1: Overview of R'
author: 'Wim R.M. Cardoen'
fontsize: 11pt
date: "Last updated: `r format(Sys.time(), ' %2m/%2d/%Y @ %2H:%2M:%2S')`"
output:
pdf_document:
latex_engine: xelatex
highlight: tango
df_print: tibble
toc: True
toc_depth: 5
number_sections: True
extra_dependencies:
- amsfonts
- amsmath
- xcolor
- hyperref
fig_caption: True
citecolor: green
linkcolor: red
urlcolor: magenta
---
\newpage
# R: What's in the name?
* Scripting language (vs. compiled language)
* Data Analysis Environment
# Why R?
* Open Source
* Scripting language => rapid prototyping
* The same R-code can run on different OS (MS, MacOs, Linux)
* Most diverse set of statistical tools
* A lot of pre-canned [packages/libraries](https://cran.r-project.org/web/packages/)
* Large community (and very active development)
* Job "security" (due to its popularity)
- [IEEE Top Programming Languages 2024](https://spectrum.ieee.org/top-programming-languages-2024)
- [IEEE Top Programming Languages 2023](https://spectrum.ieee.org/the-top-programming-languages-2023)
# From S to R
$\texttt{People are trapped in history and history is trapped in them. (James Baldwin)}$
* **S** programming language: created at Bell Labs (1975-1976). \newline Before, statistical computing was mainly done by calling Fortran routines.
* Richard A. Becker
* *[John M. Chambers](https://statweb.stanford.edu/~jmc4/) *
* Allan R. Wilks
* 1988: ["New S" (aka "Blue Book" by Becker, Chambers & Wilks)](https://www.amazon.com/New-S-Language-R-Becker/dp/0534091938) introduced
a lot of new features:
* functions (instead of macros)
* double precision arithmetic
* interface to Fortan, C[^1]
* etc.
* became *similar* to the modern versions of S-PLUS and R.
* 1991: *Statistical Models in S* (Chambers & [Hastie](https://hastie.su.domains/))
* formula-notation (~ operator)
* data frames
* methods and classes (S3)
* 1998: S4 (latest version of S)
* more advanced object-oriented features ([S4](http://adv-r.had.co.nz/S4.html) - quite different from [S3](http://adv-r.had.co.nz/S3.html))
(inheritance & multiple dispatch)
* 1991: [**R**oss Ihaka](https://www.stat.auckland.ac.nz/~ihaka/) and
[**R**obert Gentleman](https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)) (Auckland, New Zealand)
* started their own implementation of S and named it ... **R**.
(*"a free implementation of something 'close to' version 3 of the S language"*)
* borrowed features from [Scheme](https://groups.csail.mit.edu/mac/projects/scheme/) (static/lexical scoping).
* 1995: R becomes open-source.
* 2000: Release of R v.1.0.
* 2003: Start of the R Foundation.
* **S** still exists as a commercial product (**S-PLUS**). It is currently owned by TIBCO Software Inc.
## Links
* [Becker, Richard A (1994). A Brief History of S.](https://web.archive.org/web/20150723044213/http://www2.research.att.com/areas/stat/doc/94.11.ps)
* [Chambers, John M. (2009). Software for Data Analysis: Programming with R (Statistics and Computing)](https://www.amazon.com/Software-Data-Analysis-Programming-Statistics/dp/0387759352)
* [Chambers, John M. (2020). S, R, and data science.](https://dl.acm.org/doi/pdf/10.1145/3386334)
* [The R Project for Statistical Computing](https://www.r-project.org/)
* [https://www.r-bloggers.com/](https://www.r-bloggers.com/)
# Using R
* interactively: **R** CLI or in an IDE.
* run as a script containing R code.
* the cloud: e.g. [https://rstudio.cloud/](https://rstudio.cloud/)
## Links
* [Pre-compiled R Binaries (CRAN)](https://cran.r-project.org/)
* [RStudio IDE](https://posit.co/downloads/)
* [Latest R Source Code](https://cran.r-project.org/src/base/R-latest.tar.gz)
# Getting help on R
* Within R itself:
* $\textcolor{blue}{\textbf{help(cmd)}}$
* $\textcolor{blue}{\textbf{help(?cmd)}}$
* $\textcolor{blue}{\textbf{help(??cmd)}}$
* some keywords require quotes e.g. $\textcolor{blue}{\textbf{?'if'}}$, $\textcolor{blue}{\textbf{?'for'}}$
```{R, eval=FALSE}
help(mean)
?mean()
??mean()
?'if'
```
* External to the R package:
* [The Comprehensive R Archive Network](https://cran.r-project.org/)
* [An Introduction to R](https://cran.r-project.org/doc/manuals/R-intro.pdf)
* [R Language Definition](https://cran.r-project.org/doc/manuals/R-lang.pdf)
* [The Art of R Programming
(Normam Matloff)](https://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf)
* [Hadley Wickham's Website](https://hadley.nz/)
* [R-Bloggers](https://www.r-bloggers.com/)
* [R Mailing List](https://www.r-project.org/mail.html)
* [Stack Overflow (R)](https://stackoverflow.com/questions/tagged/r)
# General statements on R
* R is **case-sensitive** (cfr. C, C++, Python) => 'a' is different from 'A'.
* Symbol names/variables: letters, digits, ., _ are allowed.
* **must** start with a **.** (dot) or a **letter**.
* if symbol starts with **.** (dot), then second character **must** be a letter.
* can't use **reserved** keywords ( $\textcolor{blue}{\textbf{?Reserved}}$), i.e.:
- $\textcolor{blue}{\textbf{if, else, repeat, while, function, for, in, next, break}}$
- $\textcolor{blue}{\textbf{TRUE, FALSE, NULL, Inf, NaN, NA}}$
- $\textcolor{blue}{\textbf{NA\_integer\_,NA\_real\_,NA\_complex\_,NA\_character\_}}$
* the aforementioned rules can be overriden by the use of **backticks**, e.g.:
- $\textcolor{blue}{\textbf{`if`}}$
- $\textcolor{blue}{\textbf{`\_myvar`}}$
* R commands are separated by:
* either $\textcolor{blue}{\textbf{;}}$ (semi colon).
* or newline.
* R commands can be grouped together using $\textcolor{blue}{\textbf{\{\}}}$ (curly braces).
* R comments: start with $\textcolor{blue}{\textbf{\#}}$ (pound sign) (limited to the same line)
* History: use **arrow up** & **arrow down** to retrieve previous cmds.
* <font color="blue"><b>.Rhistory</b></font>: contains the recently used R commands
* <font color="blue"><b>.RData</b></font>: contains the objects stored in the Global Environment
\newpage
$\texttt{Oefening baart kunst! ("Exercise gives birth to art!" - Dutch proverb)}$
$\newline$
# Exercises: A taste of R
* Start R or RStudio
* Try out the following commands in R.
## Check R's version, system time and working directory
```{R, eval=FALSE}
print(R.version.string)
Sys.time()
getwd()
list.files()
```
$\textcolor{orange}{\textbf{Note:}}$\newline
You can invoke operating systems (OS) commands **directly** \newline
using the $\textcolor{blue}{\textbf{system}}$ function to obtain
the information above, i.e.:
```{R, eval=FALSE}
system("R --version | head -n 1 | cut -d' ' -f1-4")
system("date +'%Y-%m-%d %T %Z'")
system("pwd")
system("ls")
```
## Creation of variables
```{R,eval=FALSE,results='asis'}
x1 <- 5
x1
typeof(x1)
ls()
```
```{R,eval=FALSE}
x2 <- 5L
x2
typeof(x2)
ls()
```
```{R,eval=FALSE}
x3 <- FALSE
x3
typeof(x3)
```
```{R,eval=FALSE}
x4 <- "Programming is cool"
x4
typeof(x4)
```
```{R,eval=FALSE}
x5 <- -15
typeof(x5)
sqrt(x5)
```
```{R,eval=FALSE}
x6 <- -15+0i
typeof(x6)
sqrt(x6)
```
## Tinkering with vectors
```{R,eval=FALSE}
y1 <- 1:5
y1
length(y1)
```
```{R,eval=FALSE}
y2 <- seq(10.,50,by=10)
y2
length(y2)
y2^y1
```
```{R,eval=FALSE}
y3 <- c("Hello", "world", "!")
y3
s <- paste(y3)
s
```
```{R,eval=FALSE}
y4 <- c(TRUE,FALSE,TRUE)
sum(y4)
```
## Install the $\texttt{lobstr}$ package within R
```{R,eval=FALSE}
install.packages(pkgs=c("lobstr"),
repos=c("http://cran.us.r-project.org"), verbose= TRUE)
packageVersion("lobstr") # Check version of the package installed
```
Info on the $\texttt{lobstr}$ package can be found at:
- [cran.r-project.org website](https://cran.r-project.org/web/packages/lobstr/index.html)
- [lobstr website](https://lobstr.r-lib.org/)
## Use the newly installated package
* approach 1:
```{R,eval=FALSE}
x <- 1:10
lobstr::obj_addr(x)
lobstr::ref(x, character=TRUE)
cat(sprintf("Size of the object (Bytes):%s\n", lobstr::obj_size(x)))
```
* approach 2 (more common):
```{R,eval=FALSE}
x <- 1:10
library(lobstr)
obj_addr(x)
ref(x, character=TRUE)
cat(sprintf("Size of the object (Bytes):%s\n", obj_size(x)))
```
## Removing objects from your environment
```{R,eval=FALSE}
ls()
rm(c("y1","y2","y3","y4"))
ls()
```
[^1]:
'Function <font color="blue"><b>mode</b></font> gives information about the mode of an object in the sense of Becker, Chambers & Wilks (1988), and is more compatible with other implementations of the S language. Finally, the function <font color="blue"><b>storage.mode</b></font> returns the storage mode of its argument in the sense of Becker et al. (1988). It is generally used when calling functions written in another language, such as C or FORTRAN, to ensure that R objects have the data type expected by the routine being called. (In the S language, vectors with integer or real values are both of mode "numeric",
so their storage modes need to be distinguished.)' [R Language Definition, v.4.1.2 (2021-11-01) p. 2](https://cran.r-project.org/doc/manuals/r-release/R-lang.pdf)