-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathsection2.Rmd
849 lines (681 loc) · 22.8 KB
/
section2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
---
title: 'Introduction to R^[`r library(r2symbols) ; format(paste(symbol("copyright") , " - Wim R.M. Cardoen, 2022 - The content can neither be copied nor distributed without the **explicit** permission of the author."))`]'
subtitle: 'Part 2: Atomic Data Types - Atomic/homogeneous vectors'
author: "Wim R.M. Cardoen"
fontsize: 11pt
date: "Last updated: `r format(Sys.time(), ' %2m/%2d/%Y @ %2H:%2M:%2S')`"
output:
pdf_document:
latex_engine: xelatex
highlight: tango
df_print: tibble
toc: true
toc_depth: 5
number_sections: True
extra_dependencies:
- amsfonts
- amsmath
- xcolor
- hyperref
fig_caption: True
bibliography: [latex/intro.bib]
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/research-institute-for-nature-and-forest.csl
citecolor: green
linkcolor: red
urlcolor: magenta
---
\newpage
R can be summarized in $\textbf{three}$ principles (John M. Chambers, 2016)
* $\texttt{Everything that exists in R is an \textbf{object}.}$
* $\texttt{Everything that happens in R is a \textbf{function} call.}$
* $\texttt{\textbf{Interfaces} to other languages are a part of R.}$
\newpage
# R Objects
* R provides a number of specialized data structures: **R objects**.
* The most common types of R objects^[For people interested in the details we recommend
to have a look at the file
$\href{https://cran.r-project.org/doc/manuals/r-release/R-ints.pdf}{R\,Internals\,}$ and the R source code (especially the header file *Rinternals.h*) where all the current object types are defined.] are:
* $\textcolor{blue}{\textbf{logical}}$, $\textcolor{blue}{\textbf{integer}}$,
$\textcolor{blue}{\textbf{double}}$, $\textcolor{blue}{\textbf{character}}$,
[$\textcolor{blue}{\textbf{complex}}$, $\textcolor{blue}{\textbf{raw}}$] (atomic vectors)
* $\textcolor{blue}{\textbf{list}}$ (heterogeneous/recursive vectors)
* $\textcolor{blue}{\textbf{closure}}$ (functions)
* $\textcolor{blue}{\textbf{environment}}$
* $\textcolor{blue}{\textbf{S4}}$
* $\textcolor{blue}{\textbf{symbol}}$ (variable name)
* $\textcolor{blue}{\textbf{NULL}}$
* An R object can be referred to by symbols/variables.
* The **type** of an object in R is determined by the $\textcolor{blue}{\textbf{typeof()}}$ function.
## The creation of R objects
* The following code creates an R object (vector of $4$ integers) which bears the name **x**, e.g.:
```{R, echo=TRUE, comment=''}
x <- c(3L,17L,12L,5L)
x
cat(sprintf(" typeof(x):%s\n", typeof(x)))
```
Under the hood it passes through the following steps:
- creation of an R object i.e. vector of $4$ integers in memory.
- binding/assigning the R object to the variable name $x$ using $\textcolor{blue}{\textbf{<-}}$ (left arrow symbol).
* There are $\textbf{less common}$ ways to bind variables to R objects:
* A simple equality sign ($\textcolor{blue}{\textbf{=}}$). This approach is mainly used to assign default function arguments.
* Using the $\textcolor{blue}{\textbf{assign()}}$ function.
### Examples
* $\textbf{preferred}$ way to assign variables
```{R, echo=TRUE, comment=''}
x <- 5.0
x
cat(sprintf("typeof(x):%s\n", typeof(x)))
```
* $\textbf{less common}$ way to assign variables
- Alternative $1$:
```{R, echo=TRUE, comment=''}
y = 5.0
y
cat(sprintf("typeof(y):%s\n", typeof(y)))
```
$\newline$
- Alternative $2$:
```{R, echo=TRUE, comment=''}
assign("z",5.0)
z
cat(sprintf("typeof(y):%s\n", typeof(y)))
```
* functions are objects (as stated previously)
```{R, echo=TRUE, comment=''}
myvar <- function(x, av=0){
n <- length(x)
if(n>1){
return(1.0/(n-1)*sum((x-av)^2))
}else{
stop("ERROR:: Dividing by zero (n==1) || (n==0) ")
}
}
cat(sprintf("typeof(myvar):%s\n", typeof(myvar)))
```
$\newline$
```{R, echo=TRUE, comment=''}
x <- rnorm(100)
myvar(x)
myvar(x,mean(x))
var(x)
```
## The deletion of R objects
You can remove objects from (the current environment) by invoking the $\textcolor{blue}{\textbf{rm()}}$ function.
The removal process consists of $2$ steps i.e.:
* the binding between the variable name and the R object is severed.
* the R object is automatically removed from memory by R's internal garbage collector ($\textcolor{blue}{\textbf{gc()}}$).
### Examples
* Remove the variable $x$ from the current environment
```{R, echo=TRUE, comment=''}
ls()
rm(x)
ls()
```
* Remove **all** variables from the current environment
```{R, echo=TRUE, comment=''}
ls()
rm(list=ls())
ls()
```
\newpage
$\texttt{"Nothing exists except atoms and empty space; everything else is opinion". (Democritos)}$
# Atomic Data Types
## The core/atomic data types
* R has the following 6 $\textcolor{blue}{\textbf{atomic}}$ data types:
* logical (i.e. boolean)
* integer
* double
* character (i.e. string)
* complex
* raw (i.e. byte)
The latter 2 types (i.e. complex and especially raw) are less common.
The $\textcolor{blue}{\textbf{typeof()}}$
function determines the **INTERNAL** storage/type of an R object.
```{r, echo=FALSE, results='hide'}
# Clean up env.
rm(list=ls())
ls()
```
### Examples
* boolean/logical values: either $\textcolor{blue}{\textbf{TRUE}}$ or $\textcolor{blue}{\textbf{FALSE}}$
```{R, echo=TRUE, comment=''}
x1 <- TRUE
x1
typeof(x1)
```
$\newline$
* integer values ($\in \, \mathbb{Z}$):
```{R, echo=TRUE, comment=''}
x2 <- 3L
x2
typeof(x2)
```
$\newline$
* double (precision) values:
```{R, echo=TRUE, comment=""}
x3 <- 3.14
x3
typeof(x3)
```
$\newline$
* character values/strings
```{R, echo=TRUE, comment=''}
x4 <- "Hello world"
x4
typeof(x4)
```
$\newline$
* complex values ($\in \, \mathbb{C}$):
```{R, echo=TRUE, comment=''}
x5 <- 2.0 + 3i
x5
typeof(x5)
```
\newpage
## Operations on atomic data types
* $\textbf{logical}$ operators: $\textcolor{blue}{\textbf{==}}$, $\textcolor{blue}{\textbf{!=}}$, $\textcolor{blue}{\textbf{\&\&}}$, $\textcolor{blue}{\textbf{||}}$, $\textcolor{blue}{\textbf{!}}$
* $\textbf{numerical}$ operators: $\textcolor{blue}{\textbf{+}}$, $\textcolor{blue}{\textbf{-}}$, $\textcolor{blue}{\textbf{*}}$, $\textcolor{blue}{\textbf{/}}$, $\textcolor{blue}{\textbf{\textasciicircum}}$, $\textcolor{blue}{\textbf{**}}$ (same as the caret), but also:
* integer division: $\textcolor{blue}{\textbf{\%/\%}}$
* modulo operation: $\textcolor{blue}{\textbf{\%\%}}$
* $\textcolor{orange}{\textbf{Note}}$: matrix multiplication will be performed using $\textcolor{blue}{\textbf{\%*\%}}$
* $\textbf{character/string}$ manipulation:
* $\textcolor{blue}{\textbf{nchar()}}$:
* $\textcolor{blue}{\textbf{paste()}}$:
* $\textcolor{blue}{\textbf{cat()}}$:
* $\textcolor{blue}{\textbf{sprintf()}}$:
* $\textcolor{blue}{\textbf{substr()}}$:
* $\textcolor{blue}{\textbf{strsplit()}}$:
* $\textcolor{orange}{\textbf{Note}}$: Specialized R libraries were developed to manipulate strings e.g. $\href{https://cran.r-project.org/web/packages/stringr/index.html}{stringr}$
* explicit $\textbf{cast}$/conversion: https://data-flair.training/blogs/r-string-manipulation/
* $\textcolor{blue}{\textbf{as.\{logical, integer, double, complex, character\}()}}$
* explicit $\textbf{test}$ of the type of a variable:
* $\textcolor{blue}{\textbf{is.\{logical, integer, double, complex, character\}()}}$
### Examples
* Logical operators:
```{R, echo=TRUE, comment=''}
x <-3
y <-7
(x<=3) &&(y==7)
!(y<7)
```
$\newline$
* Mathematical operations
```{R, echo=TRUE, comment=''}
2**4
7%%4
7/4
7%/%4
```
$\newline$
* String operations
```{R, echo=TRUE, comment=''}
s <- "Hello"
nchar(s)
news <- paste(s,"World")
news
sprintf("My new string:%20s\n", news)
city <- "Witwatersrand"
substr(city,4,8)
```
$\newline$
* Conversion and testing of types
```{R, echo=TRUE, comment=''}
s <- "Hello World"
is.character(s)
```
$\newline$
```{R, echo=TRUE, comment=''}
s1 <- "-500"
is.character(s1)
```
$\newline$
```{R, echo=TRUE, comment=''}
s2 <- as.double(s1)
is.character(s2)
is.double(s2)
```
$\newline$
```{R, echo=TRUE, comment=''}
s3 <- as.complex(s2)
s3
sqrt(s3)
```
## Exercises
* - Calculate $\log_2(10)$ using $\texttt{R}$'s $\textcolor{blue}{\textbf{log()}}$ function.\newline
The **default** version of $\textcolor{blue}{\textbf{log()}}$ is $\log_e():=\ln()$.\newline
Use $\textcolor{blue}{\textbf{help(log)}}$ to find the appropriate arguments for the $\textcolor{blue}{\textbf{log()}}$ function.
- Perform the inverse operation (i.e. to obtain $10$).
* - $\texttt{R}$'s $\textcolor{blue}{\textbf{round()}}$ function rounds (by default) a real number to $0$ decimal places. Round the number $\pi$ to 4 decimal places.
- $\textcolor{orange}{\textbf{Note:}}$
+ You can generate the value for $\pi$ as: $4.0*\textcolor{blue}{\textbf{atan}(1.0)}$.
+ Use $\textcolor{blue}{\textbf{help(round)}}$ to find the appropriate arguments for the $\textcolor{blue}{\textbf{round()}}$ function.
* Let $z\:= 3\,+\,4i$
- Use $\texttt{R}$'s $\textcolor{blue}{\textbf{Re()}}$, $\textcolor{blue}{\textbf{Im()}}$ functions to extract the real and imaginary parts of z.
- Calculate the modulus of $z$ using $\texttt{R}$'s $\textcolor{blue}{\textbf{Mod()}}$ function and check$\newline$ whether you the same answer
using $\sqrt{ \Re(z)^2 + \Im(z)^2 }$.
- Calculate the argument of $z$ using $\texttt{R}$'s $\textcolor{blue}{\textbf{Arg()}}$ function and check$\newline$
whether you have the same answer using $\arctan{\Big(\frac{\Im(z)}{\Re(z)}\Big)}$.
\newpage
# Atomic vectors
* An $\textbf{atomic}$ vector is a data structure containing
elements of $\textbf{only one atomic}$ data type.\newline
Therefore, an atomic vector is $\textbf{homogeneous}$.
* Atomic vectors are stored in a $\textbf{linear}$ fashion.
* R does $\textbf{NOT}$ have scalars:
* An atomic vector of $\textbf{length 1}$ plays the role of a scalar.
* Vectors of $\textbf{length 0}$ also exist (and they have some use!).
* A $\textbf{list}$ is a vector not necessarily of the atomic type.\newline A list is also
known as a $\textbf{recursive/generic}$ vector ($\textit{vide infra}$).
## Creation of atomic vectors
Atomic vectors can be created in a multiple ways:
* Use of the $\textcolor{blue}{\textbf{vector()}}$ function.
* Use of the $\textcolor{blue}{\textbf{c()}}$ function (**c** stands for *combine*).
* Use of the column operator $\textcolor{blue}{\textbf{:}}$
* Use of the $\textcolor{blue}{\textbf{seq()}}$ and $\textcolor{blue}{\textbf{rep()}}$ functions.
The length of a vector can be retrieved using the $\textcolor{blue}{\textbf{length()}}$ function.
```{R, echo=FALSE, results='hide'}
rm(list=ls())
ls()
```
### Examples
* use of the $\textcolor{blue}{\textbf{vector()}}$ function:
```{R, echo=TRUE, comment=''}
x <- vector() # Empty vector (Default:'logical')
x
length(x)
typeof(x)
```
$\newline$
```{R, echo=TRUE, comment=''}
x <- vector(mode="complex", length=4)
x
length(x)
x
x[1] <- 4
x
```
* use of the $\textcolor{blue}{\textbf{c()}}$ function:
```{R, echo=TRUE, comment=''}
x1 <- c(3, 2, 5.2, 7)
x1
x2 <- c(8, 12, 13)
x2
x3 <- c(x2, x1)
x3
x4 <- c(FALSE,TRUE,FALSE)
x4
x5 <- c("Hello", "Salt", "Lake", "City")
x5
```
$\newline$
* use of the column operator:
```{R, echo=TRUE, comment=''}
y1 <- 1:10
y1
y2 <- 5:-5
y2
y3 <- 2.3:10
y3
y4 <- 2.0*(7:1)
y4
y5 <- (1:7) - 1
y5
```
* $\textcolor{blue}{\textbf{seq()}}$ and $\textcolor{blue}{\textbf{rep()}}$ functions
```{R, echo=TRUE, comment=''}
z1 <- seq(from=1, to=15, by=3)
z1
z2 <- seq(from=-2,to=5,length=4)
z2
```
$\newline$
```{R, echo=TRUE, comment=''}
z3 <- rep(c(3,2,4), time=2)
z3
z4 <- rep(c(3,2,4), each=3)
z4
z5 <- rep(c(1,7), each=2, time=3)
z5
length(z5)
```
### Exercises
* Use the $\textcolor{blue}{\textbf{seq()}}$ function to generate the following sequence: $\newline$
6 13 20 27 34 41 48
* Create the following R vector using $\textbf{only}$ the $\textcolor{blue}{\textbf{seq()}}$ and $\textcolor{blue}{\textbf{rep()}}$ functions:$\newline$
-8 -8 -8 -8 0 8 8 8 16 16 16 16 16
## Operations on vectors: element-wise
* All operations on vectors in R happen $\textbf{element by element}$ (cfr. $\textit{NumPy}$).
* $\textcolor{blue}{\textbf{Vector Recycling}}$:
If 2 vectors of \textbf{different} lengths are involved in an operation, the \textbf{shortest vector}
will be repeated until all elements of the longest vector are matched. \newline
A \textit{warning} message will be sent to the stdout.
### Examples
```{R, echo=TRUE, comment=''}
x <- -3:3
x
y <- 1:7
y
xy <- x*y
xy
xpy <- x^y
xpy
```
$\newline$
```{R, echo=TRUE, comment=''}
x <- 0:10
y <- 1:2
length(x)
length(y)
x
y
x+y
```
### Exercises
* Create the following vector (do $\textbf{not}$ use $\textcolor{blue}{\textbf{c()}}$!): $\newline$
-512 -216 -64 -8 0 8 64 216 512 1000
## Retrieving elements of vectors
* Indexing: starts at $\textbf{1}$ ($\textbf{not 0}$ like C/C++, Python, Java, ....)
see also: $\newline$
\href{https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF}{Edsger Dijkstra: Why numbering should start at zero}
* Use of vector with indices to extract values.
* Advanced features:
* use of boolean values to extract values.
* the membership operator: $\textcolor{blue}{\textbf{\%in\%}}$.
* the deselect/omit operator: $\textcolor{blue}{\textbf{-}}$
* $\textcolor{blue}{\textbf{which()}}$: returns the indices for which the condition is true.
* $\textcolor{blue}{\textbf{any()}}$/$\textcolor{blue}{\textbf{all()}}$ functions.
* $\textcolor{blue}{\textbf{any()}}$ : $\textcolor{blue}{\textbf{TRUE}}$ if at least $1$ value is true
* $\textcolor{blue}{\textbf{all()}}$ : $\textcolor{blue}{\textbf{TRUE}}$ if all values are true
### Examples
* Use of a simple index:
```{R, echo=TRUE, comment=''}
x <- seq(2,100,by=15)
x
x[4]
x[1]
```
$\newline$
* Select several indices at once using vectors:
```{R, echo=TRUE, comment=''}
x
x[3:5]
x[c(1,3,5,7)]
x[seq(1,7,by=2)]
```
$\newline$
* Extraction via booleans (i.e. retain only those values that are equal to $\textcolor{blue}{\textbf{TRUE}}$):
```{R, echo=TRUE, comment=''}
x
x>45
x[x>45]
```
$\newline$
* Use of the $\textcolor{blue}{\textbf{\%in\%}}$ operator:
```{R, echo=TRUE, comment=''}
x
10 %in% x
62 %in% x
c(32,33,43) %in% x
!(c(32,33,43) %in% x)
```
$\newline$
* Negate/filter out the elements with $\textbf{negative}$ indices:
```{R, echo=TRUE, comment=''}
x <- c(1,13,17,27,49,91)
x
x[-c(2,4,6)]
z <- x[-1] - x[-length(x)]
z
```
$\newline$
* The $\textcolor{blue}{\textbf{which()}}$ function returns
$\textbf{only those indices}$ of which the condition/expression is \textbf{true}.
```{R, echo=TRUE, comment=''}
# Sample 10 numbers from N(0,1)
vecnum <- rnorm(n=10)
vecnum
which(vecnum>1.0)
```
$\newline$
* Use of the $\textcolor{blue}{\textbf{any()}}$/$\textcolor{blue}{\textbf{all()}}$ functions.
```{R, echo=TRUE, comment=''}
y <- seq(0,100,by=10)
x
y
any(x<y)
all(x[6:7]>y[2:3])
```
### Exercises
* R has the its own inversion function, $\textcolor{blue}{\textbf{rev()}}$, e.g.:$\newline$
```{R, echo=TRUE,comment=''}
x <- seq(from=2,to=33,by=3)
x
y <- rev(x)
y
```
Invert the vector x without invoking the $\textcolor{blue}{\textbf{rev()}}$ function.
* The $\texttt{Taylor series}$ for $\ln(1+x)$ is converging when $|x| <1$ and is given by:
\begin{equation}
\ln(1+x) = x-\frac{x^2}{2}+\frac{x^3}{3}-\frac{x^4}{4}+\frac{x^5}{5}-\frac{x^6}{6} +\ldots \nonumber
\end{equation}
Calculate the sum of the first 5, 10, 15 terms in the above expression to
approximate $\ln(1.2)$.
Compare with $\texttt{R}$'s value i.e.: $\textcolor{blue}{\textbf{log(1.2)}}$.
* The $\texttt{logarithmic}$ return in finance is defined as:
\begin{equation}
R_t = \ln \Big ( \frac{P_t}{P_{t-1}} \Big ) \nonumber
\end{equation}
- Generate a financial time series using the following $\texttt{R}$ code:
```{R, echo=TRUE,comment='', results=FALSE }
price <- abs(rcauchy(1000))+1.E-6
```
- Calculate the $\texttt{logarithmic return}$ for the financial time series $\texttt{price}$.$\newline$
The newly created time series will be $1$ element shorter in length than the original one.$\newline$
Compare your result with $\textcolor{blue}{\textbf{diff(log(price))}}$.
* $\texttt{Monte-Carlo}$ approximation of $\pi$
Let $\texttt{S1}$ be the square spanned by the following $4$ vertices: $\{(0,0),(0,1),(1,0),(1,1)\}$.$\newline$
Let $\texttt{S2}$ be the first quadrant of the unit-circle $\mathcal{C}:\,x^2+y^2 =1$.$\newline$
The ratio $\rho$ defined as:
\begin{equation}
\rho:=\frac{\mathrm{Area\;S2}}{\mathrm{Area\;S1}}= \frac{\mathrm{\#Points\;in\;S2}}{\mathrm{\#Points\;in\;S1}} \nonumber
\end{equation}
allows us to estimate $\frac{\pi}{4}$ numerically.
Therefore:
- Sample $100000$ independent $x$-coordinates from $\texttt{Unif}$.
- Sample $100000$ independent $y$-coordinates from $\texttt{Unif}$.
- Calculate an approximate value for $\pi$ using the Monte-Carlo approach.
Note: The uniform distribution $[0,1)$ ($\texttt{Unif}$) can be sampled using $\textcolor{blue}{\textbf{runif()}}$.
## Hash tables
A $\textbf{hash table}$ is a data structure which implements an associative array or dictionary.
It is an abstract data which maps data to keys.
* There are several ways to create one:
* Map names to an existing vector
* Add names when creating the vector
* To remove the map, map the names to NULL
### Examples
* Creation of 2 independent vectors
```{R, echo=TRUE, comment=''}
capitals <- c("Albany", "Providence", "Hartford", "Boston", "Montpelier",
"Concord", "Augusta")
states <- c("NY", "RI", "CT", "MA", "VT", "NH", "ME")
capitals
states
capitals[3]
```
$\newline$
* Create the hashtable/dictionary
```{R, echo=TRUE, comment=''}
# Method 1
names(capitals) <- states
capitals
capitals["MA"]
names(capitals)
```
$\newline$
```{R, echo=TRUE, comment=''}
# Method 2
phonecode <- c("801"="SLC", "206"="Seattle", "307"="Wyoming")
phonecode
phonecode["801"]
```
$\newline$
* Dissociate the 2 vectors
```{R, echo=TRUE, comment=''}
names(capitals) <- NULL
capitals
```
## NA (Not Available values)
* $\textcolor{blue}{\textbf{NA}}$: stands for 'Not Available'/Missing values and has a length of $1$.$\newline$
There are in essence $4$ versions depending on the type:
- $\textcolor{blue}{\textbf{NA}}$ (logical - $\textbf{default}$)
- $\textcolor{blue}{\textbf{NA\_integer}}$ (integer)
- $\textcolor{blue}{\textbf{NA\_real}}$ (double precision)
- $\textcolor{blue}{\textbf{NA\_character}}$ (string)
Under the hood, the version of NA is subjectd to $\textbf{coercion}$:$\newline$
$\textit{logical}$ $\rightarrow$ $\textit{integer}$ $\rightarrow$ $\textit{double}$ $\rightarrow$ $\textit{character}$
* some functions e.g. $\textcolor{blue}{\textbf{mean()}}$ return (by default) NA if$\newline$
1 or more instances NA are present in a vector.
* $\textcolor{blue}{\textbf{is.na()}}$: test a vector (element-wise) for NA values.$\newline$
$\textcolor{red}{\textbf{Do NOT use:}}$
```{R, echo=TRUE, results='hide', comment=''}
x == NA
```
$\textcolor{orange}{\textbf{but use INSTEAD:}}$
```{R, echo=TRUE, results='hide', comment=''}
is.na(x)
```
### Examples
* Types of NA
```{R, echo=TRUE, comment=''}
x <- NA
typeof(x)
```
$\newline$
```{R, echo=TRUE, comment=''}
# logical NA coerced to double precision NA
x <- c(3.0, 5.0, NA)
typeof(x[3])
```
$\newline$
* Functions on a vector containing NA
```{R, echo=TRUE, comment=''}
mean(x)
mean(x, na.rm=TRUE)
```
$\newline$
* Check of the NA availability
```{R, echo=TRUE, comment=''}
x <- c(NA, 1, 2, NA)
is.na(x)
```
$\newline$
* Functions on a vector containing NA
```{R, echo=TRUE, comment=''}
mean(x)
mean(x, na.rm=TRUE)
```
### Exercises
* A family has installed a device to monitor their daily energy consumption (in kWh).$\newline$
When a measurement fails or is unavailable NA is recorded.
You can invoke the following code to generate the measurements generated by the device.
```{R, echo=TRUE, comment=' ', results=FALSE}
dailyusage <- 30.0 + runif(365, min=0, max=5.0)
dailyusage[sample(1:365, sample(1:50,1), replace=FALSE)] <- NA
```
- How many measurements failed?
- What is the average daily energy consumption (based on the non-failed) measurements?
## NaN and infinities
* $\textcolor{blue}{\textbf{NaN}}$ (only for numeric types!), and the infinties $\textcolor{blue}{\textbf{Inf}}$ and $\textcolor{blue}{\textbf{-Inf}}$ $\newline$
are part of the [IEEE 754 floating-point standard](https://ieeexplore.ieee.org/document/8766229).
* To test whether you have:
- finite numbers: use $\textcolor{blue}{\textbf{is.finite()}}$
- infinite numbers: use $\textcolor{blue}{\textbf{is.infinite()}}$
- NaNs: use $\textcolor{blue}{\textbf{is.nan()}}$
* Further:
- a $\textcolor{blue}{\textbf{NaN}}$ will return $\textcolor{blue}{\textbf{TRUE}}$ only when tested by $\textcolor{blue}{\textbf{is.nan()}}$
- a $\textcolor{blue}{\textbf{NA}}$ will return $\textcolor{blue}{\textbf{TRUE}}$ when tested by either $\textcolor{blue}{\textbf{is.nan()}}$ or $\textcolor{blue}{\textbf{is.na()}}$
### Examples
* Infinities:
```{R, echo=TRUE, comment=''}
x <- 5.0/0.0
x
is.finite(x)
is.infinite(x)
is.nan(x)
```
$\newline$
```{R, echo=TRUE, comment=''}
y <- -5.0/0.0
y
is.finite(y)
is.infinite(y)
is.nan(y)
```
$\newline$
```{R, echo=TRUE, comment=''}
z <- x + y
z
typeof(z)
is.finite(z)
is.infinite(z)
is.nan(z)
```
$\newline$
* $\textcolor{blue}{\textbf{is.na()}}$ vs. $\textcolor{blue}{\textbf{is.nan()}}$:
```{R, echo=TRUE, comment=''}
# is.nan
v <- c(NA, z, 5.0, log(-1.0))
is.nan(v)
```
$\newline$
```{R, echo=TRUE, comment=''}
# is.na(): also includes NaN!
v <- c(NA, z, 5.0, log(-1.0))
is.na(v)
```
## Note on logical operators
* $\textcolor{blue}{\textbf{\&}}$, $\textcolor{blue}{\textbf{|}}$, $\textcolor{blue}{\textbf{!}}$, $\textcolor{blue}{\textbf{xor()}}$: $\textbf{element-wise}$ operators on vectors (cfr. arithmetic operators)
* $\textcolor{blue}{\textbf{\&\&}}$, $\textcolor{blue}{\textbf{||}}$: evaluated from $\textbf{left}$ to $\textbf{right}$ until result is determined.
### Examples
* Vector operators ($\textcolor{blue}{\textbf{\&}}$, $\textcolor{blue}{\textbf{|}}$, $\textcolor{blue}{\textbf{!}}$ and $\textcolor{blue}{\textbf{xor()}}$)
```{R, echo=TRUE, comment=''}
x <- sample(x=1:10, size=10, replace=TRUE)
x
y <- sample(x=1:10, size=10, replace=TRUE)
y
```
$\newline$
```{R, echo=TRUE, comment=''}
v1 <- (x<=3)
v1
```
$\newline$
```{R, echo=TRUE, comment=''}
v2 <- (y>=7)
v2
```
$\newline$
```{R, echo=TRUE, comment=''}
v1 & v2
```
$\newline$
```{R, echo=TRUE, comment=''}
v1 | v2
```
$\newline$
```{R, echo=TRUE, comment=''}
xor(v1, v2)
```
$\newline$
```{R, echo=TRUE, comment=''}
!v1
```
$\newline$
### Exercises
* Generate a random vector of integers using the following code:
```{R, echo=TRUE,comment='', results=FALSE }
x <- sample(x=0:1000,size=100, replace=TRUE)
```
- Invoke the above code to generate the vector $\texttt{x}$
- Find if there are any integers in the vector $\texttt{x}$ which can be divided by 4 and 6
- Find those numbers and their corresponding indices in the vector $\texttt{x}$.