% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
\PassOptionsToPackage{dvipsnames,svgnames,x11names}{xcolor}
%
\documentclass[
letterpaper,
DIV=11,
numbers=noendperiod]{scrreprt}
\usepackage{amsmath,amssymb}
\usepackage{iftex}
\ifPDFTeX
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math}
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
\usepackage{lmodern}
\ifPDFTeX\else
% xetex/luatex font selection
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\ifLuaTeX
\usepackage{luacolor}
\usepackage[soul]{lua-ul}
\else
\usepackage{soul}
\fi
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{5}
% Make \paragraph and \subparagraph free-standing
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{241,243,245}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.40,0.45,0.13}{#1}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\BuiltInTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{\textit{#1}}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{\textit{#1}}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\ExtensionTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.28,0.35,0.67}{#1}}
\newcommand{\ImportTok}[1]{\textcolor[rgb]{0.00,0.46,0.62}{#1}}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\NormalTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\RegionMarkerTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.07,0.07,0.07}{#1}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{\textit{#1}}}
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\KOMAoption{captions}{tableheading}
\makeatletter
\@ifpackageloaded{tcolorbox}{}{\usepackage[skins,breakable]{tcolorbox}}
\@ifpackageloaded{fontawesome5}{}{\usepackage{fontawesome5}}
\definecolor{quarto-callout-color}{HTML}{909090}
\definecolor{quarto-callout-note-color}{HTML}{0758E5}
\definecolor{quarto-callout-important-color}{HTML}{CC1914}
\definecolor{quarto-callout-warning-color}{HTML}{EB9113}
\definecolor{quarto-callout-tip-color}{HTML}{00A047}
\definecolor{quarto-callout-caution-color}{HTML}{FC5300}
\definecolor{quarto-callout-color-frame}{HTML}{acacac}
\definecolor{quarto-callout-note-color-frame}{HTML}{4582ec}
\definecolor{quarto-callout-important-color-frame}{HTML}{d9534f}
\definecolor{quarto-callout-warning-color-frame}{HTML}{f0ad4e}
\definecolor{quarto-callout-tip-color-frame}{HTML}{02b875}
\definecolor{quarto-callout-caution-color-frame}{HTML}{fd7e14}
\makeatother
\makeatletter
\@ifpackageloaded{bookmark}{}{\usepackage{bookmark}}
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\AtBeginDocument{%
\ifdefined\contentsname
\renewcommand*\contentsname{Table of contents}
\else
\newcommand\contentsname{Table of contents}
\fi
\ifdefined\listfigurename
\renewcommand*\listfigurename{List of Figures}
\else
\newcommand\listfigurename{List of Figures}
\fi
\ifdefined\listtablename
\renewcommand*\listtablename{List of Tables}
\else
\newcommand\listtablename{List of Tables}
\fi
\ifdefined\figurename
\renewcommand*\figurename{Figure}
\else
\newcommand\figurename{Figure}
\fi
\ifdefined\tablename
\renewcommand*\tablename{Table}
\else
\newcommand\tablename{Table}
\fi
}
\@ifpackageloaded{float}{}{\usepackage{float}}
\floatstyle{ruled}
\@ifundefined{c@chapter}{\newfloat{codelisting}{h}{lop}}{\newfloat{codelisting}{h}{lop}[chapter]}
\floatname{codelisting}{Listing}
\newcommand*\listoflistings{\listof{codelisting}{List of Listings}}
\makeatother
\makeatletter
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\@ifpackageloaded{subcaption}{}{\usepackage{subcaption}}
\makeatother
\newcounter{quartocallouttipno}
\newcommand{\quartocallouttip}[1]{\refstepcounter{quartocallouttipno}\label{#1}}
\ifLuaTeX
\usepackage{selnolig} % disable illegal ligatures
\fi
\usepackage{bookmark}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\urlstyle{same} % disable monospaced font for URLs
\hypersetup{
pdftitle={How To Learn to Code for Data Science in R},
pdfauthor={Austin Daigle; John Patrick Flores; Madeline Gillman; Brian Gural; Lorrie He; Justin Landis; Teresa McGee; Sarah Mae Parker; Matthew Sutcliffe},
colorlinks=true,
linkcolor={blue},
filecolor={Maroon},
citecolor={Blue},
urlcolor={Blue},
pdfcreator={LaTeX via pandoc}}
\title{How To Learn to Code for Data Science in R}
\author{Austin Daigle \and John Patrick Flores \and Madeline
Gillman \and Brian Gural \and Lorrie He \and Justin Landis \and Teresa
McGee \and Sarah Mae Parker \and Matthew Sutcliffe}
\date{2024-05-22}
\begin{document}
\maketitle
\renewcommand*\contentsname{Table of contents}
{
\hypersetup{linkcolor=}
\setcounter{tocdepth}{2}
\tableofcontents
}
\bookmarksetup{startatroot}
\chapter*{Preface}\label{preface}
\addcontentsline{toc}{chapter}{Preface}
\markboth{Preface}{Preface}
Welcome to How to Learn to Code!!
We are an organization that hopes to make learning to program
approachable, accessible, and effective. We want to improve rigor and
reproducibility in science by providing programming resources and
experiences to scientists and professionals in all levels of their
careers. Our classes are small-group based courses with a
teacher:student ratio that allows the students to learn dynamically and
independently. During classes, students are able to follow along with
the teacher leading the instruction, or work with one of our floating
teachers to troubleshoot or to better understand their own code.
This is our curriculum for learning R programming in the context of data
analysis. Our curriculum development team has worked tirelessly to
develop this new curriculum for the Summer of 2024. We are constantly
improving and updating our curricula, so if you're interested in
contributing or have suggestions, please visit
\url{https://howtolearntocode.web.unc.edu/} for our most up-to-date
contact information. If you have reached Class 7, which covers GitHub,
or are already proficient in GitHub, feel free to submit an issue or pull
request at
\url{https://github.com/How-to-Learn-to-Code/Rclass-DataScience}.
\begin{longtable}[]{@{}
>{\centering\arraybackslash}p{(\columnwidth - 4\tabcolsep) * \real{0.3333}}
>{\centering\arraybackslash}p{(\columnwidth - 4\tabcolsep) * \real{0.3333}}
>{\centering\arraybackslash}p{(\columnwidth - 4\tabcolsep) * \real{0.3333}}@{}}
\caption{Table of Contents}\tabularnewline
\toprule\noalign{}
\begin{minipage}[b]{\linewidth}\centering
Class Day
\end{minipage} & \begin{minipage}[b]{\linewidth}\centering
Topic
\end{minipage} & \begin{minipage}[b]{\linewidth}\centering
Link
\end{minipage} \\
\midrule\noalign{}
\endfirsthead
\toprule\noalign{}
\begin{minipage}[b]{\linewidth}\centering
Class Day
\end{minipage} & \begin{minipage}[b]{\linewidth}\centering
Topic
\end{minipage} & \begin{minipage}[b]{\linewidth}\centering
Link
\end{minipage} \\
\midrule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
0 & Welcome to How to Learn to Code! &
\href{scripts/00_intro/class0.qmd}{Introduction} \\
1 & R Coding Basics & \href{scripts/01_codingBasics/class1.qmd}{Coding
Basics 1} \\
2 & Applying Coding Basics &
\href{scripts/01_codingBasics/class2.qmd}{Coding Basics 2} \\
3 & Let's Get Plotting! & \href{scripts/02_dataViz/class3.qmd}{Data
Visualization 1} \\
4 & Applying Visualization Methods &
\href{scripts/02_dataViz/class4.qmd}{Data Visualization 2} \\
5 & Data Wrangling Basics &
\href{scripts/03_dataWrangling/class5.qmd}{Data Wrangling 1} \\
6 & Data Wrangling with Real Experimental Data &
\href{scripts/03_dataWrangling/class6.qmd}{Data Wrangling 2} \\
7 & Running a Reproducible Analysis &
\href{scripts/04_projects/class7.qmd}{Project 1} \\
8 & Practicing on Real World Data &
\href{scripts/04_projects/class8.qmd}{Project 2} \\
\end{longtable}
\bookmarksetup{startatroot}
\chapter{Welcome to How to Learn to
Code!}\label{welcome-to-how-to-learn-to-code}
\section{Introduction}\label{introduction}
This page will walk you through setting up access to UNC's computing
cluster and introduce you a bit to R and R Studio so we can hit the
ground running in the first class. To ensure you have access to the UNC
cluster (and are thus able to participate in class), \textbf{please review
this document in full at least 24 hours in advance of the first
class}--Research IT will need time to approve your account request.
\section{Class 0 Objectives}\label{class-0-objectives}
\begin{itemize}
\item
Request a Longleaf account
\item
Launch an R Studio session on OnDemand
\item
Know what each of the four panes in R Studio shows
\end{itemize}
\section{R vs.~R Studio}\label{r-vs.-r-studio}
In this class, you'll hear these two terms a lot. They sound similar,
but they are actually very different! \textbf{R} is the programming
language we will be learning in this class. \textbf{R Studio} is a
user-friendly interface (or \textbf{IDE,} integrated development
environment) we will be using to write scripts in R and interact with R
software.
\section{Longleaf}\label{longleaf}
``Longleaf'' is the name for UNC's computing system. Researchers in all
departments across UNC use it to run analyses, store data, and use
programs that require GPUs. Whenever someone says they are ``on
Longleaf'' or ``running code on Longleaf'' it means their personal
computer is connected to the cluster and they are either actively
interacting with a program running on the cluster (we will be doing this
with R Studio!) or writing code that tells the cluster to perform
certain tasks whenever it has the memory availability.
\textbf{Before the first class, you will need to request access to
Longleaf.} Follow the instructions on the
\href{https://help.rc.unc.edu/request-a-cluster-account}{Research IT
website}. In addition to your onyen and email address, you'll need the
following information:
\begin{itemize}
\item
Preferred shell: bash
\item
Faculty sponsor name and onyen: You can put your PI here, or if you do
not have a PI, leave blank.
\item
Type of subscription: Longleaf
\item
Description of work you will do on the cluster: How to Learn to Code R
class
\end{itemize}
It may take \textasciitilde24 hours before your account is approved.
\section{OnDemand}\label{ondemand}
OnDemand is a web portal that allows you to access \textbf{Longleaf}. We
will be using \textbf{OnDemand} to launch \textbf{R Studio} and run
\textbf{R} code. You will need to have your Longleaf account approved
before accessing OnDemand.
To launch OnDemand, navigate to this site in a browser of your choice:
\href{https://ondemand.rc.unc.edu/}{https://ondemand.rc.unc.edu} (you
may want to bookmark this site, since you'll be accessing it for each class).
Once you've logged in, you'll see a page like this. Click on the
\textbf{RStudio Server} tile.
\begin{center}
\includegraphics{scripts/00_intro/class0_images/Picture1.png}
\end{center}
This will take you to a page where you can fill out some parameters for
your R Studio Server session. The only one you'll need to adjust is
``Number of hours'' where you should put ``2''.
\begin{center}
\includegraphics{scripts/00_intro/class0_images/Picture2.png}
\end{center}
\begin{tcolorbox}[enhanced jigsaw, bottomtitle=1mm, bottomrule=.15mm, toprule=.15mm, opacityback=0, leftrule=.75mm, breakable, colback=white, toptitle=1mm, left=2mm, coltitle=black, titlerule=0mm, opacitybacktitle=0.6, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, rightrule=.15mm, arc=.35mm, colframe=quarto-callout-note-color-frame, colbacktitle=quarto-callout-note-color!10!white]
You can request up to 10 hours, but it's good practice to only request
the amount of time you'll need (Longleaf is a shared resource!). Since
each class is 90 minutes, you'll likely only need to request 2 hours for
each class. Under ``Additional job submission arguments'' you can adjust the
amount of memory requested. This won't be needed for How to Learn to
Code classes, but may be needed when you are running your own analyses
on large datasets in the future.
\end{tcolorbox}
After you've filled out the appropriate information, click Launch. This
will take you to the ``My Interactive Sessions'' page. Your session
request may be queued for a minute while space on the cluster is being
allocated for your session. Once it's ready, click ``Connect to R Studio
Server''. This will launch R Studio in a new tab.
\begin{center}
\includegraphics{scripts/00_intro/class0_images/Picture3.png}
\end{center}
\section{Navigating R Studio}\label{navigating-r-studio}
Your R Studio window is divided into four panes. You can adjust the
sizes of each pane (horizontally and vertically) by dragging the outer
edges.
\begin{tcolorbox}[enhanced jigsaw, bottomtitle=1mm, bottomrule=.15mm, toprule=.15mm, opacityback=0, leftrule=.75mm, breakable, colback=white, toptitle=1mm, left=2mm, coltitle=black, titlerule=0mm, opacitybacktitle=0.6, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, rightrule=.15mm, arc=.35mm, colframe=quarto-callout-note-color-frame, colbacktitle=quarto-callout-note-color!10!white]
You may only see three panes when you first launch R Studio. If that's
the case, go to File \textgreater{} New File \textgreater{} R Script.
\end{tcolorbox}
\begin{figure}[H]
{\centering \includegraphics{index_files/mediabag/rstudio-panes-labele.jpg}
}
\caption{Image source:
\url{https://docs.posit.co/ide/user/ide/guide/ui/ui-panes.html}}
\end{figure}%
The top left pane is called the \textbf{Source} and this is where you
will be writing and editing code. Writing code here does not
automatically \textbf{execute} or \textbf{run} it. To do that, you will
need to use the \textbf{Console} pane in the bottom left. There are a
few ways to get code written in the Source pane to the Console pane, in
order from least efficient to most efficient:
\begin{itemize}
\item
Copying the line of code you want to run and pasting it into the
console and then hitting the ``return'' or ``enter'' key.
\item
Putting your cursor anywhere in the line of code you want to run and
clicking ``Run'' in the upper right section of the Source pane
\item
Highlighting the line of code (or section of code) you want to run and
clicking ``Run'' in the upper right section of the Source pane
\item
Putting your cursor anywhere in the line of code you want to run,
highlighting the line of code, or highlighting the section of code you
want to run and pressing Alt + Enter (for PC) or cmd + return (for
Mac)
\end{itemize}
Any code written in the console is \textbf{not} saved anywhere.
Generally, people write their code in the Source pane, and then run it
as needed in the Console. This is important to remember when writing
reproducible code--all code needed to run your analyses, generate plots,
etc. should be written in the source (which is then saved as an R
script). Throughout this course, you will likely want to ensure that the
code you write during each class is saved in a separate R script.
The \textbf{Environment} pane, in the top right, shows currently saved \textbf{objects}, but
also has tabs to show history (all commands executed in your current
session) and connections (if you connect to any local or remote
databases). You will almost exclusively be using the Environment tab.
The \textbf{Output} pane is in the bottom right and shows outputs of
code such as plots. It also has tabs for files (an interactive file
explorer), packages (which shows currently installed R packages), and
help (which shows package documentation). You will likely be using the
Plots and Help tabs the most.
\section{Running code}\label{running-code}
Try running the below line of code using one of the four ways described
above. First, copy the below line of code and paste it into the Source
pane.
\texttt{print("hello\ world!")}
Before executing the code, your Source and Console panes will look like
this:
\begin{center}
\includegraphics{scripts/00_intro/class0_images/Picture4.png}
\end{center}
After executing the line of code, your source pane will look like this:
\begin{center}
\includegraphics{scripts/00_intro/class0_images/Picture5.png}
\end{center}
Congrats! You just ran your first line of code. If you want to save your
script (what's written in the source pane) go to File \textgreater{}
Save as and save your script with a helpful name in a location that
makes sense (e.g., maybe in a folder called ``H2L2C\_class'' and name
the script ``hello\_world.R'').
Review the rest of the information on this page before Class 1, but
don't worry if it doesn't make sense right away. We will be going over
some of it in the first class and touching on it throughout the course.
\section{Talking like an R user}\label{talking-like-an-r-user}
Below is some jargon that you may hear during class. Don't worry about
memorizing it all before Class 1! Just know that it's here so if you are
ever wondering what a term means you will know where to look.
\textbf{Running code/run this line/execute:} Telling R to perform the
command given in a console. If someone says ``run this line of code''
that means to send it to the console (either by copying/pasting or using
one of the shortcuts mentioned previously).
\textbf{Data types:} Data types in R include numeric, logical, and
character. There are a few more, but those are the main three. We will
touch on these more in Classes 1 and 2.
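As a quick sketch of the three main data types, the lines below store one value of each type and ask R what it is using \texttt{class()}. The variable names and values are made up for illustration.

```r
# Hypothetical example values, one for each of the three main data types
height_cm <- 172.5      # numeric
is_student <- TRUE      # logical
hair_color <- "brown"   # character

# class() reports the data type of an object
class(height_cm)
class(is_student)
class(hair_color)
```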
\textbf{Vector:} A series of values that all share the same data type. A vector is
created using the \texttt{c()} function (c for combine/concatenate).
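A minimal sketch of building and inspecting a vector with \texttt{c()} might look like the following. The names \texttt{heights} and \texttt{mixed} and their values are invented for this example.

```r
# Combine several numeric values into one vector (example values are made up)
heights <- c(150, 162.5, 171)

# All elements of a vector share one data type; mixing types coerces them
mixed <- c(1, "two", TRUE)

length(heights)  # number of elements
heights[2]       # second element (R counts from 1)
class(mixed)     # "character": everything was converted to text
```

Note that R silently converts the mixed values to text rather than raising an error, which is a common source of surprises when reading in data.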
\textbf{Factors:} This is the R term for categorical data. Sometimes R
will automatically treat data as categorical (especially if it is a
character type), but not always. You can coerce other data types (like
numeric) to factors using the \texttt{factor()} function and
specifying the order using the \texttt{levels} argument. For more
information on factors, see
\href{https://r4ds.hadley.nz/factors.html}{this page} in the R for Data
Science book.
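The sketch below shows one way \texttt{factor()} and its \texttt{levels} argument could be used to set the category order. The \texttt{doses} data are hypothetical.

```r
# Hypothetical categorical data stored as character strings
doses <- c("low", "high", "medium", "low")

# Convert to a factor and state the category order explicitly
dose_factor <- factor(doses, levels = c("low", "medium", "high"))

levels(dose_factor)      # categories in the order we specified
as.integer(dose_factor)  # underlying integer codes: 1 3 2 1
table(dose_factor)       # counts per category, in level order
```

Without the \texttt{levels} argument, R would order the categories alphabetically (high, low, medium), which is rarely what you want in a plot.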
\textbf{Data frame:} The best way to think of a data frame is as a
spreadsheet. Technically, data frames are composed of vectors. Typically the
rows in a data frame will correspond to observations and the columns
will correspond to variables describing those observations. Data in a
data frame can be of different types--i.e.~you can have one column be
character (maybe describing hair color for each observation) and another
be numeric (maybe describing height for each observation).
\textbf{Matrix:} A matrix in R is very similar to a data frame. Unlike a
data frame, all elements must be of the same data type.
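To make the difference between a data frame and a matrix concrete, here is a small sketch using the hair color and height example from above. The object names \texttt{people} and \texttt{m} and all values are made up.

```r
# A data frame: each column is a vector, and columns may differ in type
people <- data.frame(
  hair_color = c("brown", "black", "red"),
  height_cm = c(172.5, 160, 181)
)
str(people)         # summarizes each column and its type
people$height_cm    # pull out a single column as a vector

# A matrix: every element must share one data type
m <- matrix(1:6, nrow = 2)
dim(m)              # 2 rows, 3 columns

# Converting the data frame to a matrix forces one type (here, character)
class(as.matrix(people)[1, 2])
```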
\textbf{Functions:} A function performs a given task. This task can be
very simple (add two numbers) or more complex (create a large data
frame, run a linear regression, save the output to a csv file). R has
many built-in functions you will use. Many packages also have functions
you can use.
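As a simple illustration, the sketch below calls a built-in function and then defines a small function of its own. The name \texttt{add\_two\_numbers} is invented for this example.

```r
# A built-in function: sum() adds numbers
sum(2, 3)

# A user-defined function (the name add_two_numbers is made up for this example)
add_two_numbers <- function(a, b) {
  a + b  # the value of the last expression is returned
}

add_two_numbers(2, 3)
```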
\textbf{Packages:} Packages in R are extensions of what is called ``base
R.'' Base R refers to using R without any add-ons (i.e., no packages).
Packages can have data, functions, and/or compiled code. It is the
responsibility of package developers to maintain their package--which
means some undergo frequent updates and some haven't been touched in
years (and thus might not work anymore for whatever reason). It also
means that some packages can have bugs or might not be appropriate for
your data/analysis. To use a package, you will first need to install it
using the function
\texttt{install.packages("\textless{}package\_name\textgreater{}")}. You
will only need to install the package once. Each time you want to use
the package, you'll need to load it into your environment:
\texttt{library(\textless{}package\_name\textgreater{})}. Once it is
loaded into your environment, you will be able to use any functions or
data in the package.
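The install-once, load-every-session pattern might look like the sketch below. The package name \texttt{ggplot2} is only an example, and the install line is commented out so the sketch runs without an internet connection; the \texttt{tools} package is used for the loading step because it ships with R.

```r
# Install once per computer (commented out so this sketch runs offline).
# "ggplot2" is only an example package name.
# install.packages("ggplot2")

# Load a package in every new R session before using it.
# "tools" ships with R, so this line works without installing anything.
library(tools)

# After loading, the package's functions can be called directly
toTitleCase("hello world")

# Alternatively, call one function without loading the whole package
tools::file_ext("my_data.csv")
```

The \texttt{package::function()} form is handy when you only need one function, or when two loaded packages define functions with the same name.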
\textbf{global vs.~local:} Global refers to something (usually a
variable) that is accessible to the entire program/code. Local refers to
something (usually a variable) that is accessible only relative to
something else (such as within a specific code block, like a function).
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Global variable}
\NormalTok{x }\OtherTok{\textless{}{-}} \StringTok{"airplane"}
\CommentTok{\# Function that defines a local variable}
\NormalTok{my\_function }\OtherTok{\textless{}{-}} \ControlFlowTok{function}\NormalTok{() \{}
\NormalTok{ y }\OtherTok{\textless{}{-}} \StringTok{"car"}
\NormalTok{\}}
\CommentTok{\# Accessing the local variable outside the function returns an error}
\NormalTok{y}
\CommentTok{\# But the global variable is accessible}
\NormalTok{x}
\CommentTok{\# The function returns the value assigned to y, so z holds "car"}
\NormalTok{z }\OtherTok{\textless{}{-}} \FunctionTok{my\_function}\NormalTok{()}
\NormalTok{z}
\end{Highlighting}
\end{Shaded}
\textbf{directory:} A directory is another term for what you may refer
to as a ``folder'' on your computer.
\textbf{paths:} Paths are the directions to files and folders on your
system. Understanding paths is important for reading your data into your
R environment, since you will need to tell R where the file is located.
You can have global and local paths (more commonly called absolute and
relative paths). Global paths are like the full set of directions starting
from the very top of the file system. Local paths are
directions from a given starting location. Here's an example of a
global path to an example file on Longleaf:
\texttt{/work/users/g/h/goheels/my\_project/my\_data.csv}. Here's an
example of a local path, given the starting spot of the \texttt{goheels}
directory: \texttt{my\_project/my\_data.csv}.
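The path example above can be sketched in R as follows. The directory is the illustrative one from the text and may not exist on your system, so the final check is expected to report whether the file is really there rather than to succeed.

```r
# Where R is currently "standing" (the working directory)
getwd()

# Build a local path from its pieces; file.path() inserts the separators
relative_path <- file.path("my_project", "my_data.csv")
relative_path

# Prepend a starting directory to get a global path.
# This directory is the example from the text and may not exist on your system.
start_dir <- "/work/users/g/h/goheels"
full_path <- file.path(start_dir, relative_path)
full_path

# file.exists() is a quick check that a path points at a real file
file.exists(full_path)
```

Running \texttt{file.exists()} before reading a file is a cheap way to catch a mistyped path early.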
\textbf{syntax/style:} The visual appearance (spaces, indentations,
capitalization) of your code greatly improves readability and makes it
easier for someone else to quickly understand what it's doing (or you
six months later). In this class, we will follow the Tidyverse Style
Guide and encourage you to reference it during class to ensure you are
consistently naming variables and using appropriate syntax. The sections
most relevant for now are
\href{https://style.tidyverse.org/files.html}{Files},
\href{https://style.tidyverse.org/syntax.html}{Syntax}, and the
``Comments'' section of the
\href{https://style.tidyverse.org/functions.html\#comments-1}{Functions}
page. Later classes will touch on
\href{https://style.tidyverse.org/pipes.html}{pipes} and
\href{https://style.tidyverse.org/ggplot2.html}{ggplot2}.
\textbf{Conditionals:} A conditional is a line of code that will run
only if a particular condition is met. You can recognize these by the
use of ``if'', ``else'', or ``while''. The best way to understand these is
by actually reading the code out loud. If you were to read the below
example out loud, you might say ``if x equals 3, print `condition is
met', else print `condition is not met'\,''. What do you think will
happen if \texttt{x\ ==\ 3}? What if \texttt{x\ ==\ 4}?
\begin{Shaded}
\begin{Highlighting}[]
\ControlFlowTok{if}\NormalTok{(x }\SpecialCharTok{==} \DecValTok{3}\NormalTok{) \{}
\FunctionTok{print}\NormalTok{(}\StringTok{"condition is met"}\NormalTok{)}
\NormalTok{\} }\ControlFlowTok{else}\NormalTok{ \{}
\FunctionTok{print}\NormalTok{(}\StringTok{"condition is not met"}\NormalTok{)}
\NormalTok{\}}
\end{Highlighting}
\end{Shaded}
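To try the example yourself, \texttt{x} needs a value first. The sketch below assigns one (an arbitrary choice you can change) and also shows a \texttt{while} loop, which repeats its block for as long as its condition is true. The names \texttt{message\_text} and \texttt{count} are made up for this example.

```r
x <- 4  # example value; try changing it to 3 and rerunning

# The same if/else logic as above, storing the result before printing it
if (x == 3) {
  message_text <- "condition is met"
} else {
  message_text <- "condition is not met"
}
print(message_text)

# A while loop repeats its block for as long as the condition stays TRUE
count <- 0
while (count < 3) {
  count <- count + 1
}
print(count)
```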
\section{Understanding errors and
warnings}\label{understanding-errors-and-warnings}
You will get lots of errors during your How to Learn to Code journey.
``Warnings'' indicate your code ran, but some non-fatal issue arose.
Sometimes these are OK to ignore, sometimes they indicate an issue you
need to look into further. Either way, they should always be
investigated! ``Errors'' are fatal issues and may be the result of
things like syntax errors, typos, and incorrect data types. A good
starting point for investigating any error or warning is Google (chances
are quite high someone has run into the same issue, especially when
you're just learning how to code). You can copy and paste the entire
error/warning into Google and usually return a helpful result.
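As a small sketch of the difference, the first call below triggers a warning and the second triggers an error. \texttt{tryCatch()} is used here only to capture the message text so the sketch keeps running; in everyday use you would simply see the message printed in the Console. The names \texttt{warning\_text} and \texttt{error\_text} are made up.

```r
# A warning: converting text that is not a number still "works" (giving NA),
# but R flags the problem. Here we capture the warning message as text.
warning_text <- tryCatch(
  as.numeric("abc"),
  warning = function(w) conditionMessage(w)
)
warning_text

# An error: text cannot be added to a number, so R stops.
# Here we capture the error message as text instead of halting.
error_text <- tryCatch(
  "abc" + 1,
  error = function(e) conditionMessage(e)
)
error_text  # this is the text you would paste into a search engine
```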
\section{Use of AI tools}\label{use-of-ai-tools}
Using AI tools such as ChatGPT and Microsoft Copilot can be really
helpful! But before turning to these tools for assistance, try figuring
out the solution yourself. Part of learning how to code is learning how
to \emph{think} like a coder, and that requires doing things the hard
way for a bit. Remember that you are responsible for understanding what
your code is doing and why, and for confirming that the output is accurate.
Additionally, depending on the type of work you are doing, you may need
to use additional caution when copying and pasting code/data (any
questions/concerns on this should be directed to your PI/department).
\section{I'm stuck! Additional
resources}\label{im-stuck-additional-resources}
Research IT page on how to use OnDemand:
\url{https://help.rc.unc.edu/ondemand}

Getting started on Longleaf:
\url{https://help.rc.unc.edu/getting-started-on-longleaf}

More stats please!
\url{https://odum.unc.edu/education/short-courses/\#course1}

Still haven't found what you're looking for? Post a message in the How
to Learn to Code Teams!
\section{Bonus: Installing RStudio on your personal
computer}\label{bonus-installing-r-studio-on-your-personal-computer}
A lot of you may want to use R on your personal computer (i.e., not on
Longleaf). There may be reasons why you want to stick with Longleaf
though (e.g., data should not be downloaded on personal devices,
data/analysis requires a lot of memory). If you are interested in
installing R and RStudio on your personal computer, you can use the
below resources for help. All classes will be taught assuming you are
using Longleaf though, so class time won't be dedicated to
troubleshooting R install issues on personal computers.

If you just want to click two buttons and figure it out:
\url{https://posit.co/download/rstudio-desktop/}

If you want a more detailed install walkthrough:
\url{https://rstudio-education.github.io/hopr/starting.html}
\bookmarksetup{startatroot}
\chapter{R Coding Basics}\label{r-coding-basics}
Coding Basics, Day 1
\hfill\break
\section{Introduction}\label{introduction-1}
\begin{quote}
Many biologists starting out in bioinformatics tend to equate ``learning
bioinformatics'' with ``learning how to run bioinformatics
software''\ldots{} This is analogous to thinking ``learning molecular
biology'' is just ``learning pipetting.''
--- Vince Buffalo
\end{quote}
In Vince's quote above, replace ``bioinformatics'' with ``coding.''

Our goal for How to Learn to Code is to familiarize students with the R
programming language and RStudio environment, equip students with the
skills and knowledge to wrangle, visualize, and analyze data, and
provide a foundation for more advanced coding skills.

In Module 1: Coding Basics, we will cover:
\begin{itemize}
\tightlist
\item
Variables
\item
Reproducible environments
\item
RStudio IDE
\item
Various R script and file formats
\item
R syntax
\item
Commenting, writing, and executing code
\item
Functions
\item
Data structures in R
\item
Data types in R
\item
Manipulating data types and structures
\end{itemize}
Curious about what the rest of the classes will look like?
\begin{itemize}
\item
Module 1: Coding Basics
\item
Module 2: Data Visualization
\item
Module 3: Data Wrangling
\item
Module 4: Project Management (and applying everything you've learned
to a real-world dataset!)
\end{itemize}
\section{Objectives of Coding Basics: Class
1}\label{objectives-of-coding-basics-class-1}
\begin{itemize}
\item
Be able to create a variable, define what it is, and follow good
variable naming practices
\item
Understand basic data structures in R
\item
Understand basic data types in R
\item
Perform basic manipulations with data structures and types
\item
Describe benefits of knowing how to code
\end{itemize}
\section{Exploring a dataset}\label{exploring-a-dataset}
R has a few built-in datasets that we can use until we cover
installing/loading packages and reading in data files. For the following
examples we will use a built-in dataset called ``iris'' that contains
sepal and petal measurements for three species of iris flowers. It is
one of the most popular built-in datasets in R. We will use this dataset
to explore key coding concepts: \textbf{variables}, \textbf{data types}, and
\textbf{functions}.

First, let's take a look at the dataset. You can view the dataset
multiple ways. Let's try one--copy the below line of code into your
console and run it.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{iris}
\end{Highlighting}
\end{Shaded}
As we can see, this dataset has a few columns of numbers, in addition to
the species. Let's try a few other ways to look at this dataset. As you
try each method, think about what is different about each method. When
would one method be more beneficial than another?
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{head}\NormalTok{(iris)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{View}\NormalTok{(iris)}
\end{Highlighting}
\end{Shaded}
You are probably already thinking of questions you need the answers to
in order to familiarize yourself with this dataset. What does each row
represent? Each column? How many observations (rows) do we have? What is
the average petal length? Think about other questions you may want to
ask. Think about how you would go about answering those questions with
what you already know. Maybe you'd count each row on your screen to get
the number of observations, or copy the values under
\texttt{Petal.Length} into your phone calculator to calculate the mean.
By the end of this class, you'll be able to do all those things very
quickly in R!
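
As a preview, and only as a sketch, the two questions about row count
and average petal length can each be answered with one line. It uses
functions and the \texttt{\$} operator (which pulls a single column out
of a dataset) that have not been covered at this point.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# How many observations (rows) are there?}
\FunctionTok{nrow}\NormalTok{(iris)}
\CommentTok{\# What is the average petal length?}
\FunctionTok{mean}\NormalTok{(iris}\SpecialCharTok{\$}\NormalTok{Petal.Length)}
\end{Highlighting}
\end{Shaded}

The built-in iris dataset has 150 rows, and the mean of
\texttt{Petal.Length} is 3.758.
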
\section{Variables}\label{variables}
A variable is a named space in your computer's memory which can be
referenced and manipulated. It's sort of a name you give ``something'',
and that something can be just about anything.
\begin{figure}[H]
{\centering \includegraphics{scripts/01_codingBasics/class1-files/variables.png}
}
\caption{Image source: \url{https://mclark45.medium.com/variables-8d0ba47d9694}}
\end{figure}%
Variables in R are created (assigned) using an arrow,
\texttt{\textless{}-}. The variable name always goes on the left, and the
thing being assigned to that variable on the right. For example:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{greeting }\OtherTok{\textless{}{-}} \StringTok{"Hello"}
\NormalTok{animal }\OtherTok{\textless{}{-}} \StringTok{"panda"}
\NormalTok{age }\OtherTok{\textless{}{-}} \DecValTok{51}
\end{Highlighting}
\end{Shaded}
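Once a variable exists, its name can be used anywhere the stored value
would be used. A brief sketch, repeating the three assignments so that it
runs on its own; \texttt{paste()} is a built-in function that joins
pieces of text and has not been introduced yet.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{greeting }\OtherTok{\textless{}{-}} \StringTok{"Hello"}
\NormalTok{animal }\OtherTok{\textless{}{-}} \StringTok{"panda"}
\NormalTok{age }\OtherTok{\textless{}{-}} \DecValTok{51}
\CommentTok{\# Retrieve a stored value by its name}
\FunctionTok{print}\NormalTok{(greeting)}
\CommentTok{\# Use stored values in new expressions}
\NormalTok{age }\SpecialCharTok{+} \DecValTok{1}
\FunctionTok{paste}\NormalTok{(greeting, animal)}
\end{Highlighting}
\end{Shaded}

These three lines give \texttt{"Hello"}, \texttt{52}, and
\texttt{"Hello\ panda"}.
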
The name that a value is assigned to is referred to as the variable
name. For example, the variable name for the value \texttt{"Hello"} is
\texttt{greeting}. We used really basic variable names: real words made
up only of lowercase letters. Of course, there are other ways to
name variables too! Play around with variable names. Try using uppercase
letters, symbols, and numbers. What works, and what doesn't? Come up
with some rules for variable naming. Here are some variable naming ideas
to get you started:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{GrEeTiNg }\OtherTok{\textless{}{-}} \StringTok{"Hello"}
\DecValTok{5}\NormalTok{greeting }\OtherTok{\textless{}{-}} \StringTok{"Hello"}
\NormalTok{greeting}\FloatTok{.5} \OtherTok{\textless{}{-}} \StringTok{"Hello"}
\NormalTok{greeting}\SpecialCharTok{@}\DecValTok{5} \OtherTok{\textless{}{-}} \StringTok{"Hello"}
\end{Highlighting}
\end{Shaded}
Now that you know some general rules for variable naming, we can refer
to the \href{https://style.tidyverse.org/syntax.html\#syntax}{Style
Guide} for ``proper'' variable/object naming. Update your variable
naming rule to include the preferred style for variable names according
to the Style Guide.
And now that we know how to properly name variables, assign the iris
dataset to a variable!
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{iris\_dataset\_copy }\OtherTok{\textless{}{-}}\NormalTok{ iris}
\end{Highlighting}
\end{Shaded}
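One way to confirm that the assignment worked is to compare the new
variable with the original. This is a sketch using two built-in
functions, \texttt{identical()} and \texttt{dim()}, that have not been
introduced in this chapter yet.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{iris\_dataset\_copy }\OtherTok{\textless{}{-}}\NormalTok{ iris}
\CommentTok{\# TRUE if the copy matches the original exactly}
\FunctionTok{identical}\NormalTok{(iris\_dataset\_copy, iris)}
\CommentTok{\# Number of rows and columns in the copy}
\FunctionTok{dim}\NormalTok{(iris\_dataset\_copy)}
\end{Highlighting}
\end{Shaded}

The comparison returns \texttt{TRUE}, and the dimensions are 150 rows by
5 columns, the same as \texttt{iris}.
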
\section{Data types}\label{data-types}
As you probably know from your own work, data can come in many forms.
You can classify dragons as either ``purple'' or ``green'' and also
record the number of spines on their backs as numeric types (15, 27).
Data types are important to understand in R because the type of data
impacts what you can do with that data. For example, it wouldn't make
sense to calculate a mean for the dragon color, but it would for the
number of back spines.

In R, we will focus on three basic data types that are used to specify the
type of data stored in a variable (there are a few more, such as integer
and complex, but you are less likely to run into them at first):
\textbf{character, numeric,} and \textbf{logical.}

\textbf{Character:} A character represents a string value. This can be
anything from a single letter to entire paragraphs. Examples include
\texttt{"a",\ "B",\ "c\ is\ third",\ "5"}. Note that \texttt{"5"} is a
character, not a number, because it is wrapped in quotation marks.

\textbf{Numeric:} A number, with or without a decimal part. Examples
include \texttt{1.0,\ 3.1415926535}.

\textbf{Logical:} Logical data types have only two possible values:
\texttt{TRUE} or \texttt{FALSE}.
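
To check which type R has given a value, one option is the built-in
\texttt{class()} function. A short sketch, with arbitrary example values:

\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{class}\NormalTok{(}\StringTok{"c is third"}\NormalTok{)  }\CommentTok{\# "character"}
\FunctionTok{class}\NormalTok{(}\StringTok{"5"}\NormalTok{)           }\CommentTok{\# "character", because of the quotation marks}
\FunctionTok{class}\NormalTok{(}\FloatTok{3.1415926535}\NormalTok{)  }\CommentTok{\# "numeric"}
\FunctionTok{class}\NormalTok{(}\ConstantTok{TRUE}\NormalTok{)          }\CommentTok{\# "logical"}
\end{Highlighting}
\end{Shaded}
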
So far, we have learned about basic data structures (vectors, matrices,
etc.) and basic data types (numeric, character, logical). Now, we want
to start manipulating or \emph{doing things} to them that can be
helpful.
\section{Converting Data Types}\label{converting-data-types}
Sometimes when we read in data from a file, numbers appear as strings
of characters rather than as a ``numeric'' type.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{my\_numbers }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\StringTok{"4"}\NormalTok{, }\StringTok{"2"}\NormalTok{, }\StringTok{"7"}\NormalTok{, }\StringTok{"10"}\NormalTok{)}
\FunctionTok{print}\NormalTok{(my\_numbers)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] "4" "2" "7" "10"
\end{verbatim}
How can we tell? Because the numbers above are in quotation marks,
indicating that they are of the \texttt{character} type and R is
interpreting them as text. Before doing any math or further analysis
with these data points, it's a good idea to convert them to the
\texttt{numeric} type first.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{my\_numbers }\OtherTok{\textless{}{-}} \FunctionTok{as.numeric}\NormalTok{(my\_numbers)}
\FunctionTok{print}\NormalTok{(my\_numbers)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 4 2 7 10
\end{verbatim}
Note that the quotation marks are now gone. Now, we can do basic (or more
advanced) calculations like the ones below.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Get the minimum of a vector of values}
\FunctionTok{min}\NormalTok{(my\_numbers)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 2
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Get the maximum of a vector of values}
\FunctionTok{max}\NormalTok{(my\_numbers)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 10
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Get the average (mean) of a vector of values}
\FunctionTok{mean}\NormalTok{(my\_numbers)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 5.75
\end{verbatim}
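
To see why the conversion matters, here is a sketch of what happens
without it, and of what happens when a value cannot be converted. The
names \texttt{text\_numbers} and \texttt{mixed\_values} are illustrative.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{text\_numbers }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\StringTok{"4"}\NormalTok{, }\StringTok{"2"}\NormalTok{, }\StringTok{"7"}\NormalTok{, }\StringTok{"10"}\NormalTok{)}
\CommentTok{\# Character values are compared as text, so "10" sorts before "2"}
\FunctionTok{min}\NormalTok{(text\_numbers)}

\NormalTok{mixed\_values }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\StringTok{"4"}\NormalTok{, }\StringTok{"ten"}\NormalTok{)}
\CommentTok{\# Text that is not a number becomes NA, with a warning}
\FunctionTok{as.numeric}\NormalTok{(mixed\_values)}
\end{Highlighting}
\end{Shaded}

The first call returns \texttt{"10"} rather than \texttt{"2"}, because
the values are still text. The second returns \texttt{4} and \texttt{NA}
along with the warning ``NAs introduced by coercion'', which is a good
example of a warning worth investigating.
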