<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Intermediate R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Intermediate R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">For loops</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Write and understand <code>for</code> loops.</li>
</ul>
</div>
</section>
<h3 id="repeating-operations">Repeating operations</h3>
<p>Often when we’re trying to solve a problem or run some analysis we find ourselves doing the same thing over, and over, and over again on different groupings of data, or on different files, or with slight parameter variations.</p>
<p>The great thing about R, and programming in general, is that it allows us to be <strong>lazy</strong>. Why do a repetitive task if you can make the computer do it for you?</p>
<p>For example, let’s say we wanted to calculate the total population for each continent in the gapminder dataset in 2007. We could do this in several ways, but the most basic approach is to do it manually:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> "Asia"</span>, <span class="kw">sum</span>(pop)]</code></pre></div>
<pre class="output"><code>[1] 3811953827
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> "Africa"</span>, <span class="kw">sum</span>(pop)]</code></pre></div>
<pre class="output"><code>[1] 929539692
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> "Americas"</span>, <span class="kw">sum</span>(pop)]</code></pre></div>
<pre class="output"><code>[1] 898871184
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> "Europe"</span>, <span class="kw">sum</span>(pop)]</code></pre></div>
<pre class="output"><code>[1] 586098529
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> "Oceania"</span>, <span class="kw">sum</span>(pop)]</code></pre></div>
<pre class="output"><code>[1] 24549947
</code></pre>
<p>This is tedious to type out. We can do it, but imagine if we wanted to run some calculation for each country!</p>
<p>The clever way to do this would be to use our recently acquired data.table skills:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gap[year ==<span class="st"> </span><span class="dv">2007</span>, <span class="kw">sum</span>(pop), by=continent]</code></pre></div>
<pre class="output"><code> continent V1
1: Asia 3811953827
2: Europe 586098529
3: Africa 929539692
4: Americas 898871184
5: Oceania 24549947
</code></pre>
<p>But sometimes the solution to a problem isn’t obvious, or doesn’t fit into a format we’re used to, so it’s helpful to have multiple tools in our problem-solving toolbox to fall back on.</p>
<p>With a for loop we can instead <em>iterate</em> over each continent, and tell R to run the same command:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for (cc in gap[,<span class="kw">unique</span>(continent)]) {
  <span class="co"># Total population of continent `cc` in 2007</span>
  popsum <-<span class="st"> </span>gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> </span>cc, <span class="kw">sum</span>(pop)]
  <span class="kw">print</span>(<span class="kw">paste</span>(cc, <span class="st">":"</span>, popsum))
}</code></pre></div>
<pre class="output"><code>[1] "Asia : 3811953827"
[1] "Europe : 586098529"
[1] "Africa : 929539692"
[1] "Americas : 898871184"
[1] "Oceania : 24549947"
</code></pre>
<p>This construct tells R to go through each thing on the right of the <code>in</code> operator and store it in the variable <code>cc</code>. Inside the <em>body</em> of the <code>for</code> loop, i.e. any lines of code that fall between the curly braces (<code>{</code> and <code>}</code>), we can then access the value of <code>cc</code> to do whatever we like. So first, <code>cc</code> will hold the value “Asia”; R will run the code in the body, then return to the top of the loop. Next <code>cc</code> will hold the value “Europe”, and R will do the same thing, and so on.</p>
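<p>As a minimal sketch of the same mechanics without the gapminder data, the loop below walks over a small hand-written vector. The vector contents and the message text are made up for illustration. It also shows that the loop variable keeps its last value once the loop has finished:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A hypothetical vector to iterate over (not read from `gap`)
continents <- c("Asia", "Europe", "Africa")

for (cc in continents) {
  # `cc` holds one element of `continents` on each pass through the body
  print(paste("Now processing:", cc))
}

# After the loop, `cc` still holds the last element it was given
cc</code></pre></div>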
<p>What if we want to look at the change in total population for each continent over the years? We can “nest” for loops to iterate through multiple separate conditions:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for (cc in gap[,<span class="kw">unique</span>(continent)]) {
for (yy in gap[,<span class="kw">unique</span>(year)]) {
popsum <-<span class="st"> </span>gap[year ==<span class="st"> </span>yy &<span class="st"> </span>continent ==<span class="st"> </span>cc, <span class="kw">sum</span>(pop)]
<span class="kw">print</span>(<span class="kw">paste</span>(cc, yy, <span class="st">":"</span>, popsum))
}
}</code></pre></div>
<pre class="output"><code>[1] "Asia 1952 : 1395357351.99999"
[1] "Asia 1957 : 1562780599"
[1] "Asia 1962 : 1696357182"
[1] "Asia 1967 : 1905662900"
[1] "Asia 1972 : 2150972248"
[1] "Asia 1977 : 2384513556"
[1] "Asia 1982 : 2610135582"
[1] "Asia 1987 : 2871220762"
[1] "Asia 1992 : 3133292191"
[1] "Asia 1997 : 3383285500"
[1] "Asia 2002 : 3601802203"
[1] "Asia 2007 : 3811953827"
[1] "Europe 1952 : 418120846"
[1] "Europe 1957 : 437890351"
[1] "Europe 1962 : 460355155"
[1] "Europe 1967 : 481178958"
[1] "Europe 1972 : 500635059"
[1] "Europe 1977 : 517164531"
[1] "Europe 1982 : 531266901"
[1] "Europe 1987 : 543094160"
[1] "Europe 1992 : 558142797"
[1] "Europe 1997 : 568944148"
[1] "Europe 2002 : 578223869"
[1] "Europe 2007 : 586098529"
[1] "Africa 1952 : 237640501"
[1] "Africa 1957 : 264837738"
[1] "Africa 1962 : 296516865"
[1] "Africa 1967 : 335289489"
[1] "Africa 1972 : 379879541"
[1] "Africa 1977 : 433061021"
[1] "Africa 1982 : 499348587"
[1] "Africa 1987 : 574834110"
[1] "Africa 1992 : 659081517"
[1] "Africa 1997 : 743832984"
[1] "Africa 2002 : 833723916"
[1] "Africa 2007 : 929539692"
[1] "Americas 1952 : 345152446"
[1] "Americas 1957 : 386953916"
[1] "Americas 1962 : 433270254"
[1] "Americas 1967 : 480746623"
[1] "Americas 1972 : 529384210"
[1] "Americas 1977 : 578067699"
[1] "Americas 1982 : 630290920"
[1] "Americas 1987 : 682753971"
[1] "Americas 1992 : 739274104"
[1] "Americas 1997 : 796900410"
[1] "Americas 2002 : 849772762"
[1] "Americas 2007 : 898871184"
[1] "Oceania 1952 : 10686006"
[1] "Oceania 1957 : 11941976"
[1] "Oceania 1962 : 13283518"
[1] "Oceania 1967 : 14600414"
[1] "Oceania 1972 : 16106100"
[1] "Oceania 1977 : 17239000"
[1] "Oceania 1982 : 18394850"
[1] "Oceania 1987 : 19574415"
[1] "Oceania 1992 : 20919651"
[1] "Oceania 1997 : 22241430"
[1] "Oceania 2002 : 23454829"
[1] "Oceania 2007 : 24549947"
</code></pre>
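<p>To see how nesting multiplies the work, here is a small sketch that counts iterations. It uses two short hand-written vectors (assumptions for illustration, not the full gapminder values) and a counter called <code>n_iter</code>. The inner loop runs to completion once for every value of the outer loop, so the body runs <code>length(continents) * length(years)</code> times:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical, shortened inputs
continents <- c("Asia", "Europe")
years <- c(1952, 1957, 1962)

n_iter <- 0
for (cc in continents) {
  for (yy in years) {
    # The inner loop cycles through every year before `cc` changes
    n_iter <- n_iter + 1
    print(paste(n_iter, cc, yy))
  }
}

# Total number of times the body ran
n_iter</code></pre></div>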
<h4 id="for-or-apply-the-second-circle-of-hell.">For or Apply? The second circle of hell.</h4>
<blockquote>
<p>We made our way into the second Circle, here live the gluttons. – <a href="http://www.burns-stat.com/pages/Tutor/R_inferno.pdf">The R inferno</a></p>
</blockquote>
<p>One of the biggest things that trips up novices and experienced R users alike is building a results object (vector, list, matrix, data frame) as the for loop progresses. For example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">results <-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">continent=</span><span class="kw">character</span>(), <span class="dt">year=</span><span class="kw">numeric</span>(), <span class="dt">popsum=</span><span class="kw">numeric</span>())
for (cc in gap[,<span class="kw">unique</span>(continent)]) {
for (yy in gap[,<span class="kw">unique</span>(year)]) {
popsum <-<span class="st"> </span>gap[year ==<span class="st"> </span>yy &<span class="st"> </span>continent ==<span class="st"> </span>cc, <span class="kw">sum</span>(pop)]
this_result <-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">continent=</span>cc, <span class="dt">year=</span>yy, <span class="dt">popsum=</span>popsum)
results <-<span class="st"> </span><span class="kw">rbind</span>(results, this_result)
}
}
results</code></pre></div>
<pre class="output"><code> continent year popsum
1 Asia 1952 1395357352
2 Asia 1957 1562780599
3 Asia 1962 1696357182
4 Asia 1967 1905662900
5 Asia 1972 2150972248
6 Asia 1977 2384513556
7 Asia 1982 2610135582
8 Asia 1987 2871220762
9 Asia 1992 3133292191
10 Asia 1997 3383285500
11 Asia 2002 3601802203
12 Asia 2007 3811953827
13 Europe 1952 418120846
14 Europe 1957 437890351
15 Europe 1962 460355155
16 Europe 1967 481178958
17 Europe 1972 500635059
18 Europe 1977 517164531
19 Europe 1982 531266901
20 Europe 1987 543094160
21 Europe 1992 558142797
22 Europe 1997 568944148
23 Europe 2002 578223869
24 Europe 2007 586098529
25 Africa 1952 237640501
26 Africa 1957 264837738
27 Africa 1962 296516865
28 Africa 1967 335289489
29 Africa 1972 379879541
30 Africa 1977 433061021
31 Africa 1982 499348587
32 Africa 1987 574834110
33 Africa 1992 659081517
34 Africa 1997 743832984
35 Africa 2002 833723916
36 Africa 2007 929539692
37 Americas 1952 345152446
38 Americas 1957 386953916
39 Americas 1962 433270254
40 Americas 1967 480746623
41 Americas 1972 529384210
42 Americas 1977 578067699
43 Americas 1982 630290920
44 Americas 1987 682753971
45 Americas 1992 739274104
46 Americas 1997 796900410
47 Americas 2002 849772762
48 Americas 2007 898871184
49 Oceania 1952 10686006
50 Oceania 1957 11941976
51 Oceania 1962 13283518
52 Oceania 1967 14600414
53 Oceania 1972 16106100
54 Oceania 1977 17239000
55 Oceania 1982 18394850
56 Oceania 1987 19574415
57 Oceania 1992 20919651
58 Oceania 1997 22241430
59 Oceania 2002 23454829
60 Oceania 2007 24549947
</code></pre>
<p>“Growing” a results object like this is bad practice. At each iteration, R needs to talk to the computer’s operating system to ask for the right amount of memory for your new results object. Like all diplomatic negotiations, this can take a while (at least in computer time!). As a result, you might find that your for loops seem to take forever when you start working with bigger datasets or more complex calculations.</p>
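<p>A rough way to get a feel for this cost is to time a loop that grows a vector against one that fills a vector whose length is known in advance. The sketch below uses plain numeric vectors rather than <code>gap</code>, and the function names <code>grow_vector</code> and <code>fill_vector</code> are made up for illustration. Exact timings depend on your machine and R version, so none are shown here:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Grows the result by one element per iteration (copies it every time)
grow_vector <- function(n) {
  squares <- numeric(0)
  for (ii in seq_len(n)) {
    squares <- c(squares, ii^2)
  }
  squares
}

# Allocates the full result once, then fills it in place
fill_vector <- function(n) {
  squares <- numeric(n)
  for (ii in seq_len(n)) {
    squares[ii] <- ii^2
  }
  squares
}

n <- 10000
system.time(grown <- grow_vector(n))
system.time(filled <- fill_vector(n))

# Both approaches produce the same values
identical(grown, filled)</code></pre></div>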
<p>It’s much better to tell R how big your results object will be up front. That way R only needs to ask the computer for the right amount of memory once:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># First let's calculate the number of rows we need:</span>
nresults <-<span class="st"> </span>gap[,<span class="kw">length</span>(<span class="kw">unique</span>(continent))] *<span class="st"> </span>gap[,<span class="kw">length</span>(<span class="kw">unique</span>(year))]
<span class="co"># stringsAsFactors=FALSE keeps `continent` as a character column. As a factor</span>
<span class="co"># column it would reject the continent names as invalid levels and store NA.</span>
results <-<span class="st"> </span><span class="kw">data.frame</span>(
  <span class="dt">continent=</span><span class="kw">character</span>(<span class="dt">length=</span>nresults),
  <span class="dt">year=</span><span class="kw">numeric</span>(<span class="dt">length=</span>nresults),
  <span class="dt">popsum=</span><span class="kw">numeric</span>(<span class="dt">length=</span>nresults),
  <span class="dt">stringsAsFactors=</span><span class="ot">FALSE</span>
)
<span class="co"># Instead of iterating over values, we need to keep track of indices so we know</span>
<span class="co"># which row to insert our new results into at each iteration. </span>
<span class="co"># `seq_along` will create a sequence of numbers based on the length of the </span>
<span class="co"># vector. So instead of "Asia", "Europe", "Africa", "Americas", "Oceania",</span>
<span class="co"># ii will take the values 1, 2, 3, 4, 5 in turn.</span>
continents <-<span class="st"> </span>gap[,<span class="kw">unique</span>(continent)]
years <-<span class="st"> </span>gap[,<span class="kw">unique</span>(year)]
<span class="co"># We also need to keep track of which row to insert into. We could do fancy </span>
<span class="co"># math based on our indices, but this is hard to get right and can lead to</span>
<span class="co"># hard-to-detect errors. It's much easier to just keep track of this manually. </span>
this_row <-<span class="st"> </span><span class="dv">1</span>
for (ii in <span class="kw">seq_along</span>(continents)) {
  for (jj in <span class="kw">seq_along</span>(years)) {
    <span class="co"># Now we need to look up the appropriate values based on our indices</span>
    cc <-<span class="st"> </span>continents[ii]
    yy <-<span class="st"> </span>years[jj]
    popsum <-<span class="st"> </span>gap[year ==<span class="st"> </span>yy &<span class="st"> </span>continent ==<span class="st"> </span>cc, <span class="kw">sum</span>(pop)]
    results[this_row,] <-<span class="st"> </span><span class="kw">list</span>(cc, yy, popsum)
    <span class="co"># Increment the row counter</span>
    this_row <-<span class="st"> </span>this_row +<span class="st"> </span><span class="dv">1</span>
  }
}</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">results</code></pre></div>
<pre class="output"><code> continent year popsum
1 Asia 1952 1395357352
2 Asia 1957 1562780599
3 Asia 1962 1696357182
4 Asia 1967 1905662900
5 Asia 1972 2150972248
6 Asia 1977 2384513556
7 Asia 1982 2610135582
8 Asia 1987 2871220762
9 Asia 1992 3133292191
10 Asia 1997 3383285500
11 Asia 2002 3601802203
12 Asia 2007 3811953827
13 Europe 1952 418120846
14 Europe 1957 437890351
15 Europe 1962 460355155
16 Europe 1967 481178958
17 Europe 1972 500635059
18 Europe 1977 517164531
19 Europe 1982 531266901
20 Europe 1987 543094160
21 Europe 1992 558142797
22 Europe 1997 568944148
23 Europe 2002 578223869
24 Europe 2007 586098529
25 Africa 1952 237640501
26 Africa 1957 264837738
27 Africa 1962 296516865
28 Africa 1967 335289489
29 Africa 1972 379879541
30 Africa 1977 433061021
31 Africa 1982 499348587
32 Africa 1987 574834110
33 Africa 1992 659081517
34 Africa 1997 743832984
35 Africa 2002 833723916
36 Africa 2007 929539692
37 Americas 1952 345152446
38 Americas 1957 386953916
39 Americas 1962 433270254
40 Americas 1967 480746623
41 Americas 1972 529384210
42 Americas 1977 578067699
43 Americas 1982 630290920
44 Americas 1987 682753971
45 Americas 1992 739274104
46 Americas 1997 796900410
47 Americas 2002 849772762
48 Americas 2007 898871184
49 Oceania 1952 10686006
50 Oceania 1957 11941976
51 Oceania 1962 13283518
52 Oceania 1967 14600414
53 Oceania 1972 16106100
54 Oceania 1977 17239000
55 Oceania 1982 18394850
56 Oceania 1987 19574415
57 Oceania 1992 20919651
58 Oceania 1997 22241430
59 Oceania 2002 23454829
60 Oceania 2007 24549947
</code></pre>
<p>As you can see, this involves a lot more work. Most R users will even go so far as to tell you that for loops are bad, and that you should use something called <code>apply</code> instead! We’ll cover this in the next lesson, and later we’ll show you another method, <code>foreach</code>, which also handles object creation for you.</p>
<p>For loops are most useful when you’re performing a series of calculations where each iteration depends on the results of the last (for example a random walk).</p>
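<p>As a small sketch of that kind of dependency (deliberately not a random walk, which is left for the challenge below), the loop here projects a population forward with a fixed yearly growth rate. The starting value, the rate and the variable names are hypothetical and are not taken from the gapminder data. Each year’s value is computed from the previous year’s, so the iterations have to run in order:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">n_years <- 10
growth_rate <- 0.02   # hypothetical: 2% growth per year
pop <- numeric(n_years)   # allocate the full result up front
pop[1] <- 1000            # hypothetical starting population

for (ii in 2:n_years) {
  # This step needs the result of the previous iteration
  pop[ii] <- pop[ii - 1] * (1 + growth_rate)
}

pop</code></pre></div>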
<aside class="callout panel panel-info">
<div class="panel-heading">
<h4 id="tip-while-loops"><span class="glyphicon glyphicon-pushpin"></span>Tip: While loops</h4>
</div>
<div class="panel-body">
<p>Sometimes you will find yourself needing to repeat an operation until a certain condition is met. You can do this with a <code>while</code> loop.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">while(this condition is true){
do a thing
}</code></pre></div>
<p>As an example, here’s a while loop that generates random numbers from a uniform distribution (the <code>runif</code> function) between 0 and 1 until it gets one that’s less than 0.1.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">z <-<span class="st"> </span><span class="dv">1</span>
while(z ><span class="st"> </span><span class="fl">0.1</span>){
z <-<span class="st"> </span><span class="kw">runif</span>(<span class="dv">1</span>)
<span class="kw">print</span>(z)
}</code></pre></div>
<p><code>while</code> loops will not always be appropriate. You have to be particularly careful that you don’t end up in an infinite loop because your condition never becomes false.</p>
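<p>One common safeguard, sketched here as an assumption rather than something this lesson requires, is to count iterations and stop once a cap is reached. The names <code>max_tries</code> and <code>tries</code> are made up for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">max_tries <- 1000   # hypothetical upper bound on the number of iterations
tries <- 0
z <- 1
# Keep going only while z is still too large AND we have tries left
while (z > 0.1 && tries < max_tries) {
  z <- runif(1)
  tries <- tries + 1
}
# The loop ends either because z dropped to 0.1 or below, or because the cap was hit
print(paste("Stopped after", tries, "tries with z =", z))</code></pre></div>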
</div>
</aside>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-1"><span class="glyphicon glyphicon-pencil"></span>Challenge 1</h4>
</div>
<div class="panel-body">
<p>Write a script that loops through the <code>gapminder</code> data by continent and prints out the mean life expectancy in 1952.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-3"><span class="glyphicon glyphicon-pencil"></span>Challenge 3</h4>
</div>
<div class="panel-body">
<p>Modify the script so that it loops through the years as well as the continents.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-4"><span class="glyphicon glyphicon-pencil"></span>Challenge 4</h4>
</div>
<div class="panel-body">
<p>Write a for loop that performs a random walk for 100 steps, then plot the result.</p>
<p>Hint: You can use <code>sign(rnorm(1))</code> in the body of the loop to randomly choose a direction (forward or backward) at each iteration.</p>
<p>Hint: You will want to store the resulting position (starting at 0) after each iteration for plotting purposes.</p>
<p>Hint: Give the <code>plot</code> function the indices 0:100 as the x axis, and the stored positions as the y axis. Specify the <code>type</code> argument as <code>"l"</code> to draw the path as a line.</p>
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>