-
Notifications
You must be signed in to change notification settings - Fork 3
/
17-apply.html
188 lines (178 loc) · 11.5 KB
/
17-apply.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Intermediate R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Intermediate R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">Apply functions</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>To learn how to use <em>apply</em> to automate tasks efficiently</li>
<li>To know the difference between <code>apply</code>, <code>lapply</code>, <code>sapply</code>, <code>tapply</code> and <code>mapply</code>.</li>
</ul>
</div>
</section>
<h3 id="vectorized-task-automation">Vectorized task automation</h3>
<p>Previously we introduced you to <code>for</code> loops: a basic programming construct that is common across many programming languages. R has more optimised way of automating tasks that are not only faster than for loops, but also take away the pain of having to pre-define your results object.</p>
<p>The most common function you will encounter is <code>lapply</code>, and the closely related <code>sapply</code>.</p>
<p>Lets take a look at the following <code>for</code> loop:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for (cc in gap[,<span class="kw">unique</span>(continent)]) {
popsum <-<span class="st"> </span>gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> </span>cc, <span class="kw">sum</span>(pop)]
<span class="kw">print</span>(<span class="kw">paste</span>(cc, <span class="st">":"</span>, popsum))
}</code></pre></div>
<p>It calculates the total population on each continent, then prints out the result. If instead we want to save these results, we can either make a vector in advance and save the results, or use one of the <code>apply</code> to take care of this detail for us:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">results <-<span class="st"> </span><span class="kw">lapply</span>(gap[,<span class="kw">unique</span>(continent)], function(cc) {
popsum <-<span class="st"> </span>gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> </span>cc, <span class="kw">sum</span>(pop)]
popsum
})
<span class="kw">names</span>(results) <-<span class="st"> </span>gap[,<span class="kw">unique</span>(continent)]
results</code></pre></div>
<pre class="output"><code>$Asia
[1] 3811953827
$Europe
[1] 586098529
$Africa
[1] 929539692
$Americas
[1] 898871184
$Oceania
[1] 24549947
</code></pre>
<p><code>lapply</code> takes a vector (or list) as its first argument (in this case a vector of the continent names), then a function as its second argument. This function is then executed on every element in the first argument. This is very similar to a for loop: first, <code>cc</code> stores the first continent name, “Asia”, then runs the code in the function body, then <code>cc</code> stores the second continent name, and runs the function body, and so on. The code in the function body can be thought of in exactly the same way as the body of the <code>for</code> loop. The result of the last line is then returned to <code>lapply</code>, which combines the results into a list.</p>
<p><code>sapply</code> is identical to <code>lapply</code>, except that it tries to simplify the results object. If we run the same code with <code>sapply</code> instead of <code>lapply</code> the results will be returned as a vector:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">results <-<span class="st"> </span><span class="kw">sapply</span>(gap[,<span class="kw">unique</span>(continent)], function(cc) {
popsum <-<span class="st"> </span>gap[year ==<span class="st"> </span><span class="dv">2007</span> &<span class="st"> </span>continent ==<span class="st"> </span>cc, <span class="kw">sum</span>(pop)]
popsum
})
<span class="kw">names</span>(results) <-<span class="st"> </span>gap[,<span class="kw">unique</span>(continent)]
results</code></pre></div>
<pre class="output"><code> Asia Europe Africa Americas Oceania
3811953827 586098529 929539692 898871184 24549947
</code></pre>
<h3 id="apply">apply</h3>
<p>The <code>apply</code> function is useful for matrix data: it allows you loop over either the rows or columns of a matrix.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># create some dummy data</span>
r <-<span class="st"> </span><span class="kw">matrix</span>(<span class="kw">rnorm</span>(<span class="dv">10</span>*<span class="dv">4</span>), <span class="dt">nrow=</span><span class="dv">10</span>)
<span class="kw">colnames</span>(r) <-<span class="st"> </span>letters[<span class="dv">1</span>:<span class="dv">4</span>]
<span class="kw">rownames</span>(r) <-<span class="st"> </span>LETTERS[<span class="dv">1</span>:<span class="dv">10</span>]</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Get the maximum value in each row:</span>
<span class="kw">apply</span>(r, <span class="dv">1</span>, max)</code></pre></div>
<pre class="output"><code> A B C D E F G
1.9026721 0.8409824 1.2589881 2.3216163 0.9566330 1.3741352 1.8549427
H I J
1.1066072 1.0799746 0.8618778
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># and for each column:</span>
<span class="kw">apply</span>(r, <span class="dv">2</span>, max)</code></pre></div>
<pre class="output"><code> a b c d
1.7335561 1.9026721 0.8618778 2.3216163
</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h3 id="means-and-sums"><span class="glyphicon glyphicon-pushpin"></span>means and sums</h3>
</div>
<div class="panel-body">
<p>R has inbuilt functions for summing or calculating the mean of rows and columns: <code>colSums</code>, <code>colMeans</code>, <code>rowSums</code>, <code>rowMeans</code>. These are faster than writing your own <code>apply</code>!</p>
</div>
</aside>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h3 id="the-return-of-apply"><span class="glyphicon glyphicon-pushpin"></span>the return of apply</h3>
</div>
<div class="panel-body">
<p>When the function given to <code>apply</code> returns a vector instead of a single value the results will always be combined into columns, even if running the function across the rows!</p>
</div>
</aside>
<h3 id="mapply">mapply</h3>
<p>The <code>mapply</code> function can be used to run a function with different combinations of arguments. Let’s take a look at an example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a <-<span class="st"> </span><span class="dv">1</span>:<span class="dv">4</span>
b <-<span class="st"> </span><span class="dv">4</span>:<span class="dv">1</span>
<span class="kw">mapply</span>(rep, a, b)</code></pre></div>
<pre class="output"><code>[[1]]
[1] 1 1 1 1
[[2]]
[1] 2 2 2
[[3]]
[1] 3 3
[[4]]
[1] 4
</code></pre>
<p>This is the same as running the following code:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">rep</span>(a[<span class="dv">1</span>], b[<span class="dv">1</span>])</code></pre></div>
<pre class="output"><code>[1] 1 1 1 1
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">rep</span>(a[<span class="dv">2</span>], b[<span class="dv">2</span>])</code></pre></div>
<pre class="output"><code>[1] 2 2 2
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">rep</span>(a[<span class="dv">3</span>], b[<span class="dv">3</span>])</code></pre></div>
<pre class="output"><code>[1] 3 3
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">rep</span>(a[<span class="dv">4</span>], b[<span class="dv">4</span>])</code></pre></div>
<pre class="output"><code>[1] 4
</code></pre>
<p>or the following <code>lapply</code> statement:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">lapply</span>(<span class="dv">1</span>:<span class="dv">4</span>, function(ii) {
<span class="kw">rep</span>(a[ii], b[ii])
})</code></pre></div>
<pre class="output"><code>[[1]]
[1] 1 1 1 1
[[2]]
[1] 2 2 2
[[3]]
[1] 3 3
[[4]]
[1] 4
</code></pre>
<h3 id="tapply">tapply</h3>
<p>The <code>tapply</code> function allows you to run a function on different groups within a vector. Going back to our first example of the lesson, we can use <code>tapply</code> to calculate the total population for each continent in 2007:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gap2007 <-<span class="st"> </span>gap[year ==<span class="st"> </span><span class="dv">2007</span>] <span class="co"># first filter by the year</span>
<span class="kw">tapply</span>(
gap2007[,pop], <span class="co"># The column of population counts from the data frame</span>
gap2007[,continent], <span class="co"># The column of continent labels for each entry</span>
sum <span class="co"># The function to apply to the population vector within each continent</span>
)</code></pre></div>
<pre class="output"><code> Africa Americas Asia Europe Oceania
929539692 898871184 3811953827 586098529 24549947
</code></pre>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>