\documentclass[10pt]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{cite}
\usepackage{color}
\topmargin 0.0cm
\oddsidemargin 0.5cm
\evensidemargin 0.5cm
\textwidth 16cm
\textheight 21cm
\usepackage[labelfont=bf,labelsep=period,justification=raggedright]{caption}
\bibliographystyle{plos2009}
\makeatletter
\renewcommand{\@biblabel}[1]{\quad#1.}
\makeatother
\date{}
\pagestyle{myheadings}
\newcommand{\withurl}[2]{{#1} ({\texttt{#2}})}
\newcommand{\term}[1]{\emph{#1}}
\newcommand{\practicesection}[2]{\section*{#1}\label{#2}}
\newcommand{\practice}[2]{\textbf{\emph{{#2}~({#1})}}}
\begin{document}
\begin{flushleft}
{\Large
\textbf{Best Practices for Scientific Computing}
}
% Insert Author names, affiliations and corresponding author email.
\\
{Greg~Wilson}$^{1,\ast}$,
{D.~A.~Aruliah}$^{2}$,
{C.~Titus~Brown}$^{3}$,
{Neil~P.~Chue~Hong}$^{4}$,
{Matt~Davis}$^{5}$,
{Richard~T.~Guy}$^{6}$,
{Steven~H.D.~Haddock}$^{7}$,
{Kathryn~D.~Huff}$^{8}$,
{Ian~M.~Mitchell}$^{9}$,
{Mark~D.~Plumbley}$^{10}$,
{Ben~Waugh}$^{11}$,
{Ethan~P.~White}$^{12}$,
{Paul~Wilson}$^{13}$
\\
\bf{1} Mozilla Foundation, Toronto, Ontario, Canada M5V 1R9 / [email protected]
\\
\bf{2} University of Ontario Institute of Technology, Oshawa, Ontario, Canada L1H 7K4 / [email protected]
\\
\bf{3} Michigan State University, East Lansing, Michigan, USA 48824 / [email protected]
\\
\bf{4} Software Sustainability Institute, Edinburgh, UK EH9 3JZ / [email protected]
\\
\bf{5} Space Telescope Science Institute, Baltimore, Maryland, USA 21218 / [email protected]
\\
\bf{6} University of Toronto, Toronto, Ontario, Canada M5S 2E4 / [email protected]
\\
\bf{7} Monterey Bay Aquarium Research Institute, Moss Landing, California, USA 95039 / [email protected]
\\
\bf{8} University of California Berkeley, Berkeley, California, USA 94720 / [email protected]
\\
\bf{9} University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4 / [email protected]
\\
\bf{10} Queen Mary University of London, London, UK E1 4NS / [email protected]
\\
\bf{11} University College London, London, UK WC1E 6BT / [email protected]
\\
\bf{12} Utah State University, Logan, Utah, USA 84322 / [email protected]
\\
\bf{13} University of Wisconsin, Madison, Wisconsin, USA 53706 / [email protected]
\\
$\ast$ E-mail: Corresponding [email protected]
\end{flushleft}
\title{Best Practices for Scientific Computing}
\section*{Introduction}
Scientists spend an increasing amount of time building and using
software. However, most scientists are never taught how to do this
efficiently. As a result, many are unaware of tools and practices that
would allow them to write more reliable and maintainable code with
less effort. We describe a set of best practices for scientific
software development that have solid foundations in research and
experience, and that improve scientists' productivity and the
reliability of their software.
Software is as important to modern scientific research as telescopes and test
tubes. From groups that work exclusively on computational problems, to
traditional laboratory and field scientists, more and more of the daily
operation of science revolves around developing
new algorithms, managing and analyzing the large amounts of data that are
generated in single research projects, combining disparate
datasets to assess synthetic problems, and other computational tasks.
Scientists typically develop their own software for these purposes
because doing so requires substantial domain-specific knowledge. As a
result, recent studies have found that scientists typically spend 30\%
or more of their time developing software
\cite{hannay2008,prabhu2011}. However, 90\% or more of them are
primarily self-taught \cite{hannay2008,prabhu2011}, and therefore lack
exposure to basic software development practices such as writing
maintainable code, using version control and issue trackers, code
reviews, unit testing, and task automation.
We believe that software is just another kind of experimental
apparatus \cite{vardi2010} and should be built, checked, and used as
carefully as any physical apparatus. However, while most scientists
are careful to validate their laboratory and field equipment, most do
not know how reliable their software is
\cite{hatton1994,hatton1997}. This can lead to serious errors
impacting the central conclusions of published research
\cite{merali2010}: recent high-profile retractions, technical
comments, and corrections because of errors in computational methods
include papers in \emph{Science} \cite{chang2006,ferrari2013},
\emph{PNAS} \cite{ma2007}, the \emph{Journal of Molecular Biology}
\cite{chang2007}, \emph{Ecology Letters} \cite{lees2007,currie2007},
the \emph{Journal of Mammalogy} \cite{kelt2008}, \emph{Journal of the
American College of Cardiology} \cite{jaccretract2013},
\emph{Hypertension} \cite{hypertension2012}
and \emph{The American Economic Review} \cite{herndon2013}.
In addition, because software is often used for more than a single
project, and is often reused by other scientists, computing errors can
have disproportionate impacts on the scientific process. This type of
cascading impact caused several prominent retractions when an error
from another group's code was not discovered until after publication
\cite{merali2010}. As with bench experiments, not everything must be
done to the most exacting standards; however, scientists need to be
aware of best practices both to improve their own approaches and for
reviewing computational work by others.
This paper describes a set of practices that are easy to adopt and
have proven effective in many research settings. Our recommendations
are based on several decades of collective experience both building
scientific software and teaching computing to scientists
\cite{aranda2012,wilson2006b}, reports from many other groups
\cite{heroux2009,kane2003,kane2006,killcoyne2009,matthews2008,pitt-francis2008,pouillon2010},
guidelines for commercial and open source software development
\cite{spolsky2000,fogel2005}, and on empirical studies of scientific
computing \cite{carver2007,kelly2009,segal2005,segal2008a} and
software development in general (summarized in \cite{oram2010}). None
of these practices will guarantee efficient, error-free software
development, but used in concert they will reduce the number of errors
in scientific software, make it easier to reuse, and save the authors
of the software time and effort that can instead be used to focus on the
underlying scientific questions.
Our practices are summarized in Box~1.
For reasons of space,
we do not discuss the equally important (but independent) issues of
reproducible research,
publication and citation of code and data,
and open science.
We do believe,
however,
that all of these will be much easier to implement
if scientists have the skills we describe.
\practicesection{Write programs for people, not computers.}{cognition}
Scientists writing software need to write code that both executes
correctly and can be easily read and understood by other
programmers (especially the author's future self).
If software cannot be easily read and understood, it is
much more difficult to know that it is actually doing what it is intended
to do.
To be productive,
software developers must therefore take several aspects of human cognition into account:
in particular,
that human working memory is limited, human pattern
matching abilities are finely tuned, and human attention span is short
\cite{baddeley2009,hock2008, letovsky1986,binkley2009,robinson2005}.
First, \practice{1a}{a program should not require its readers to hold
more than a handful of facts in memory at once}. Human working
memory can hold only a handful of items at a time, where each item is
either a single fact or a ``chunk'' aggregating several facts
\cite{baddeley2009,hock2008}, so programs should limit
the total number of items to be remembered to accomplish a task.
The primary way to accomplish this is to break programs up
into easily understood functions, each of which conducts a single,
easily understood, task. This serves to make each piece of the
program easier to understand in the same way that breaking up a scientific
paper using sections and paragraphs makes it easier to read. For
example, a function to calculate the area of a rectangle can be
written to take four separate coordinates:
\begin{small}
\begin{verbatim}
def rect_area(x1, y1, x2, y2):
    ...calculation...
\end{verbatim}
\end{small}
\noindent
or to take two points:
\begin{small}
\begin{verbatim}
def rect_area(point1, point2):
    ...calculation...
\end{verbatim}
\end{small}
\noindent
The latter function is significantly easier for people to read and
remember, while the former is likely to lead to errors, not least
because it is possible to call the original with values in the wrong
order:
\begin{small}
\begin{verbatim}
surface = rect_area(x1, x2, y1, y2)
\end{verbatim}
\end{small}
Second, scientists should \practice{1b}{make names consistent,
distinctive, and meaningful}. For example, using non-descriptive
names, like \texttt{a} and \texttt{foo}, or names that are very
similar, like \texttt{results} and \texttt{results2}, is likely to
cause confusion.
Third, scientists should \practice{1c}{make code style and formatting
consistent}. If different parts of a scientific paper used different
formatting and capitalization, it would make that paper more difficult
to read. Likewise, if different parts of a program are indented
differently, or if programmers mix \texttt{CamelCaseNaming} and
\texttt{pothole\_case\_naming}, code takes longer to read and readers
make more mistakes \cite{letovsky1986,binkley2009}.
\practicesection{Let the computer do the work.}{automation}
Science often involves repetition of computational tasks such as
processing large numbers of data files in the same way or regenerating
figures each time new data is added to an existing analysis. Computers
were invented to do these kinds of repetitive tasks but, even today,
many scientists type the same commands in over and over again or click
the same buttons repeatedly \cite{aranda2012}. In addition to wasting
time, sooner or later even the most careful researcher will lose focus
while doing this and make mistakes.
Scientists should therefore \practice{2a}{make the computer repeat tasks}
and \practice{2b}{save recent commands in a file for re-use}. For example,
most command-line tools have a ``history'' option that lets users
display and re-execute recent commands, with minor edits to filenames
or parameters. This is often cited as one reason command-line
interfaces remain popular \cite{ray2009,haddock2010}: ``do this
again'' saves time and reduces errors.
A file containing commands for an interactive system is often called a
\term{script}, though there is no real difference between this and a
program. When these scripts are repeatedly used in the same way, or
in combination, a workflow management tool can be used. The
paradigmatic example is compiling and linking programs in languages
such as Fortran, C++, Java, and C\# \cite{dubois2003b}. The most
widely used tool for this task is probably
\withurl{Make}{http://www.gnu.org/software/make}, although many
alternatives are now available \cite{smith2011}. All of these allow
people to express dependencies between files, i.e., to say that if A
or B has changed, then C needs to be updated using a specific set of
commands. These tools have been successfully adopted for scientific
workflows as well \cite{fomel2007}.
To avoid errors and inefficiencies from repeating commands manually,
we recommend that scientists \practice{2c}{use a build tool to automate
workflows}, e.g., specify the ways in which intermediate data files
and final results depend on each other, and on the programs that
create them, so that a single command will regenerate anything that
needs to be regenerated.
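The core idea is simple enough to sketch in a few lines of Python
(the file and program names below are hypothetical, and a real
project would normally rely on Make or a similar tool rather than
re-implementing this logic): a target is rebuilt only when it is
missing or older than the files it depends on.
\begin{small}
\begin{verbatim}
import os
import subprocess

def out_of_date(target, sources):
    # A target must be rebuilt if it does not exist yet or is
    # older than any of the files it depends on -- the same
    # dependency rule that Make applies at scale.
    if not os.path.exists(target):
        return True
    newest_source = max(os.path.getmtime(s) for s in sources)
    return os.path.getmtime(target) < newest_source

if out_of_date('figure1.pdf', ['analysis.py', 'survey.csv']):
    subprocess.check_call(['python', 'analysis.py', 'survey.csv'])
\end{verbatim}
\end{small}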
In order to maximize reproducibility, everything needed to re-create
the output should be recorded automatically in a format that other
programs can read. (Borrowing a term from archaeology and forensics,
this is often called the \term{provenance} of data.) There have been
some initiatives to automate the collection of this information, and
standardize its format \cite{openprovenance}, but it is already
possible to record the following without additional tools:
\begin{itemize}
\item unique identifiers and version numbers for raw data records
(which scientists may need to create themselves);
\item unique identifiers and version numbers for programs and
libraries;
\item the values of parameters used to generate any given output; and
\item the names and version numbers of programs (however small) used
to generate those outputs.
\end{itemize}
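As a minimal sketch of how such a record might be captured
automatically (the function name, file names, and fields below are
invented for illustration and do not follow any particular
provenance standard), a program can write this information in a
machine-readable format alongside each output it produces:
\begin{small}
\begin{verbatim}
import json
import platform
import sys
import time

def save_provenance(output_file, inputs, parameters):
    # Record what was run, on which inputs, with which parameter
    # values, in a machine-readable file next to the output.
    record = {
        'output':     output_file,
        'inputs':     inputs,       # e.g. {'survey.csv': 'v1.2'}
        'parameters': parameters,   # e.g. {'threshold': 0.05}
        'program':    sys.argv[0],
        'python':     platform.python_version(),
        'timestamp':  time.strftime('%Y-%m-%dT%H:%M:%S'),
    }
    with open(output_file + '.provenance.json', 'w') as log:
        json.dump(record, log, indent=2)
\end{verbatim}
\end{small}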
\practicesection{Make incremental changes.}{incremental}
Unlike traditional commercial software developers, but very much like
developers in open source projects or startups, scientific programmers
usually don't get their requirements from customers, and their
requirements are rarely frozen \cite{segal2008a,segal2008b}. In fact,
scientists often \emph{can't} know what their programs should do next
until the current version has produced some results. This challenges
design approaches that rely on specifying requirements in advance.
Programmers are most productive when they \practice{3a}{work in small
steps with frequent feedback and course correction} rather than
trying to plan months or years of work in advance. While the details
vary from team to team, these developers typically work in steps that
are sized to be about an hour long, and these steps are often grouped
in iterations that last roughly one week. This accommodates the
cognitive constraints discussed in the first section, and
acknowledges the reality that real-world requirements are constantly
changing. The goal is to produce working (if incomplete) code after
each iteration. While these practices have been around for decades,
they gained prominence starting in the late 1990s under the banner of
\term{agile development} \cite{martin2002,kniberg2007}.
Two of the biggest challenges scientists and other programmers face
when working with code and data are keeping track of changes (and
being able to revert them if things go wrong), and collaborating on a
program or dataset \cite{matthews2008}. Typical ``solutions'' are to
email software to colleagues or to copy successive versions of it to a
shared folder, e.g., \withurl{Dropbox}{http://www.dropbox.com}. However,
both approaches are fragile and can lead to confusion and lost work
when important changes are overwritten or out-of-date files are
used. It's also difficult to find out which changes are in which
versions or to say exactly how particular results were computed at a
later date.
The standard solution in both industry and open source is to
\practice{3b}{use a version control system} (VCS)
\cite{mcconnell2004,fogel2005}. A VCS stores snapshots of a project's
files in a \term{repository} (or a set of repositories). Programmers
can modify their working copy of the project at will, then
\term{commit} changes to the repository when they are satisfied with
the results to share them with colleagues.
Crucially, if several people have edited files simultaneously, the VCS
highlights the differences and requires them to resolve any conflicts
before accepting the changes. The VCS also stores the entire history
of those files, allowing arbitrary versions to be retrieved and
compared, together with metadata such as comments on what was changed
and the author of the changes. All of this information can be
extracted to provide provenance for both code and data.
Many good VCSes are open source and freely available, including
\withurl{Git}{http://git-scm.com},
\withurl{Subversion}{http://subversion.apache.org}, and
\withurl{Mercurial}{http://mercurial.selenic.com}. Many free hosting
services are available as well,
with \withurl{GitHub}{https://github.com},
\withurl{BitBucket}{https://bitbucket.org},
\withurl{SourceForge}{http://sourceforge.net},
and \withurl{Google Code}{http://code.google.com}
being the most popular. As
with coding style, the best one to use is almost always whatever your
colleagues are already using \cite{fogel2005}.
Reproducibility is maximized when scientists \practice{3c}{put everything
that has been created manually in version control}, including
programs, original field observations, and the source files for
papers. Automated output and intermediate files can be regenerated at
need. Binary files (e.g., images and audio clips) may be stored in
version control, but it is often more sensible to use an archiving
system for them, and store the metadata describing their contents in
version control instead \cite{noble2009}.
\practicesection{Don't repeat yourself (or others).}{dry}
Anything that is repeated in two or more places is more difficult to
maintain. Every time a change or correction is made, multiple
locations must be updated, which increases the chance of errors and
inconsistencies. To avoid this, programmers follow the DRY Principle
\cite{hunt1999}, for ``don't repeat yourself'', which applies to both
data and code.
For data, this maxim holds that \practice{4a}{every piece of data must
have a single authoritative representation in the system}. Physical
constants ought to be defined exactly once to ensure that the entire
program is using the same value; raw data files should have a single
canonical version, every geographic location from which data has been
collected should be given an ID that can be used to look up its
latitude and longitude, and so on.
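In Python, for example, shared values can be collected in a single
module that the rest of the program imports; the names and values
below are illustrative only:
\begin{small}
\begin{verbatim}
# constants.py: the single authoritative home for shared values.
SPEED_OF_LIGHT = 299792458.0             # m/s
BOLTZMANN      = 1.380649e-23            # J/K
RAW_DATA_FILE  = 'survey-canonical.csv'  # hypothetical path

# Elsewhere in the program:
#     from constants import SPEED_OF_LIGHT
\end{verbatim}
\end{small}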
The DRY Principle applies to code at two scales. At small scales,
\practice{4b}{modularize code rather than copying and pasting}. Avoiding
``code clones'' has been shown to reduce error rates
\cite{juergens2009}: when a change is made or a bug is fixed, that
change or fix takes effect everywhere, and people's mental model of
the program (i.e., their belief that ``this one's been fixed'')
remains accurate. As a side effect, modularizing code allows people to
remember its functionality as a single mental chunk, which in turn
makes code easier to understand. Modularized code can also be more
easily repurposed for other projects.
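As a small, hypothetical illustration, a calculation that has been
copied and pasted in several places is better extracted into a
single function that is called wherever it is needed:
\begin{small}
\begin{verbatim}
# Instead of repeating mass/(length*width*height) throughout
# the program, define the calculation once and reuse it.
def density(mass, length, width, height):
    # Density of a rectangular sample: mass per unit volume.
    return mass / (length * width * height)

sample_a = density(2.5, 0.10, 0.20, 0.05)
sample_b = density(3.1, 0.10, 0.10, 0.10)
\end{verbatim}
\end{small}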
At larger scales, it is vital that scientific programmers
\practice{4c}{re-use code instead of rewriting it}. Tens of millions of
lines of high-quality open source software are freely available on the
web, and at least as much is available commercially. It is typically
better to find an established library or package that solves a problem
than to attempt to write one's own routines for well established
problems (e.g., numerical integration, matrix inversions, etc.).
\practicesection{Plan for mistakes.}{defensive}
Mistakes are inevitable, so verifying and maintaining the validity of
code over time is immensely challenging \cite{grubb2003}. While no
single practice has been shown to catch or prevent all mistakes,
several are very effective when used in combination
\cite{mcconnell2004,dubois2005,sanders2008}.
The first line of defense is \term{defensive programming}.
Experienced programmers \practice{5a}{add assertions to programs to check
their operation} because experience has taught them that everyone
(including their future self) makes mistakes. An \term{assertion} is
simply a statement that something holds true at a particular point in
a program; as the example below shows, assertions can be used to
ensure that inputs are valid, outputs are consistent, and so
on.
\begin{small}
\begin{verbatim}
def bradford_transfer(grid, point, smoothing):
    assert grid.contains(point), \
        'Point is not located in grid'
    assert grid.is_local_maximum(point), \
        'Point is not a local maximum in grid'
    assert len(smoothing) > FILTER_LENGTH, \
        'Not enough smoothing parameters'
    ...do calculations...
    assert 0.0 < result <= 1.0, \
        'Bradford transfer value out of legal range'
    return result
\end{verbatim}
\end{small}
\noindent
Assertions can make up a sizeable fraction of the code in well-written
applications, just as tools for calibrating scientific instruments can make up a
sizeable fraction of the equipment in a lab. These assertions serve two
purposes. First, they ensure that if something does go wrong, the program will
halt immediately, which simplifies debugging. Second, assertions are
\term{executable documentation}, i.e., they explain the program as well as
checking its behavior. This makes them more useful in many cases than comments
since the reader can be sure that they are accurate and up to date.
The second layer of defense is \term{automated testing}. Automated tests can
check to make sure that a single unit of code is returning correct results
(\term{unit tests}), that pieces of code work correctly when combined
(\term{integration tests}), and that the behavior of a program doesn't change
when the details are modified (\term{regression tests}). These tests are conducted
by the computer, so that they are easy to rerun every time the program is modified.
Creating and managing tests is easier if programmers \practice{5b}{use an
off-the-shelf unit testing library} to initialize inputs, run tests, and
report their results in a uniform way. These libraries are available for all
major programming languages including those commonly used in scientific computing
\cite{xunit,meszaros2007,osherove2009}.
Tests check to see whether the code matches the researcher's expectations of its
behavior, which depends on the researcher's understanding of the problem at hand
\cite{hook2009,kelly2008,oberkampf2010}. For example, in scientific computing,
tests are often conducted by comparing output to simplified cases, experimental
data, or the results of earlier programs that are trusted. Another approach for
generating tests is to \practice{5c}{turn bugs into test cases}
by writing tests that trigger a bug that has been found in the code and (once fixed) will prevent
the bug from reappearing unnoticed. In combination these kinds of testing can
improve our confidence that scientific code is operating properly
and that the results it produces are valid. An additional benefit of testing is
that it encourages programmers to design and build code that is testable (i.e.,
self-contained functions and classes that can run more or less independently of
one another). Code that is designed this way is also easier to understand
and more reusable.
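For example, using Python's built-in unittest library (the
\texttt{rect\_area} implementation and the bug described in the
comments are hypothetical), a bug that once produced a wrong area
for points given in reverse order can be captured as a test so it
cannot reappear unnoticed:
\begin{small}
\begin{verbatim}
import unittest

def rect_area(point1, point2):
    # Toy implementation used only for this example.
    (x1, y1), (x2, y2) = point1, point2
    return abs(x2 - x1) * abs(y2 - y1)

class TestRectArea(unittest.TestCase):

    def test_simple_rectangle(self):
        self.assertEqual(rect_area((0, 0), (2, 3)), 6)

    def test_points_in_reverse_order(self):
        # Added after a bug report: reversed points once gave
        # a negative area.  This test keeps the bug from returning.
        self.assertEqual(rect_area((2, 3), (0, 0)), 6)

if __name__ == '__main__':
    unittest.main()
\end{verbatim}
\end{small}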
No matter how good one's computational practice is, reasonably complex code will
always initially contain bugs. Fixing bugs that have been identified is often
easier if you \practice{5d}{use a symbolic debugger} to track them down. A better
name for this kind of tool would be ``interactive program inspector'' since a
debugger allows users to pause a program at any line (or when some condition is
true), inspect the values of variables, and walk up and down active function
calls to figure out why things are behaving the way they are. Debuggers are
usually more productive than adding and removing print statements or scrolling
through hundreds of lines of log output \cite{zeller2009}, because they allow
the user to see exactly how the code is executing rather than just snapshots of
the state of the program at a few moments in time. In other words, the debugger
allows the scientist to witness what is going wrong directly, rather than having
to anticipate the error or infer the problem using indirect evidence.
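In Python, for example, the standard pdb debugger can be invoked
from inside the program itself; the smoothing routine below is
invented purely to illustrate the idea:
\begin{small}
\begin{verbatim}
import pdb

def running_mean(signal, width):
    # Toy smoothing routine used only to show the debugger.
    result = []
    for i in range(len(signal)):
        window = signal[max(0, i - width):i + width + 1]
        pdb.set_trace()   # pause here on each pass; inspect
                          # 'i' and 'window', then continue
        result.append(sum(window) / len(window))
    return result
\end{verbatim}
\end{small}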
\practicesection{Optimize software only after it works correctly.}{performance}
Today's computers and software are so complex that even experts find
it hard to predict which parts of any particular program will be
performance bottlenecks \cite{jones1999}. The most productive way to
make code fast is therefore to make it work correctly, determine
whether it's actually worth speeding it up, and---in those cases where
it is---to \practice{6a}{use a profiler to identify bottlenecks}.
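In Python, for example, the standard cProfile module reports how
much time is spent in each function; the analysis routine below is
just a stand-in for real code:
\begin{small}
\begin{verbatim}
import cProfile
import pstats

def analyse(values):
    # Placeholder for a real calculation worth profiling.
    return sum(v ** 0.5 for v in values)

# Run the function under the profiler, then print the ten
# functions that consumed the most cumulative time.
cProfile.run('analyse(range(1000000))', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)
\end{verbatim}
\end{small}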
This strategy also has interesting implications for choice of
programming language. Research has confirmed that most programmers
write roughly the same number of lines of code per unit time
regardless of the language they use \cite{prechelt2010}. Since faster,
lower level, languages require more lines of code to accomplish the
same task, scientists are most productive when they
\practice{6b}{write code in the highest-level language possible}, and
shift to low-level languages like C and Fortran only when they are
sure the performance boost is needed. (Using higher-level languages
also helps program comprehensibility, since such languages have, in a
sense, ``pre-chunked'' the facts that programmers need to have in
short-term memory.)
Taking this approach allows more code to be written (and tested) in
the same amount of time. Even when it is known before coding begins
that a low-level language will ultimately be necessary, rapid
prototyping in a high-level language helps programmers make and
evaluate design decisions quickly. Programmers can also use a
high-level prototype as a test oracle for a high-performance low-level
reimplementation, i.e., compare the output of the optimized (and
usually more complex) program against the output from its unoptimized
(but usually simpler) predecessor in order to check its correctness.
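A minimal sketch of this idea, assuming NumPy is available and
using invented function names, compares an optimized implementation
against the trusted prototype on the same input:
\begin{small}
\begin{verbatim}
import numpy as np

def prototype_mean_filter(signal, width):
    # Slow but straightforward reference implementation.
    return np.array([signal[max(0, i - width):i + width + 1].mean()
                     for i in range(len(signal))])

def fast_mean_filter(signal, width):
    # Optimized version whose correctness we want to check.
    kernel = np.ones(2 * width + 1)
    counts = np.convolve(np.ones(len(signal)), kernel, mode='same')
    return np.convolve(signal, kernel, mode='same') / counts

signal = np.random.rand(1000)
assert np.allclose(fast_mean_filter(signal, 3),
                   prototype_mean_filter(signal, 3))
\end{verbatim}
\end{small}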
\practicesection{Document design and purpose, not mechanics.}{embeddoc}
In the same way that a well documented experimental protocol makes
research methods easier to reproduce, good documentation helps people
understand code. This makes the code more reusable and lowers
maintenance costs \cite{mcconnell2004}. As a result, well-documented
code also eases the handover when the graduate students and postdocs
who have been writing code in a lab move on to the next career phase.
Reference documentation and descriptions of design
decisions are key for improving the understandability of
code. However, inline documentation that recapitulates code is
\emph{not} useful. Therefore we recommend that scientific programmers
\practice{7a}{document interfaces and reasons, not implementations}. For
example, a clear description like this at the beginning of a function that
describes what it does and its inputs and outputs is useful:
\begin{small}
\begin{verbatim}
def scan(op, values, seed=None):
    # Apply a binary operator cumulatively to the values given
    # from lowest to highest, returning a list of results.
    # For example, if 'op' is 'add' and 'values' is '[1, 3, 5]',
    # the result is '[1, 4, 9]' (i.e., the running total of the
    # given values).  The result always has the same length as
    # the input.
    # If 'seed' is given, the result is initialized with that
    # value instead of with the first item in 'values', and
    # the final item is omitted from the result.
    # Ex: scan(add, [1, 3, 5], seed=10) => [10, 11, 14]
    ...implementation...
\end{verbatim}
\end{small}
\noindent
In contrast, the comment in the code fragment below does nothing to
aid comprehension:
\begin{small}
\begin{verbatim}
i = i + 1 # Increment the variable 'i' by one.
\end{verbatim}
\end{small}
If a substantial description of the implementation of a piece of
software is needed, it is better to \practice{7b}{refactor code in
preference to explaining how it works}, i.e., rather than write a
paragraph to explain a complex piece of code, reorganize the code
itself so that it doesn't need such an explanation. This may not
always be possible---some pieces of code simply are intrinsically
difficult---but the onus should always be on the author to convince
his or her peers of that.
The best way to create and maintain reference documentation is to
\practice{7c}{embed the documentation for a piece of software in that
software}. Doing this increases the probability that when
programmers change the code, they will update the documentation at the
same time.
Embedded documentation usually takes the form of specially-formatted
and placed comments. Typically, a \term{documentation generator} such
as Javadoc, Doxygen, or Sphinx
extracts these comments and generates well-formatted web pages and
other human-friendly \withurl{documents}{http://en.wikipedia.org/wiki/\-Comparison\-\_of\_documentation\_generators}. Alternatively, code can be embedded in
a larger document that includes information about what the code is
doing (i.e., literate programming). Common approaches to this include
the use of knitr \cite{xie2013knitr} and IPython Notebooks
\cite{perez2007}.
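In Python, for example, the comment block shown earlier for
\texttt{scan} can instead be written as a docstring, which tools
such as Sphinx extract automatically; the implementation shown
below is a sketch of our own, included only so that the example is
complete:
\begin{small}
\begin{verbatim}
def scan(op, values, seed=None):
    '''Apply a binary operator cumulatively to the values given,
    from lowest to highest, returning a list of results.

    For example, scan(add, [1, 3, 5]) returns [1, 4, 9], the
    running total of the inputs.  If 'seed' is given, the result
    starts with that value and the final item is omitted:
    scan(add, [1, 3, 5], seed=10) returns [10, 11, 14].
    '''
    if seed is None:
        result, rest = [values[0]], values[1:]
    else:
        result, rest = [seed], values[:-1]
    for v in rest:
        result.append(op(result[-1], v))
    return result
\end{verbatim}
\end{small}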
\practicesection{Collaborate.}{collaborate}
In the same way that having manuscripts reviewed by other scientists
can reduce errors and make research easier to understand, reviews of
source code can eliminate bugs and improve readability. A large body
of research has shown that \term{code reviews} are the most
cost-effective way of finding bugs in code
\cite{fagan1976,cohen2010}. They are also a good way to spread
knowledge and good practices around a team. In projects with shifting
membership, such as most academic labs, code reviews help ensure that
critical knowledge isn't lost when a student or postdoc leaves the
lab.
Code can be reviewed either before or after it has been committed to a
shared version control repository. Experience shows that if reviews
don't have to be done in order to get code into the repository, they
will soon not be done at all \cite{fogel2005}. We therefore recommend
that projects \practice{8a}{use pre-merge code reviews}.
An extreme form of code review is \term{pair programming}, in which
two developers sit together while writing code. One (the driver)
actually writes the code; the other (the navigator) provides real-time
feedback and is free to track larger issues of design and consistency.
Several studies have found that pair programming improves productivity
\cite{williams2010}, but many programmers find it intrusive. We
therefore recommend that teams \practice{8b}{use pair programming when
bringing someone new up to speed and when tackling particularly
tricky problems}.
Once a team grows beyond a certain size, it becomes difficult to keep
track of what needs to be reviewed, or of who's doing what. Teams can
avoid a lot of duplicated effort and dropped balls if they
\practice{8c}{use an issue tracking tool} to maintain a list of tasks to
be performed and bugs to be fixed \cite{dubois2003a}. This helps avoid
duplicated work and makes it easier for tasks to be transferred to
different people. Free repository hosting services like GitHub include
issue tracking tools, and many good standalone tools exist as well,
such as \withurl{Trac}{http://trac.edgewall.org}.
\section*{Conclusion}\label{conclusion}
We have outlined a series of recommended best practices for scientific
computing based on extensive research, as well as our collective
experience. These practices can be applied to individual work as
readily as group work; separately and together, they improve the
productivity of scientific programming and the reliability of the
resulting code, and therefore the speed with which we produce results
and our confidence in them. They are also, we believe, prerequisites
for reproducible computational research: if software is not version
controlled, readable, and tested, the chances of its authors (much less anyone else)
being able to re-create results are remote.
Our 24 recommendations are a beginning, not an end. Individuals and
groups who have incorporated them into their work will find links to
more advanced practices at
\withurl{Software Carpentry}{http://software-carpentry.org}.
Research suggests that the time cost of implementing these kinds of
tools and approaches in scientific computing is almost immediately
offset by the gains in productivity of the programmers involved
\cite{aranda2012}. Even so, the recommendations described above may
seem intimidating to implement. Fortunately, the different practices
reinforce and support one another, so the effort required is less than
the sum of adding each component separately. Nevertheless, we do not
recommend that research groups attempt to implement all of these
recommendations at once, but instead suggest that these tools be
introduced incrementally over a period of time.
How to implement the recommended practices can be learned from many
excellent tutorials available online or through workshops and classes
organized by groups like Software Carpentry. This type of
training has proven effective at driving adoption of these tools in
scientific settings \cite{aranda2012,wilson2013}.
For computing to achieve the level of rigor that is expected
throughout other parts of science, it is necessary for scientists to
begin to adopt the tools and approaches that are known to improve both
the quality of software and the efficiency with which it is
produced. To facilitate this adoption, universities and funding
agencies need to support the training of scientists in the use of
these tools and the investment of time and money in building better
scientific software. Investment in these approaches by both
individuals and institutions will improve our confidence in the
results of computational science and will allow us to make more rapid
progress on important scientific questions than would otherwise be
possible.
\bibliography{best-practices-scientific-computing-2012}
\subsection*{Acknowledgments}
We are grateful to
Joel Adamson,
Aron Ahmadia,
Roscoe Bartlett,
Erik Bray,
Steven Crouch,
Michael Jackson,
Justin Kitzes,
Adam Obeng,
Karthik Ram,
Yoav Ram,
and Tracy Teal
for feedback on this paper.
\pagebreak
\section*{Box 1}
{\footnotesize
\begin{enumerate}
\item Write programs for people, not computers.
\begin{enumerate}
\item A program should not require its readers to hold more than a handful of facts in memory at once.
\item Make names consistent, distinctive, and meaningful.
\item Make code style and formatting consistent.
\end{enumerate}
\item Let the computer do the work.
\begin{enumerate}
\item Make the computer repeat tasks.
\item Save recent commands in a file for re-use.
\item Use a build tool to automate workflows.
\end{enumerate}
\item Make incremental changes.
\begin{enumerate}
\item Work in small steps with frequent feedback and course correction.
\item Use a version control system.
\item Put everything that has been created manually in version control.
\end{enumerate}
\item Don't repeat yourself (or others).
\begin{enumerate}
\item Every piece of data must have a single authoritative representation in the system.
\item Modularize code rather than copying and pasting.
\item Re-use code instead of rewriting it.
\end{enumerate}
\item Plan for mistakes.
\begin{enumerate}
\item Add assertions to programs to check their operation.
\item Use an off-the-shelf unit testing library.
\item Turn bugs into test cases.
\item Use a symbolic debugger.
\end{enumerate}
\item Optimize software only after it works correctly.
\begin{enumerate}
\item Use a profiler to identify bottlenecks.
\item Write code in the highest-level language possible.
\end{enumerate}
\item Document design and purpose, not mechanics.
\begin{enumerate}
\item Document interfaces and reasons, not implementations.
\item Refactor code in preference to explaining how it works.
\item Embed the documentation for a piece of software in that software.
\end{enumerate}
\item Collaborate.
\begin{enumerate}
\item Use pre-merge code reviews.
\item Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems.
\item Use an issue tracking tool.
\end{enumerate}
\end{enumerate}
}
\end{document}