forked from hadley/r-pkgs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
workflow101.Rmd
556 lines (399 loc) · 30.5 KB
/
workflow101.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
# Fundamental development workflows {#sec-workflow101}
```{r, echo = FALSE}
source("common.R")
```
Having peeked under the hood of R packages and libraries in @sec-package-structure-state, here we provide the basic workflows for creating a package and moving it through the different states that come up during development.
## Create a package {#sec-workflow101-create-package}
### Survey the existing landscape
Many packages are born out of one person's frustration at some common task that should be easier.
How should you decide whether something is package-worthy?
There's no definitive answer, but it's helpful to appreciate at least two types of payoff:
- Product: your life will be better when this functionality is implemented formally, in a package.
- Process: greater mastery of R will make you more effective in your work.
If all you care about is the existence of a product, then your main goal is to navigate the space of existing packages.
Silge, Nash, and Graves organized a survey and sessions around this at useR!
2017 and their write up for the R Journal [@silge-nash-graves] provides a comprehensive roundup of resources.
If you are looking for ways to increase your R mastery, you should still educate yourself about the landscape.
But there are plenty of good reasons to make your own package, even if there is relevant prior work.
The way experts got that way is by actually building things, often very basic things, and you deserve the same chance to learn by tinkering.
If you're only allowed to work on things that have never been touched, you're likely looking at problems that are either very obscure or very difficult.
It's also valid to evaluate the suitability of existing tools on the basis of user interface, defaults, and edge case behaviour.
A package may technically do what you need, but perhaps it's very unergonomic for your use case.
In this case, it may make sense for you to develop your own implementation or to write wrapper functions that smooth over the sharp edges.
If your work falls into a well-defined domain, educate yourself about the existing R packages, even if you've resolved to create your own package.
Do they follow specific design patterns?
Are there specific data structures that are common as the primary input and output?
For example, there is a very active R community around spatial data analysis ([r-spatial.org](https://www.r-spatial.org)) that has successfully self-organised to promote greater consistency across packages with different maintainers.
In modeling, the [hardhat package](https://hardhat.tidymodels.org) provides scaffolding for creating a modeling package that plays well with the [tidymodels](https://www.tidymodels.org) ecosystem.
Your package will get more usage and will need less documentation if it fits nicely into the surrounding landscape.
### Name your package
> "There are only two hard things in Computer Science: cache invalidation and naming things."
>
> --- Phil Karlton
Before you can create your package, you need to come up with a name for it.
This can be the hardest part of creating a package!
(Not least because no one can automate it for you.)
#### Formal requirements
There are three formal requirements:
1. The name can only consist of letters, numbers, and periods, i.e., `.`.
2. It must start with a letter.
3. It cannot end with a period.
Unfortunately, this means you can't use either hyphens or underscores, i.e., `-` or `_`, in your package name.
We recommend against using periods in package names, due to confusing associations with file extensions and S3 methods.
#### Things to consider
If you plan to share your package with others, it's important to come up with a good name.
Here are some tips:
- Pick a unique name that's easy to Google.
This makes it easy for potential users to find your package (and associated resources) and for you to see who's using it.
- Don't pick a name that's already in use on CRAN or Bioconductor.
You may also want to consider some other types of name collision:
- Is there an in-development package maturing on, say, GitHub that already has some history and seems to be heading towards release?
- Is this name already used for another piece of software or for a library or framework in, e.g., the Python or JavaScript ecosystem?
- Avoid using both upper and lower case letters: doing so makes the package name hard to type and even harder to remember.
For example, it's hard to remember if it's Rgtk2 or RGTK2 or RGtk2.
- Give preference to names that are pronounceable, so people are comfortable talking about your package and have a way to hear it inside their head.
- Find a word that evokes the problem and modify it so that it's unique.
Here are some examples:
- lubridate makes dates and times easier.
- rvest "harvests" the content from web pages.
- r2d3 provides utilities for working with D3 visualizations.
- forcats is an anagram of factors, which we use **for** **cat**egorical data.
- Use abbreviations, like the following:
- Rcpp = R + C++ (plus plus)
- brms = Bayesian Regression Models using Stan
- Add an extra R, for example:
- stringr provides string tools.
- beepr plays notification sounds.
- callr calls R, from R.
- Don't get sued.
- If you're creating a package that talks to a commercial service, check the branding guidelines. For example, rDrop isn't called rDropbox because Dropbox prohibits any applications from using the full trademarked name.
Nick Tierney presents a fun typology of package names in his [Naming Things](https://www.njtierney.com/post/2018/06/20/naming-things/) blog post, which also includes more inspiring examples.
He also has some experience with renaming packages; the post [So, you've decided to change your r package name](https://www.njtierney.com/post/2017/10/27/change-pkg-name/) is a good resource if you don't get this right the first time.
#### Use the available package
It is impossible to abide by all of the above suggestions simultaneously, so you will need to make some trade-offs.
The [available package](https://cran.r-project.org/package=available) has a function called `available()` that helps you evaluate a potential package name from many angles:
```{r, eval = FALSE}
library(available)
available("doofus")
#> Urban Dictionary can contain potentially offensive results,
#> should they be included? [Y]es / [N]o:
#> 1: 1
#> ── doofus ──────────────────────────────────────────────────────────────────
#> Name valid: ✔
#> Available on CRAN: ✔
#> Available on Bioconductor: ✔
#> Available on GitHub: ✔
#> Abbreviations: http://www.abbreviations.com/doofus
#> Wikipedia: https://en.wikipedia.org/wiki/doofus
#> Wiktionary: https://en.wiktionary.org/wiki/doofus
#> Sentiment:???
```
`available::available()` does the following:
- Checks for validity.
- Checks availability on CRAN, Bioconductor, and beyond.
- Searches various websites to help you discover any unintended meanings. In an interactive session, the URLs you see above are opened in browser tabs.
- Attempts to report whether name has positive or negative sentiment.
`pak::pkg_name_check()` is alternative function with a similar purpose.
Since the pak package is under more active development than available, it may emerge as the better option going forward.
### Package creation {#sec-creating}
Once you've come up with a name, there are two ways to create the package.
- Call `usethis::create_package()`.
- In RStudio, do *File \> New Project \> New Directory \> R Package*. This ultimately calls `usethis::create_package()`, so really there's just one way.
<!-- *TODO: revisit when I tackle usethis + RStudio project templates <https://github.com/r-lib/usethis/issues/770>. In particular, contemplate whether to reinstate any screenshot-y coverage of RStudio workflows here.* -->
This produces the smallest possible *working* package, with three components:
1. An `R/` directory, which you'll learn about in @sec-r.
2. A basic `DESCRIPTION` file, which you'll learn about in @sec-description.
3. A basic `NAMESPACE` file, which you'll learn about in @sec-dependencies-NAMESPACE-file.
It may also include an RStudio project file, `pkgname.Rproj`, that makes your package easy to use with RStudio, as described below.
Basic `.Rbuildignore` and `.gitignore` files are also left behind.
::: callout-warning
Don't use `package.skeleton()` to create a package.
Because this function comes with R, you might be tempted to use it, but it creates a package that immediately throws errors with `R CMD build`.
It anticipates a different development process than we use here, so repairing this broken initial state just makes unnecessary work for people who use devtools (and, especially, roxygen2).
Use `create_package()`.
:::
### Where should you `create_package()`?
The main and only required argument to `create_package()` is the `path` where your new package will live:
```{r, eval = FALSE}
create_package("path/to/package/pkgname")
```
Remember that this is where your package lives in its **source** form (@sec-source-package), not in its **installed** form (@sec-installed-package).
Installed packages live in a **library** and we discussed conventional setups for libraries in @sec-library.
Where should you keep source packages?
The main principle is that this location should be distinct from where installed packages live.
In the absence of external considerations, a typical user should designate a directory inside their home directory for R (source) packages.
We discussed this with colleagues and the source of many tidyverse packages lives inside directories like `~/rrr/`, `~/documents/tidyverse/`, `~/r/packages/`, or `~/pkg/`.
Some of us use one directory for this, others divide source packages among a few directories based on their development role (contributor vs. not), GitHub organization (tidyverse vs r-lib), development stage (active vs. not), and so on.
The above probably reflects that we are primarily tool-builders.
An academic researcher might organize their files around individual publications, whereas a data scientist might organize around data products and reports.
There is no particular technical or traditional reason for one specific approach.
As long as you keep a clear distinction between source and installed packages, just pick a strategy that works within your overall system for file organization, and use it consistently.
## RStudio Projects {#sec-workflow101-rstudio-projects}
devtools works hand-in-hand with RStudio, which we believe is the best development environment for most R users.
To be clear, you can use devtools without using RStudio and you can develop packages in RStudio without using devtools.
But there is a special, two-way relationship that makes it very rewarding to use devtools and RStudio together.
::: callout-tip
## RStudio
An RStudio **Project**, with a capital "P", is a regular directory on your computer that includes some (mostly hidden) RStudio infrastructure to facilitate your work on one or more **projects**, with a lowercase "p".
A project might be an R package, a data analysis, a Shiny app, a book, a blog, etc.
:::
### Benefits of RStudio Projects
From @sec-source-package, you already know that a source package lives in a directory on your computer.
We strongly recommend that each source package is also an RStudio Project.
Here are some of the payoffs:
- Projects are very "launch-able".
It's easy to fire up a fresh instance of RStudio in a Project, with the file browser and working directory set exactly the way you need, ready for work.
- Each Project is isolated; code run in one Project does not affect any other Project.
- You can have several RStudio Projects open at once and code executed in Project A does not have any effect on the R session and workspace of Project B.
- You get handy code navigation tools like `F2` to jump to a function definition and `Ctrl + .` to look up functions or files by name.
- You get useful keyboard shortcuts and a clickable interface for common package development tasks, like generating documentation, running tests, or checking the entire package.
```{r}
#| echo: false
#| label: fig-keyboard-shortcuts
#| out-width: ~
#| fig-cap: >
#| Keyboard Shortcut Quick Reference in RStudio.
#| fig-alt: |
#| Screenshot of an RStudio window with a semi-transparent
#| black box displaying keyboard shortcuts that overlays a
#| majority of the window.
knitr::include_graphics("images/keyboard-shortcuts.png")
```
::: callout-tip
## RStudio
To see the most useful keyboard shortcuts, press Alt + Shift + K or use *Help \> Keyboard Shortcuts Help*.
You should see something like @fig-keyboard-shortcuts.
RStudio also provides the [*Command Palette*](https://docs.posit.co/ide/user/ide/reference/shortcuts.html#command-palette) which gives fast, searchable access to all of the IDE's commands -- especially helpful when you can't remember a particular keyboard shortcut.
It is invoked via Ctrl + Shift + P (Windows & Linux) or Cmd + Shift + P (macOS).
:::
::: callout-tip
## RStudio
Follow \@[rstudiotips](https://twitter.com/rstudiotips) on Twitter for a regular dose of RStudio tips and tricks.
:::
### How to get an RStudio Project
If you follow our recommendation to create new packages with `create_package()`, each new package will also be an RStudio Project, if you're working from RStudio.
If you need to designate the directory of a pre-existing source package as an RStudio Project, choose one of these options:
- In RStudio, do *File \> New Project \> Existing Directory*.
- Call `create_package()` with the path to the pre-existing R source package.
- Call `usethis::use_rstudio()`, with the [active usethis project](#sec-rstudio-project-vs-active-usethis-project) set to an existing R package. In practice, this probably means you just need to make sure your working directory is inside the pre-existing package directory.
### What makes an RStudio Project?
A directory that is an RStudio Project will contain an `.Rproj` file.
Typically, if the directory is named "foo", the Project file is `foo.Rproj`.
And if that directory is also an R package, then the package name is usually also "foo".
The path of least resistance is to make all of these names coincide and to NOT nest your package inside a subdirectory inside the Project.
If you settle on a different workflow, just know it may feel like you are fighting with the tools.
An `.Rproj` file is just a text file.
Here is a representative project file you might see in a Project initiated via usethis:
```
Version: 1.0
RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
Encoding: UTF-8
AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
LineEndingConversion: Posix
BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace
```
You don't need to modify this file by hand.
Instead, use the interface available via *Tools \> Project Options* (@fig-project-options) or *Project Options* in the Projects menu in the top-right corner (@fig-projects-menu).
```{r}
#| echo: false
#| label: fig-project-options
#| out-width: ~
#| fig-cap: >
#| Project Options in RStudio.
#| fig-alt: >
#| The Project Options preference page in the RStudio IDE.
#| On the left side are nine categories: General, Code Editing, R Markdown,
#| Python, Sweave, Spelling, Build Tools, Git/SVN, and Environments.
#| The General category is selected. In the main part of the window are
#| three options with dropdown boxes:
#| 1. Restore .RData into workspace at startup;
#| 2. Save workspace to .RData on exit; and
#| 3. Always save history (even if not saving .RData).
#| All three of these options are set to "(Default)".
#| There are two options with checkboxes:
#| 1. Disable .Rprofile execution on session start/resume
#| 2. Quit child processes on exit
#| Both of these are unchecked.
#| At the top of the window is the statement:
#| "Use (Default) to inherit the global default setting".
knitr::include_graphics("images/project-options-2.png")
```
```{r}
#| echo: false
#| label: fig-projects-menu
#| out-width: 35%
#| fig-cap: >
#| Projects Menu in RStudio.
#| fig-alt: >
#| Image of the RStudio IDE Projects drop-down menu, with items:
#| "New Project...", "Open Project...", "Open Project in New Session...",
#| "Close Project", then a list of projects the user has had open
#| recently, "Close Project", and "Project Options...".
#| The "Project Options..." item is selected.
knitr::include_graphics("images/project-options-1.png")
```
### How to launch an RStudio Project
Double-click the `foo.Rproj` file in macOS's Finder or Windows Explorer to launch the foo Project in RStudio.
You can also launch Projects from within RStudio via *File \> Open Project (in New Session)* or the Projects menu in the top-right corner.
If you use a productivity or launcher app, you can probably configure it to do something delightful for `.Rproj` files.
We both use Alfred for this [^workflow101-1], which is macOS only, but similar tools exist for Windows.
In fact, this is a very good reason to use a productivity app in the first place.
[^workflow101-1]: Specifically, we configure Alfred to favor `.Rproj` files in its search results when proposing apps or files to open.
To register the `.Rproj` file type with Alfred, go to *Preferences \> Features \> Default Results \> Advanced*.
Drag any `.Rproj` file onto this space and then close.
It is very normal -- and productive!
-- to have multiple Projects open at once.
### RStudio Project vs. active usethis project {#sec-rstudio-project-vs-active-usethis-project}
You will notice that most usethis functions don't take a path: they operate on the files in the "active usethis project".
The usethis package assumes that 95% of the time all of these coincide:
- The current RStudio Project, if using RStudio.
- The active usethis project.
- Current working directory for the R process.
If things seem funky, call `proj_sitrep()` to get a "situation report".
This will identify peculiar situations and propose ways to get back to a happier state.
```{r eval = FALSE}
# these should usually be the same (or unset)
proj_sitrep()
#> * working_directory: '/Users/jenny/rrr/readxl'
#> * active_usethis_proj: '/Users/jenny/rrr/readxl'
#> * active_rstudio_proj: '/Users/jenny/rrr/readxl'
```
## Working directory and filepath discipline
As you develop your package, you will be executing R code.
This will be a mix of workflow calls (e.g., `document()` or `test()`) and *ad hoc* calls that help you write your functions, examples, and tests.
We *strongly recommend* that you keep the top-level of your source package as the working directory of your R process.
This will generally happen by default, so this is really a recommendation to avoid development workflows that require you to fiddle with working directory.
If you're totally new to package development, you don't have much basis for supporting or resisting this proposal.
But those with some experience may find this recommendation somewhat upsetting.
You may be wondering how you are supposed to express paths when working in subdirectories, such as `tests/`.
As it becomes relevant, we'll show you how to exploit path-building helpers, such as `testthat::test_path()`, that determine paths at execution time.
The basic idea is that by leaving working directory alone, you are encouraged to write paths that convey intent explicitly ("read `foo.csv` from the test directory") instead of implicitly ("read `foo.csv` from current working directory, which I *think* is going to be the test directory").
A sure sign of reliance on implicit paths is incessant fiddling with your working directory, because you're using `setwd()` to manually fulfill the assumptions that are implicit in your paths.
Using explicit paths can design away a whole class of path headaches and makes day-to-day development more pleasant as well.
There are two reasons why implicit paths are hard to get right:
- Recall the different forms that a package can take during the development cycle (@sec-package-structure-state). These states differ from each other in terms of which files and folders exist and their relative positions within the hierarchy. It's tricky to write relative paths that work across all package states.
- Eventually, your package will be processed with built-in tools like `R CMD build`, `R CMD check`, and `R CMD INSTALL`, by you and potentially CRAN. It's hard to keep track of what the working directory will be at every stage of these processes.
Path helpers like `testthat::test_path()`, `fs::path_package()`, and the [rprojroot package](https://rprojroot.r-lib.org) are extremely useful for building resilient paths that hold up across the whole range of situations that come up during development and usage.
Another way to eliminate brittle paths is to be rigorous in your use of proper methods for storing data inside your package (@sec-data) and to target the session temp directory when appropriate, such as for ephemeral testing artefacts (@sec-testing-basics).
## Test drive with `load_all()` {#sec-workflow101-load-all}
The `load_all()` function is arguably the most important part of the devtools workflow.
```{r, eval = FALSE}
# with devtools attached and
# working directory set to top-level of your source package ...
load_all()
# ... now experiment with the functions in your package
```
`load_all()` is the key step in this "lather, rinse, repeat" cycle of package development:
1. Tweak a function definition.
2. `load_all()`
3. Try out the change by running a small example or some tests.
When you're new to package development or to devtools, it's easy to overlook the importance of `load_all()` and fall into some awkward habits from a data analysis workflow.
### Benefits of `load_all()`
When you first start to use a development environment, like RStudio or VS Code, the biggest win is the ability to send lines of code from an `.R` script for execution in R console.
The fluidity of this is what makes it tolerable to follow the best practice of regarding your source code as real [^workflow101-2] (as opposed to objects in the workspace) and saving `.R` files (as opposed to saving and reloading `.Rdata`).
[^workflow101-2]: Quoting the usage philosophy favored by [Emacs Speaks Statistics](https://ess.r-project.org/Manual/ess.html#Philosophies-for-using-ESS_0028R_0029) (ESS).
`load_all()` has the same significance for package development and, ironically, requires that you NOT test drive package code in the same way as script code.
`load_all()` *simulates* the full blown process for seeing the effect of a source code change, which is clunky enough [^workflow101-3] that you won't want to do it very often.
@fig-load-all reinforces that the `library()` function can only load a package that has been installed, whereas `load_all()` gives a high-fidelity simulation of this, based on the current package source.
[^workflow101-3]: The command line approach is to quit R, go to the shell, do `R CMD build foo` in the package's parent directory, then `R CMD INSTALL foo_x.y.x.tar.gz`, restart R, and call `library(foo`).
```{r}
#| echo: false
#| label: fig-load-all
#| out-width: ~
#| fig-cap: >
#| devtools::load_all() vs. library().
#| fig-alt: >
#| Diagram listing five package states: source, bundle, binary, installed,
#| in memory.
#| Two functions for converting a package from one state to another are
#| depicted.
#| First, `devtools::load_all()` is shown to convert a source package to one
#| that is in memory.
#| Second, `library()` is shown to put an installed package in memory.
knitr::include_graphics("diagrams/loading.png")
```
The main benefits of `load_all()` include:
- You can iterate quickly, which encourages exploration and incremental progress.
- This iterative speedup is especially noticeable for packages with compiled code.
- You get to develop interactively under a namespace regime that accurately mimics how things are when someone uses your installed package, with the following additional advantages:
- You can call your own internal functions directly, without using `:::` and without being tempted to temporarily define your functions in the global workspace.
- You can also call functions from other packages that you've imported into your `NAMESPACE`, without being tempted to attach these dependencies via `library()`.
`load_all()` removes friction from the development workflow and eliminates the temptation to use workarounds that often lead to mistakes around namespace and dependency management.
### Other ways to call `load_all()`
When working in a Project that is a package, RStudio offers several ways to call `load_all()`:
- Keyboard shortcut: Cmd+Shift+L (macOS), Ctrl+Shift+L (Windows, Linux)
- Build pane's *More ...* menu
- *Build \> Load All*
`devtools::load_all()` is a thin wrapper around `pkgload::load_all()` that adds a bit of user-friendliness.
It is unlikely you will use `load_all()` programmatically or inside another package, but if you do, you should probably use `pkgload::load_all()` directly.
## `check()` and `R CMD check` {#sec-workflow101-r-cmd-check}
Base R provides various command line tools and `R CMD check` is the official method for checking that an R package is valid.
It is essential to pass `R CMD check` if you plan to submit your package to CRAN, but we **highly recommend** holding yourself to this standard even if you don't intend to release your package on CRAN.
`R CMD check` detects many common problems that you'd otherwise discover the hard way.
Our recommended way to run `R CMD check` is in the R console via devtools:
```{r}
#| eval: false
devtools::check()
```
We recommend this because it allows you to run `R CMD check` from within R, which dramatically reduces friction and increases the likelihood that you will `check()` early and often!
This emphasis on fluidity and fast feedback is exactly the same motivation as given for `load_all()`.
In the case of `check()`, it really is executing `R CMD check` for you.
It's not just a high fidelity simulation, which is the case for `load_all()`.
::: callout-tip
## RStudio
RStudio exposes `check()` in the *Build* menu, in the *Build* pane via *Check*, and in keyboard shortcuts Ctrl + Shift + E (Windows & Linux) or Cmd + Shift + E (macOS).
:::
A rookie mistake that we see often in new package developers is to do too much work on their package before running `R CMD check`.
Then, when they do finally run it, it's typical to discover many problems, which can be very demoralizing.
It's counter-intuitive but the key to minimizing this pain is to run `R CMD check` more often: the sooner you find a problem, the easier it is to fix[^workflow101-4].
We model this behaviour very intentionally in @sec-whole-game.
[^workflow101-4]: A great blog post advocating for "if it hurts, do it more often" is [FrequencyReducesDifficulty](https://martinfowler.com/bliki/FrequencyReducesDifficulty.html) by Martin Fowler.
The upper limit of this approach is to run `R CMD check` every time you make a change.
We don't run `check()` manually quite that often, but when we're actively working on a package, it's typical to `check()` multiple times per day.
Don't tinker with your package for days, weeks, or months, waiting for some special milestone to finally run `R CMD check`.
If you use GitHub (@sec-sw-dev-practices-git-github), we'll show you how to set things up so that `R CMD check` runs automatically every time you push (@sec-sw-dev-practices-gha).
### Workflow {#check-workflow}
Here's what happens inside `devtools::check()`:
- Ensures that the documentation is up-to-date by running `devtools::document()`.
- Bundles the package before checking it (@sec-bundled-package).
This is the best practice for checking packages because it makes sure the check starts with a clean slate: because a package bundle doesn't contain any of the temporary files that can accumulate in your source package, e.g. artifacts like `.so` and `.o` files which accompany compiled code, you can avoid the spurious warnings such files will generate.
- Sets the `NOT_CRAN` environment variable to `"true"`.
This allows you to selectively skip tests on CRAN.
See `?testthat::skip_on_cran` and @sec-testing-advanced-skip-on-cran for details.
The workflow for checking a package is simple, but tedious:
1. Run `devtools::check()`, or press Ctrl/Cmd + Shift + E.
2. Fix the first problem.
3. Repeat until there are no more problems.
`R CMD check` returns three types of messages:
- `ERROR`s: Severe problems that you should fix regardless of whether or not you're submitting to CRAN.
- `WARNING`s: Likely problems that you must fix if you're planning to submit to CRAN (and a good idea to look into even if you're not).
- `NOTE`s: Mild problems or, in a few cases, just an observation.
If you are submitting to CRAN, you should strive to eliminate all NOTEs, even if they are false positives.
If you have no NOTEs, human intervention is not required, and the package submission process will be easier.
If it's not possible to eliminate a `NOTE`, you'll need describe why it's OK in your submission comments, as described in @sec-release-process.
If you're not submitting to CRAN, carefully read each NOTE.
If it's easy to eliminate the NOTEs, it's worth it, so that you can continue to strive for a totally clean result.
But if eliminating a NOTE will have a net negative impact on your package, it is reasonable to just tolerate it.
Make sure that doesn't lead to you ignoring other issues that really should be addressed.
`R CMD check` consists of dozens of individual checks and it would be overwhelming to enumerate them here.
See our [online-only guide to `R CMD check`](https://r-pkgs.org/R-CMD-check.html) for details.
### Background on `R CMD check`
As you accumulate package development experience, you might want to access `R CMD check` directly at some point.
Remember that `R CMD check` is something you must run in the terminal, not in the R console.
You can see its documentation like so:
``` bash
R CMD check --help
```
`R CMD check` can be run on a directory that holds an R package in source form (@sec-source-package) or, preferably, on a package bundle (@sec-bundled-package):
``` bash
R CMD build somepackage
R CMD check somepackage_0.0.0.9000.tar.gz
```
To learn more, see the [Checking packages](https://cran.r-project.org/doc/manuals/R-exts.html#Checking-packages) section of [Writing R Extensions](https://cran.r-project.org/doc/manuals/R-exts.html).