-
Notifications
You must be signed in to change notification settings - Fork 6
/
README.Rmd
140 lines (99 loc) · 6.23 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
should_eval <-
rlang::is_installed("xgboost") &&
rlang::is_installed("parsnip") &&
rlang::is_installed("callr")
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
eval = should_eval
)
```
# bundle
<!-- badges: start -->
[![R-CMD-check](https://github.com/rstudio/bundle/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/rstudio/bundle/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/bundle)](https://CRAN.R-project.org/package=bundle)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![Codecov test coverage](https://codecov.io/gh/rstudio/bundle/branch/main/graph/badge.svg)](https://app.codecov.io/gh/rstudio/bundle?branch=main)
<!-- badges: end -->
Typically, models in R exist in memory and can be saved as `.rds` files. However, some models store information in locations that cannot be saved using `save()` or `saveRDS()` directly. The goal of bundle is to provide a common interface to capture this information, situate it within a portable object, and restore it for use in new settings.
## Installation
You can install the released version of bundle from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("bundle")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("rstudio/bundle")
```
## Overview
We often imagine a trained model as a somewhat "standalone" R object---given some new data, the R object can generate predictions on its own. In reality, some types of model objects also make use of _references_ to generate predictions. A reference is a piece of information that a model object refers to that isn't part of the object itself; this could be anything from a connection with a server to an internal function in the package used to train the model. When we call `predict()`, model objects know where to look to retrieve that data, but saving model objects can sometimes disrupt those references. Thus, if we want to train a model, save it, re-load it into memory in a production setting, and generate predictions with it, we may run into issues because those references do not exist in the new computational environment.
We need some way to preserve access to those references. The bundle package provides a consistent interface for _bundling_ model objects with their references so that they can be safely saved and re-loaded in production:
```{r diagram-04, echo = FALSE, fig.alt = "A diagram showing a rectangle, labeled model object, and another rectangle, labeled predictions. The two are connected by an arrow from model object to predictions, with the label 'predict'. There are two boxes labeled reference, connected to the arrow labeled predict with dotted arrows, to show that, most of the time, we don't need to think about including them in our workflow. There are two boxes, labeled R Session number one, and R session number two. In focus is the arrow from the model object, in R Session number one, to a model object in R session number two. This arrow connecting the model object in R session one and the model object in R session two is connected by a verb called bundle. The bundle function outputs an object called a bundle.", out.width = '100%'}
knitr::include_graphics("man/figures/diagram_04.png")
```
For more on this diagram, see the [main bundle vignette](https://rstudio.github.io/bundle/articles/bundle.html).
When you're ready to save your model, `bundle()` it first. Once you've loaded it in a new setting, `unbundle()` it!
## Example
The bundle package prepares model objects so that they can be effectively saved and re-loaded for use in new R sessions. To demonstrate using bundle, we will train a boosted tree model using [XGBoost](https://xgboost.readthedocs.io/), bundle it, and then pass the bundle into another R session to generate predictions on new data.
First, load needed packages:
```{r example, message = FALSE, warning = FALSE}
library(bundle)
library(parsnip)
library(callr)
library(waldo)
```
Fit the boosted tree model:
```{r}
# fit an boosted tree with xgboost via parsnip
mod <-
boost_tree(trees = 5, mtry = 3) %>%
set_mode("regression") %>%
set_engine("xgboost") %>%
fit(mpg ~ ., data = mtcars[1:25,])
mod
```
Note that simply saving and loading the model results in changes to the fitted model:
```{r}
temp_file <- tempfile()
saveRDS(mod, temp_file)
mod2 <- readRDS(temp_file)
compare(mod, mod2, ignore_formula_env = TRUE)
```
Saving and reloading `mod2` didn't preserve XGBoost's reference to its `pointer`, which may result in failures later in the modeling process.
We thus need to prepare the fitted model to be saved before passing it to another R session. We can do so by bundling it:
```{r}
# bundle the model
bundled_mod <-
bundle(mod)
bundled_mod
```
Passing the model to another R session and generating predictions on new data:
```{r}
# load the model in a fresh R session and predict on new data
r(
func = function(bundled_mod) {
library(bundle)
library(parsnip)
unbundled_mod <-
unbundle(bundled_mod)
predict(unbundled_mod, new_data = mtcars[26:32,])
},
args = list(
bundled_mod = bundled_mod
)
)
```
For a more in-depth demonstration of the package, see the [main vignette](https://rstudio.github.io/bundle/articles/bundle.html) with `vignette("bundle")`.
## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about our packages, modeling, and machine learning, please [post on RStudio Community](https://forum.posit.co/new-topic?category_id=15&tags=question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/rstudio/bundle/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.