Clarify what you can expect to do after bundling, i.e. predict #50
That's true, yep! The focus of bundle is to capture the references a model needs in order to make predictions in a new environment. For more info, you can look at: I would generally expect functions like
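A minimal sketch of that intended use, assuming bundle, parsnip, callr, and xgboost are installed: fit a model, bundle it before saving it with `saveRDS()`, then unbundle it and call `predict()` in a fresh R process. The xgboost engine, the temporary file, and the rows used for prediction are illustrative choices, not part of the original thread.

```r
library(bundle)
library(parsnip)
library(callr)

# Fit a small model in the current session (xgboost chosen for illustration)
mod <-
  boost_tree(trees = 5) %>%
  set_mode("regression") %>%
  set_engine("xgboost") %>%
  fit(mpg ~ ., data = mtcars)

# Bundle before serializing so the model's native (non-R) references
# can be restored in another session
path <- tempfile(fileext = ".rds")
saveRDS(bundle(mod), path)

# In a fresh R process: read the bundle, unbundle it, and predict
preds <- r(
  func = function(path, new_data) {
    library(bundle)
    library(parsnip)
    unbundled_mod <- unbundle(readRDS(path))
    predict(unbundled_mod, new_data)
  },
  args = list(path = path, new_data = mtcars[1:3, ])
)
preds
```

Prediction is the operation bundle is designed to support; other methods on the unbundled object are not guaranteed to work.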
Thank you for the helpful reply. I suspected this was the case, and the links you shared made it much clearer. Unfortunately, although the scope of the
Thank you so much for the kind words! ❤️ Let's keep this issue open and clarify some of the documentation about what you can expect to do after bundling, especially in the README and main vignette. (As a side note, I also maintain butcher, and this is much the same as how butcher works. Sometimes we keep components in butcher that are needed for something like
predict
Can we use the bundle package to:
Referring to:
- https://rstudio.github.io/bundle/
- https://rstudio.github.io/bundle/articles/bundle.html
- https://rstudio.github.io/bundle/reference/bundle_h2o.html
@Steviey The normal usage we expect after bundling is to predict with your model, but if you can get the parsnip model specification back out, you should be able to refit:

```r
library(bundle)
library(parsnip)
library(callr)

## bundle a model
mod <-
  boost_tree(trees = 5, mtry = 3) %>%
  set_mode("regression") %>%
  set_engine("xgboost") %>%
  fit(mpg ~ ., data = mtcars[1:25,])

bundled_mod <- bundle(mod)

## in a fresh R session, refit the extracted specification to new data
r(
  func = function(bundled_mod) {
    library(bundle)
    library(parsnip)

    unbundled_mod <- unbundle(bundled_mod)
    fittable_model <- extract_spec_parsnip(unbundled_mod)

    fittable_model |> fit(mpg ~ ., data = mtcars[26:32,])
  },
  args = list(
    bundled_mod = bundled_mod
  )
)
#> parsnip model object
#>
#> ##### xgb.Booster
#> Handle is invalid! Suggest using xgb.Booster.complete
#> raw: 7.7 Kb
#> call:
#>   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0,
#>     colsample_bytree = 1, colsample_bynode = 0.3, min_child_weight = 1,
#>     subsample = 1), data = x$data, nrounds = 5, watchlist = x$watchlist,
#>     verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#>   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "0.3", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 10
#> niter: 5
#> nfeatures : 10
#> evaluation_log:
#>   iter training_rmse
#>  <num>         <num>
#>      1     16.923941
#>      2     12.953166
#>      3     10.022720
#>      4      7.801856
#>      5      6.089100
```

Created on 2024-03-29 with reprex v2.1.0
@juliasilge Thank you, Julia. `extract_spec_parsnip()` returns a parsnip model specification. Does this include hyperparameters from earlier training and fits before bundling? Would it include submodels from the leaderboard of an H2O AutoML model?
@Steviey Hmmmm, I am not entirely sure, as I don't have a ton of experience with H2O. I think a good venue for this kind of question is the agua repo: https://github.com/tidymodels/agua
@juliasilge Thank you, Julia, for the response. The H2O issue goes deeper, into H2O itself, as mentioned for example here: business-science/modeltime.h2o#14. More generally, for tidymodels (models other than H2O): this could be an ecological question too (green ML/AI). Maybe related: if bundle requires separate actions in this regard, I'm not sure whether this is still best practice:
@Steviey The bundle package can handle bundling up the needed references, but it doesn't have functionality for getting the best hyperparameters; you'd need to get those through tidymodels infrastructure in either tune or agua. Once you have those hyperparameters, bundle will definitely work. 👍
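A rough sketch of that order of operations, assuming tidymodels (which attaches tune, workflows, parsnip, and rsample), bundle, callr, and xgboost are installed: tune selects the hyperparameters, `finalize_workflow()` plugs them in, the finalized workflow is refit, and only then is it bundled. Tuning only `trees`, the 3-fold resampling, and the grid size are illustrative assumptions.

```r
library(tidymodels)
library(bundle)
library(callr)

set.seed(123)
folds <- vfold_cv(mtcars, v = 3)

# Mark `trees` for tuning; other arguments are fixed (illustrative choices)
spec <-
  boost_tree(trees = tune(), mtry = 3) %>%
  set_mode("regression") %>%
  set_engine("xgboost")

wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(spec)

# Evaluate a small grid of candidate values with resampling
res <- tune_grid(wf, resamples = folds, grid = 3)

# Pick the best hyperparameters, plug them in, and refit on all the data
best <- select_best(res, metric = "rmse")
final_fit <- wf %>%
  finalize_workflow(best) %>%
  fit(data = mtcars)

# Bundle the finalized, fitted workflow and predict in a fresh R session
bundled_wf <- bundle(final_fit)
preds <- r(
  func = function(bundled_wf, new_data) {
    library(bundle)
    library(workflows)
    predict(unbundle(bundled_wf), new_data)
  },
  args = list(bundled_wf = bundled_wf, new_data = mtcars[1:5, ])
)
preds
```

The chosen hyperparameters live in `best` and in the finalized workflow, not in anything bundle computes; bundle only makes the already-fitted result portable.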
@juliasilge OK, then I would bet on finalize more than on update.
Feels worth mentioning that the Value documentation for each bundle method states:
I would argue that this is sufficient to set expectations for what users can do with unbundled objects. :)
That's a great point @simonpcouch. 👍 We haven't heard a lot of other confusion on this point to date, so let's close this as complete. We can revisit in the future as necessary!
I am not sure if this is a known issue, as it doesn't appear in the docs. It seems that, except for `predict()`, other methods like `tidy()` or `rank_results()` fail when called on the unbundled object. This SO post references the same problem.