More Efficient Refitting: modeltime_refit() retrains all models. Allow user to retrain only the best model. #14
Comments
This is a good question. Short answer: not currently.
Long answer: I've read about checkpointing, which may be a solution. We'd need to implement it inside of Modeltime H2O, but in theory it should work. We'd just pass a checkpoint parameter through to the AutoML leader, or to the model ID of your choosing. Stacked ensembles don't have the checkpoint feature (that I'm aware of), but you could in theory select a different model: a leaderboard is stored in the modeltime object, so we'd grab a model from the leaderboard and refit it using its checkpoint model ID, much like the H2O checkpoint example, retraining the model from the previous checkpoint. That should save a boatload of time.
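To make the checkpoint idea concrete, here is a minimal sketch of plain H2O checkpointing, not the modeltime.h2o implementation; the file paths, target column `y`, and model ID are assumptions, and checkpoint support varies by algorithm (GLMs and stacked ensembles don't support it):

```r
library(h2o)
h2o.init()

# Hypothetical data: the original training batch plus newly arrived rows
train      <- h2o.importFile("train.csv")
new_rows   <- h2o.importFile("new_data.csv")
predictors <- setdiff(names(train), "y")

# Original model, given a known model_id so it can be used as a checkpoint
dl_v1 <- h2o.deeplearning(
  x = predictors, y = "y",
  training_frame = train,
  epochs = 10,
  model_id = "dl_base"
)

# "Refit": continue training from the checkpoint on the updated data
# instead of rerunning the whole AutoML search. Structural parameters
# (hidden layers, activation, etc.) must match the checkpointed model.
dl_v2 <- h2o.deeplearning(
  x = predictors, y = "y",
  training_frame = h2o.rbind(train, new_rows),
  epochs = 20,                  # train further than the checkpointed model
  checkpoint = "dl_base"
)
```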
H2O JIRA tracking: https://h2oai.atlassian.net/browse/PUBDEV-8051 Per discussion with Erin LeDell at H2O.ai, she has opened a JIRA ticket to make retraining easier. Checkpointing is not the ideal solution. Alternatives exist, such as grabbing the parameters from the various H2O objects, but H2O has suggested implementing an easier method for retraining.
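As a rough sketch of the "grab the parameters" alternative mentioned here (not how modeltime.h2o would implement it; the leader's model ID, the assumption that the leader is a GBM, and the new training frame `updated_data` are all hypothetical):

```r
library(h2o)

# Hypothetical: the winning model's ID, taken from the stored leaderboard
leader <- h2o.getModel("GBM_1_AutoML_20240101_000000")
params <- leader@allparameters   # the parameter values H2O actually used

# Re-train the same algorithm on updated data with the saved hyperparameters
refit <- h2o.gbm(
  x = params$x,
  y = params$y,
  training_frame = updated_data,  # hypothetical new H2OFrame
  ntrees = params$ntrees,
  max_depth = params$max_depth,
  learn_rate = params$learn_rate
)
```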
+1 Any news on this issue?
On a training set, I used `automl_reg()` to train for half an hour, only to discover that the best-fitting model was a stacked ensemble 'best of family' model. My default expectation of using `modeltime_refit()` on this model was that the function would dive into the stacked ensemble, retrain each sub-model using the existing hyperparameters, and return an updated stacked ensemble. Instead, the refit function appears to just rerun the grid search across all possible models and select a new best-fitting model. This seems problematic to me for several reasons: (1) running my own custom cross-validation scheme is going to take forever, since rerunning a 30-minute search over all possible models takes far more compute than retraining 5 sub-models; (2) there is no consistency in outputs: sometimes the best model chosen might be a very simple glmnet, and other times it might be an ensemble of 5 deep learning models, 3 XGBoosts, and a glmnet. It doesn't seem safe to put something like that into production if the model itself can vary that much between runs.

So my two questions are:
(1) Is this the intended refit behavior for an H2O AutoML model, and (2) is there a way to have `modeltime_refit()` retrain only the existing best model (or the ensemble's sub-models) rather than rerun the full search?
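For context, a minimal sketch of the workflow being described, following the modeltime.h2o README-style API; the data frame `data_prepared` (with `date` and `value` columns), the split settings, and the engine arguments are placeholder assumptions, and the final `modeltime_refit()` call reruns the full AutoML search, which is exactly the behavior this issue asks to make optional:

```r
library(tidymodels)
library(modeltime)
library(modeltime.h2o)
library(timetk)
library(h2o)

h2o.init()

# Hypothetical train/test split on a tibble with `date` and `value` columns
splits <- time_series_split(data_prepared, assess = "3 months", cumulative = TRUE)

# AutoML spec: searches many model families for up to 30 minutes
model_spec <- automl_reg(mode = "regression") %>%
  set_engine("h2o", max_runtime_secs = 1800, seed = 123)

model_fit <- model_spec %>%
  fit(value ~ date, data = training(splits))

# Refitting on the full dataset currently reruns the whole AutoML search
# rather than retraining the winning model with its existing hyperparameters
refit_tbl <- modeltime_table(model_fit) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_refit(data = data_prepared)
```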