Implement model ensembling features #174
base: main
Conversation
Looking through it a bit, it seems right and cute to me (but I should understand the actual prediction mixing a bit better). But I also have to wonder a bit if this is actually all that useful compared to … OTOH, I am not really worried about it even if I am not sure it is the best pattern: I don't think …
Thanks for your quick review! The voting classifier gives a different result. The method described here combines the results in logit space, before the probability transformation. It is not available in sklearn. I have personally used these features before by hacking into xgboost, e.g. for this paper: …
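(For illustration, a minimal numpy sketch of that difference; the raw scores below are made up rather than produced by any model:)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up raw (logit/margin) outputs of two binary classifiers for 3 samples.
raw_a = np.array([2.0, -1.0, 0.2])
raw_b = np.array([-0.5, 3.0, 0.1])

# Soft voting (sklearn VotingClassifier style): transform first, then average.
p_vote = (sigmoid(raw_a) + sigmoid(raw_b)) / 2

# Combination in logit space: average the raw scores, then apply the
# probability transformation once.
p_logit = sigmoid((raw_a + raw_b) / 2)

# The two disagree most where the raw scores are far apart.
print(p_vote - p_logit)
```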
It's also another method of out-of-core batch training, different from the example in https://github.com/rapidsai/legate-boost/tree/main/examples/batch_training. I think it has potential in memory-constrained situations. It's definitely up for debate whether my +/* operator overloading API is sensible.
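(A rough sketch of what that memory-constrained use could look like; estimator and parameter names follow legate-boost's sklearn-style API, `load_batch` is a hypothetical helper, and the +/* semantics are the ones proposed in this PR:)

```python
import legateboost as lb

def load_batch(i):
    """Hypothetical helper returning the i-th chunk (X_i, y_i) that fits in memory."""
    ...

n_batches = 4
models = []
for i in range(n_batches):
    X_i, y_i = load_batch(i)
    model = lb.LBRegressor(n_estimators=100)
    model.fit(X_i, y_i)
    models.append(model)

# Average the per-batch models: scale each one's raw predictions by
# 1 / n_batches and sum them into a single combined estimator.
weight = 1.0 / n_batches
ensemble = models[0] * weight
for m in models[1:]:
    ensemble = ensemble + m * weight
```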
It isn't the same as … In either case, I think my main worry is that the …
def __mul__(self, scalar: Any) -> "Tree":
    # Scaling a tree scales its contribution to the model's raw prediction,
    # since the raw prediction is just the sum of leaf values.
    new = copy.deepcopy(self)
    new.leaf_value *= scalar
    return new
Do addition and multiplication make sense for all of the objectives? I can see they do for linear objectives, but I'm not sure about other types.
update:
Yeah, makes sense. See the next comment.
In a way, assuming the datasets for two models are the same or follow the same distribution, is this the same as changing the learning rate during training?
Yes, it makes sense for all objectives. The leaf output is always additive until some (optional) non-linear transformation is applied at the end of prediction.
Multiplication by a scalar is similar to the learning rate, but not the same: if you train two models independently with learning rate 0.5 and add them, the result will not be the same as one model trained with learning rate 1.0, because at each step the gradients are calculated without access to the other model's predictions.
@seberg I don't think the scalar being greater than 1 is a problem - it would be a legitimate thing to do, although I don't know why you would do it. Gradient boosted models are linearly additive and this is just expressing that.
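(A quick way to see the learning-rate point, using plain xgboost margins rather than this PR's API, on a synthetic dataset:)

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# base_score=0.0 so that summing raw margins does not double-count the intercept.
params = {"objective": "reg:squarederror", "max_depth": 3,
          "subsample": 0.8, "base_score": 0.0}

# Two models trained independently with learning rate 0.5, summed in raw
# (margin) space, versus a single model trained with learning rate 1.0.
b1 = xgb.train({**params, "eta": 0.5, "seed": 1}, dtrain, num_boost_round=50)
b2 = xgb.train({**params, "eta": 0.5, "seed": 2}, dtrain, num_boost_round=50)
b3 = xgb.train({**params, "eta": 1.0, "seed": 1}, dtrain, num_boost_round=50)

summed = b1.predict(dtrain, output_margin=True) + b2.predict(dtrain, output_margin=True)
single = b3.predict(dtrain, output_margin=True)

# The two disagree: each 0.5-rate model computed its gradients without
# seeing the other's predictions.
print(np.abs(summed - single).max())
```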
Looks good to me, a very interesting feature!
Some notes:
- It might not be necessary to have the same model types. Some people stack models.
- The addition is not associative, which is weird from an algebra standpoint, but it's fine as an interface for ML models.
@trivialfis In which way is it not associative?
Sorry, commutative.
Isn't it also commutative? Do you have an example where it's not?
If you have a "prefer A" option, then the ordering of the addition matters, right?
Ah yes, for the parameters. The models' raw prediction output is associative and commutative, though.
Implements #167 by adding multiplication and addition operators for our estimators.
Addition is carefully defined for every member variable in the estimator.
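(For reference, the intended usage might look roughly like the following sketch; estimator names follow legate-boost's API, the data is synthetic, and the exact operator semantics are whatever this PR implements:)

```python
import numpy as np
import legateboost as lb

rng = np.random.default_rng(0)
X_a, y_a = rng.normal(size=(100, 4)), rng.integers(0, 2, size=100)
X_b, y_b = rng.normal(size=(100, 4)), rng.integers(0, 2, size=100)
X_test = rng.normal(size=(10, 4))

# Train two classifiers, e.g. on different subsets of the data.
m1 = lb.LBClassifier(n_estimators=50)
m1.fit(X_a, y_a)
m2 = lb.LBClassifier(n_estimators=50)
m2.fit(X_b, y_b)

# Weighted combination: the models' raw (pre-probability) outputs are scaled
# and summed; the objective's transformation is applied once at predict time.
ensemble = m1 * 0.5 + m2 * 0.5
proba = ensemble.predict_proba(X_test)
```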