
Implement model ensembling features #174

Open
RAMitchell wants to merge 13 commits into main

Conversation

RAMitchell
Contributor

Implements #167 by adding multiplication and addition operators for our estimators.

Addition is carefully defined for every member variable in the estimator.
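
For intuition, here is a heavily simplified, self-contained sketch of what member-wise addition could look like; the ToyEstimator class and its trees attribute are illustrative stand-ins, not the actual estimator's members:

import copy
from dataclasses import dataclass, field

@dataclass
class ToyEstimator:
    # Stand-in for a fitted ensemble: each entry represents one boosting round.
    trees: list = field(default_factory=list)

    def __add__(self, other: "ToyEstimator") -> "ToyEstimator":
        # Raw predictions of a boosted model are the sum of its rounds,
        # so adding two models just concatenates their rounds.
        new = copy.deepcopy(self)
        new.trees = list(self.trees) + list(other.trees)
        return new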

@RAMitchell RAMitchell marked this pull request as ready for review November 11, 2024 10:22
@seberg
Contributor

seberg commented Nov 11, 2024

Looking through it a bit, it seems right and cute to me (but I should understand the actual prediction mixing a bit better).

But I also have to wonder a bit if this is actually all that useful compared to VotingClassifier and VotingRegressor in sklearn? It seems like a really cool idea that may not be quite as useful in practice.

OTOH, I am not really worried about it even if I am not sure it is the best pattern: I don't think + or * on estimators can be used for anything else useful.

@RAMitchell
Contributor Author

Thanks for your quick review!

The voting classifier gives a different result. The method described here combines the results in logit space before the probability transformation. It is not available in sklearn.

I have personally used these features before by hacking into xgboost, e.g. for this paper:
https://arxiv.org/abs/2005.07353
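
To make the distinction concrete, here is a minimal sketch in plain NumPy (not the legate-boost or sklearn API) contrasting the two schemes for a binary classifier:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw (logit-space) outputs of two models for one sample.
raw_a, raw_b = np.array([2.0]), np.array([-1.0])

# Logit-space combination (this PR): mix raw scores, then transform.
p_logit_mix = sigmoid(0.5 * raw_a + 0.5 * raw_b)

# Soft voting (sklearn's VotingClassifier): transform, then average probabilities.
p_soft_vote = 0.5 * sigmoid(raw_a) + 0.5 * sigmoid(raw_b)

print(p_logit_mix, p_soft_vote)  # differ in general because sigmoid is non-linear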

@RAMitchell
Contributor Author

It's also another method of out-of-core batch training, different from the example in https://github.com/rapidsai/legate-boost/tree/main/examples/batch_training. I think it has potential in memory-constrained situations.
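
Something like the following sketch could work; load_batches is a hypothetical helper, and the LBRegressor parameters are illustrative:

import legateboost as lb

# Fit one model per in-memory batch, then average them in raw space
# using the operators from this PR.
batches = load_batches()  # hypothetical: yields (X, y) chunks that fit in memory

models = [lb.LBRegressor(n_estimators=10).fit(X, y) for X, y in batches]
weight = 1.0 / len(models)
ensemble = models[0] * weight
for m in models[1:]:
    ensemble = ensemble + m * weight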

It's definitely up for debate whether my +/* operator-overloading API is sensible.

@seberg
Contributor

seberg commented Nov 11, 2024

It isn't the same as voting="soft", even if I might agree that having a default of voting="hard" may not be the best choice (maybe I should just think about it more).

In either case, I think my main worry is that the +/* syntax may run into oddities. Maybe just things like model_a * 0.3 + model_b * 0.8 not being flagged (the factors don't sum to 1)?


def __mul__(self, scalar: Any) -> "Tree":
    new = copy.deepcopy(self)
    new.leaf_value *= scalar
    return new
Member

@trivialfis Nov 11, 2024


Do addition and multiplication make sense for all of the objectives? I can see that they do for linear objectives, but I'm not sure about other types.

Update: yeah, makes sense. See the next comment.

Member

@trivialfis Nov 11, 2024


In a way, assuming the datasets for two models are the same or follow the same distribution, is this the same as changing the learning rate during training?

Contributor Author


Yes, it makes sense for all objectives. The leaf output is always additive until some (optional) non-linear transformation is applied at the end of prediction.

Multiplication by a scalar is similar to the learning rate, but training two models independently with learning rate 0.5 and adding them does not give the same result as one model trained with learning rate 1.0: at each step, a model's gradients are calculated without access to the other model's predictions.
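
A toy sketch makes this concrete, using squared-error boosting where each round fits a constant predictor (a stand-in for a tree):

import numpy as np

y = np.array([0.0, 1.0, 4.0])

def boost(y, lr, rounds=2):
    pred = np.zeros_like(y)
    for _ in range(rounds):
        residual = y - pred                 # negative gradient of squared error
        pred = pred + lr * residual.mean()  # constant "tree" fit to the residuals
    return pred

# Two models trained independently at lr=0.5, then added ...
combined = boost(y, lr=0.5) + boost(y, lr=0.5)
# ... versus one model trained at lr=1.0.
single = boost(y, lr=1.0)
print(combined, single)  # agree after round 1, diverge from round 2 onward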

@RAMitchell
Contributor Author

@seberg I don't think the scalar being greater than 1 is a problem; it would be a legitimate thing to do, although I don't know why you would. Gradient-boosted models are linearly additive, and this is just expressing that.

Member

@trivialfis left a comment


Looks good to me, a very interesting feature!

Some notes:

  • It might not be necessary to have the same model types. Some people stack models.
  • The addition is not associative, which is weird from an algebra standpoint, but it's fine as an interface for ML models.

@RAMitchell
Contributor Author

@trivialfis In which way is it not associative?

@trivialfis
Member

Sorry, commutative

@RAMitchell
Contributor Author

Isn't it also commutative? Do you have an example where it's not?

@trivialfis
Member

trivialfis commented Nov 15, 2024

If you have a "prefer A" option, then the ordering of the addition matters, right?

@RAMitchell
Contributor Author

Ah yes, for the parameters. The models' raw prediction output is associative and commutative, though.
