Implement model ensembling features #174
base: main
Conversation
Looking through it a bit, it seems right and cute to me (but I should understand the actual prediction mixing a bit better). But I also have to wonder a bit if this is actually all that useful compared to … OTOH, I am not really worried about it even if I am not sure it is the best pattern: I don't think …
Thanks for your quick review! The voting classifier gives a different result. The method described here combines the results in logit space, before the probability transformation. It is not available in sklearn. I have personally used these features before by hacking into xgboost, e.g. for this paper: …
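(For illustration, a minimal numpy sketch of that difference; the raw scores below are made up rather than produced by any model:)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up raw (logit/margin) outputs of two binary classifiers for 3 samples.
raw_a = np.array([2.0, -1.0, 0.2])
raw_b = np.array([-0.5, 3.0, 0.1])

# Soft voting (sklearn VotingClassifier style): transform first, then average.
p_vote = (sigmoid(raw_a) + sigmoid(raw_b)) / 2

# Combination in logit space: average the raw scores, then apply the
# probability transformation once.
p_logit = sigmoid((raw_a + raw_b) / 2)

# The two disagree most where the raw scores are far apart.
print(p_vote - p_logit)
```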
It's also another method of out-of-core batch training, different from the example in https://github.com/rapidsai/legate-boost/tree/main/examples/batch_training. I think it has potential in memory-constrained situations. It's definitely up for debate whether my +/* operator overloading API is sensible.
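(A rough sketch of what that memory-constrained use could look like; estimator and parameter names follow legate-boost's sklearn-style API, `load_batch` is a hypothetical helper, and the +/* semantics are the ones proposed in this PR:)

```python
import legateboost as lb

def load_batch(i):
    """Hypothetical helper returning the i-th chunk (X_i, y_i) that fits in memory."""
    ...

n_batches = 4
models = []
for i in range(n_batches):
    X_i, y_i = load_batch(i)
    model = lb.LBRegressor(n_estimators=100)
    model.fit(X_i, y_i)
    models.append(model)

# Average the per-batch models: scale each one's raw predictions by
# 1 / n_batches and sum them into a single combined estimator.
weight = 1.0 / n_batches
ensemble = models[0] * weight
for m in models[1:]:
    ensemble = ensemble + m * weight
```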
It isn't the same as … In either case, I think my main worry is that the …
def __mul__(self, scalar: Any) -> "Tree":
    # Scaling a tree scales its contribution to the model's raw prediction,
    # since the raw prediction is just the sum of leaf values.
    new = copy.deepcopy(self)
    new.leaf_value *= scalar
    return new
Do addition and multiplication make sense for all of the objectives? I can see they do for linear objectives, but I'm not sure about other types.
update:
Yeah, makes sense. See the next comment.
In a way, assuming the datasets for two models are the same or follow the same distribution, is this the same as changing the learning rate during training?
Yes, it makes sense for all objectives. The leaf output is always additive until some (optional) non-linear transformation is applied at the end of prediction.
Multiplication by a scalar is similar to the learning rate, but not the same: if you train two models independently with learning rate 0.5 and add them, the result will not be the same as one model trained with learning rate 1.0, because at each step the gradients are calculated without access to the other model's predictions.
@seberg I don't think the scalar being greater than 1 is a problem - it would be a legitimate thing to do, although I don't know why you would do it. Gradient boosted models are linearly additive and this is just expressing that.
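(A quick way to see the learning-rate point, using plain xgboost margins rather than this PR's API, on a synthetic dataset:)

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# base_score=0.0 so that summing raw margins does not double-count the intercept.
params = {"objective": "reg:squarederror", "max_depth": 3,
          "subsample": 0.8, "base_score": 0.0}

# Two models trained independently with learning rate 0.5, summed in raw
# (margin) space, versus a single model trained with learning rate 1.0.
b1 = xgb.train({**params, "eta": 0.5, "seed": 1}, dtrain, num_boost_round=50)
b2 = xgb.train({**params, "eta": 0.5, "seed": 2}, dtrain, num_boost_round=50)
b3 = xgb.train({**params, "eta": 1.0, "seed": 1}, dtrain, num_boost_round=50)

summed = b1.predict(dtrain, output_margin=True) + b2.predict(dtrain, output_margin=True)
single = b3.predict(dtrain, output_margin=True)

# The two disagree: each 0.5-rate model computed its gradients without
# seeing the other's predictions.
print(np.abs(summed - single).max())
```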
Looks good to me, a very interesting feature!
Some notes:
- It might not be necessary to have the same model types. Some people stack models.
- The addition is not associative, which is weird from an algebra standpoint, but it's fine as an interface for ML models.
@trivialfis In which way is it not associative?
Sorry, commutative.
Isn't it also commutative? Do you have an example where it's not?
If you have a "prefer A" option, then the ordering of the addition matters, right?
Ah yes, for the parameters. The models' raw prediction output is associative and commutative, though.
Implements #167 by adding multiplication and addition operators for our estimators.
Addition is carefully defined for every member variable in the estimator.
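(For reference, the intended usage might look roughly like the following sketch; estimator names follow legate-boost's API, the data is synthetic, and the exact operator semantics are whatever this PR implements:)

```python
import numpy as np
import legateboost as lb

rng = np.random.default_rng(0)
X_a, y_a = rng.normal(size=(100, 4)), rng.integers(0, 2, size=100)
X_b, y_b = rng.normal(size=(100, 4)), rng.integers(0, 2, size=100)
X_test = rng.normal(size=(10, 4))

# Train two classifiers, e.g. on different subsets of the data.
m1 = lb.LBClassifier(n_estimators=50)
m1.fit(X_a, y_a)
m2 = lb.LBClassifier(n_estimators=50)
m2.fit(X_b, y_b)

# Weighted combination: the models' raw (pre-probability) outputs are scaled
# and summed; the objective's transformation is applied once at predict time.
ensemble = m1 * 0.5 + m2 * 0.5
proba = ensemble.predict_proba(X_test)
```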