Option early_stopping_rounds missing for LightGBM in ShapRFECV #229
Comments
Hi @detrin, I believe that is what the subclass EarlyStoppingShapRFECV tries to do. Does that solve your question?
@ReinierKoops I didn't know there was a subclass for that, I will try it.
Maybe a cleaner design would be to have one class
To your suggestion: yes, it is working! So I guess this issue could be closed; however, it doesn't work the way I would expect it to. Here is my example:
import lightgbm
from sklearn.model_selection import RandomizedSearchCV
from probatus.feature_elimination import ShapRFECV, EarlyStoppingShapRFECV
params = {
    # ...
    "n_estimators": 1000,
    "seed": 1234,
}
param_grid = {
    "num_leaves": [25, 50, 100, 150, 200],
}
clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = EarlyStoppingShapRFECV(
    clf=clf, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)
report = shap_elimination.fit_compute(
    data[train_mask][cols],
    data[train_mask][col_target],
)
So, when using
shap_elimination = EarlyStoppingShapRFECV(clf=clf, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)
it works; however,
shap_elimination = EarlyStoppingShapRFECV(clf=search, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)
does not, and I think there is no reason for it not to be possible. You may say that such a workflow is overkill, but I would say not really when you start with a lot of features. Hyperparameter optimization is then needed so that the params do not distort the resulting Shapley values.
I have been thinking about what the cleanest solution would be, and I have the following
When having
So something like the following:
clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = ShapRFECV(
    clf=clf, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)
fit_params = {
    "fit_params": {
        "fit_params": {
            "eval_set": [(data[valid_mask][cols], data[valid_mask][col_target])],
        }  # for LGBMClassifier
    }  # for RandomizedSearchCV
}
report = shap_elimination.fit(
    data[train_mask][cols],
    data[train_mask][col_target],
    fit_params=fit_params
)
I see something similar done in
So what is the reason for having a special class EarlyStoppingShapRFECV, and what do you think of this proposal?
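One way to read the nested fit_params idea above is that each layer keeps its own level of the dictionary and hands the inner "fit_params" entry to the object it wraps. The following is only a sketch of that reading, using standard-library stand-ins: `Wrapper` and `LeafEstimator` are hypothetical names, not probatus, scikit-learn, or LightGBM classes, and the validation data is a placeholder.

```python
class LeafEstimator:
    """Hypothetical stand-in for the final model (e.g. an LGBMClassifier)."""

    def fit(self, X, y, **fit_params):
        # Record what finally arrives at the model's fit().
        self.seen = fit_params
        return self


class Wrapper:
    """Hypothetical stand-in for a wrapping layer (search object or eliminator)."""

    def __init__(self, inner):
        self.inner = inner

    def fit(self, X, y, fit_params=None):
        own_level = fit_params or {}
        inner_params = own_level.get("fit_params", {})
        if isinstance(self.inner, Wrapper):
            # Another wrapper: pass the next level down as a dictionary.
            self.inner.fit(X, y, fit_params=inner_params)
        else:
            # The model itself: unpack the innermost level as keyword arguments.
            self.inner.fit(X, y, **inner_params)
        return self


leaf = LeafEstimator()
search_like = Wrapper(leaf)            # plays the role of the search object
eliminator_like = Wrapper(search_like)  # plays the role of the feature eliminator

nested = {
    "fit_params": {
        "fit_params": {"eval_set": [("X_valid", "y_valid")]},  # for the model
    }  # for the search layer
}
eliminator_like.fit([[0.0], [1.0]], [0, 1], fit_params=nested)
print(leaf.seen)
```

In this sketch only the innermost dictionary reaches the model, so each layer can be configured without the others needing to know its arguments.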
@ReinierKoops Could you have a look at this please?
I think what you are proposing is an improvement for sure. However, this would be a breaking change, so I’d have to discuss this. Thank you for showing so much interest :)
I’ll get back to you on it on Monday.
Thanks. Regarding the breaking change, there could be an optional argument in
If you want to take a stab at it, I’ll review it and merge it when done (if made backwards compatible or with a deprecation warning).
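A backwards-compatible route of the kind discussed here could keep the old subclass as a thin shim that emits a deprecation warning and delegates to the merged class. The sketch below uses only the standard `warnings` module; `UnifiedEliminator` and `LegacyEarlyStoppingEliminator` are hypothetical names used for illustration and are not probatus classes.

```python
import warnings


class UnifiedEliminator:
    """Hypothetical merged class: early stopping is just an optional argument."""

    def __init__(self, clf, early_stopping_rounds=None):
        self.clf = clf
        self.early_stopping_rounds = early_stopping_rounds


class LegacyEarlyStoppingEliminator(UnifiedEliminator):
    """Hypothetical shim for the old subclass, kept so existing code still runs."""

    def __init__(self, clf, early_stopping_rounds=10):
        warnings.warn(
            "LegacyEarlyStoppingEliminator is deprecated; use UnifiedEliminator "
            "with early_stopping_rounds instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shim
        )
        super().__init__(clf, early_stopping_rounds=early_stopping_rounds)


# Existing user code keeps working, but a DeprecationWarning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = LegacyEarlyStoppingEliminator(clf=object(), early_stopping_rounds=10)

print(len(caught), caught[0].category.__name__)
```

The shim can be removed in a later release once users have had time to migrate.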
Sure, this could be fun, plus it will have real-world use :)
@detrin is this still something you'd consider picking up?
@ReinierKoops Meanwhile I decided to pursue a career opportunity at Similarweb instead of Home Credit, so there is no need for this particular PR at my work. I might still use it outside of work, and besides that it will be good practice. I will try to allocate some time over the next two weeks for this PR.
I ran shap_elimination.fit_compute(X, y), but an error occurs on line 493 of feature_elimination.py. What should I do if it fails even though I pass X and y as a DataFrame and a Series, respectively?
Problem Description
When using LightGBM, most of the time I use early_stopping_rounds, because it gives some additional performance boost. I think it would be desirable to have this option in ShapRFECV as well.
Looking at https://github.com/ing-bank/probatus/blob/main/probatus/feature_elimination/feature_elimination.py#L22 I don't see it used there in any fit() method.
Solution Outline
I propose having fit_kwargs in ShapRFECV.__init__(), which will be passed to clf.fit() when called (in my case LGBMClassifier).
I would be interested in making a PR. However, I am not sure in how many days I will have time for that. I hope soon enough.
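The fit_kwargs idea could look roughly like the following. This is a minimal sketch under stated assumptions, not the probatus implementation: `FeatureEliminationSketch` and `RecordingClassifier` are hypothetical stand-ins for ShapRFECV and LGBMClassifier, the data is a placeholder, and only the forwarding of fit_kwargs to clf.fit() is shown.

```python
class RecordingClassifier:
    """Hypothetical stand-in for an estimator; it only records fit() keyword arguments."""

    def __init__(self):
        self.received_kwargs = None

    def fit(self, X, y, **kwargs):
        self.received_kwargs = kwargs
        return self


class FeatureEliminationSketch:
    """Hypothetical, heavily simplified stand-in for ShapRFECV."""

    def __init__(self, clf, fit_kwargs=None):
        self.clf = clf
        # Copy so that later changes to the caller's dict do not leak in.
        self.fit_kwargs = dict(fit_kwargs or {})

    def fit(self, X, y):
        # Forward the stored keyword arguments to the wrapped estimator.
        self.clf.fit(X, y, **self.fit_kwargs)
        return self


clf = RecordingClassifier()
eliminator = FeatureEliminationSketch(
    clf, fit_kwargs={"eval_set": [("X_valid", "y_valid")]}
)
eliminator.fit([[0.0], [1.0]], [0, 1])
print(clf.received_kwargs)
```

Defaulting fit_kwargs to None keeps existing calls unchanged, which is one way such an option could be added without breaking current users.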