[RFC] remove 'categorical_feature' and 'feature_name' parameters in cv() and train() #6435
Comments
I was not aware that these parameters were even available. I'm very much in favor of removing them across language APIs.
Very good idea. I recall that it is also in the
Thanks for pointing that out, @mayer79! Yes, the
While working through this tonight to add deprecation warnings, I found a few more things in
Lines 103 to 109 in 6e78e69
Lines 79 to 80 in 6e78e69
LightGBM's … I think restricting … cc @mayer79
I like it. Maybe worth noting that the one and only @david-cortes very recently removed non-dataset input in
Oh nice, thanks @mayer79! I hadn't seen dmlc/xgboost#10031.
I'm +1 for all the changes discussed in this issue.
Thanks! I'll put up a PR removing these deprecated arguments. The deprecation warnings have now been part of 2 releases.
…m.train(...)` function calls. (#454)

Using `categorical_feature` parameter in `lightgbm.Dataset()` instead of `lightgbm.train(...)` eliminates the following warnings:

```
test/gbdt/test_gbdt.py: 60 warnings
  /usr/local/lib/python3.10/dist-packages/lightgbm/engine.py:187: LGBMDeprecationWarning: Argument 'categorical_feature' to train() is deprecated and will be removed in a future release. Set 'categorical_feature' when calling lightgbm.Dataset() instead. See microsoft/LightGBM#6435.
    _emit_dataset_kwarg_warning("train", "categorical_feature")
```

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Proposal
I'm requesting comment on the following proposal:
- remove `categorical_feature` from `cv()` and `train()` in the R and Python packages
- remove `feature_name` from `cv()` and `train()` in the Python package
- remove `colnames` from `cv()` and `train()` in the R package

And do all of these only after the packages have issued deprecation warnings for 2-3 releases.
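The "warn for a few releases, then remove" approach can be sketched in plain Python. This is a minimal, hypothetical illustration, not LightGBM's actual implementation: `ToyDataset`, `toy_train()`, `_warn_dataset_kwarg()`, and `ToyDeprecationWarning` are made-up stand-ins for `lightgbm.Dataset`, `lightgbm.train()`, and LightGBM's `LGBMDeprecationWarning`. Only the warning text is modeled on the message quoted in the commit log in the comments above.

```python
import warnings


class ToyDeprecationWarning(FutureWarning):
    """Hypothetical stand-in for LightGBM's LGBMDeprecationWarning."""


class ToyDataset:
    """Hypothetical stand-in for lightgbm.Dataset; only holds feature metadata."""

    def __init__(self, data, feature_name="auto", categorical_feature="auto"):
        self.data = data
        self.feature_name = feature_name
        self.categorical_feature = categorical_feature


def _warn_dataset_kwarg(func_name, arg_name):
    # Point users at the Dataset constructor, the single intended home for this metadata.
    warnings.warn(
        f"Argument '{arg_name}' to {func_name}() is deprecated and will be removed "
        f"in a future release. Set '{arg_name}' when calling lightgbm.Dataset() instead. "
        "See microsoft/LightGBM#6435.",
        category=ToyDeprecationWarning,
        stacklevel=3,  # attribute the warning to the code that called toy_train()
    )


def toy_train(params, train_set, feature_name="auto", categorical_feature="auto"):
    # Deprecation phase: still honor the old keyword arguments, but warn when they are used.
    if feature_name != "auto":
        _warn_dataset_kwarg("train", "feature_name")
        train_set.feature_name = feature_name
    if categorical_feature != "auto":
        _warn_dataset_kwarg("train", "categorical_feature")
        train_set.categorical_feature = categorical_feature
    # Stand-in for the real training step: report which metadata would be used.
    return {"feature_name": train_set.feature_name,
            "categorical_feature": train_set.categorical_feature}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    toy_train({}, ToyDataset([[0, 1]]), categorical_feature=[0])  # old style: warns
    toy_train({}, ToyDataset([[0, 1]], categorical_feature=[0]))  # new style: silent
print(len(caught))  # 1
```

Once the deprecation period ends, the sketch's `feature_name` and `categorical_feature` parameters would simply be dropped from `toy_train()`'s signature, so old-style calls would fail with a `TypeError` instead of warning.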
Summary
Both the R and Python packages expose functions `cv()` (for cross-validation) and `train()` (for regular entire-dataset training). These functions require a LightGBM `Dataset` object.

The `Dataset` object holds attributes `categorical_feature` and `feature_name`, and allows setting those via constructor keyword arguments and `set_{attr}()` methods.

Despite that, these `cv()` and `train()` functions also take `categorical_feature` and `feature_name` (`colnames` in the R package) as keyword arguments:

- Python `cv()`: `LightGBM/python-package/lightgbm/engine.py`, lines 569 to 570 in 92a8741
- Python `train()`: `LightGBM/python-package/lightgbm/engine.py`, lines 62 to 63 in 92a8741
- R-package `cv()`: `LightGBM/R-package/R/lgb.cv.R`, lines 90 to 91 in 92a8741
- R-package `train()`: `LightGBM/R-package/R/lgb.train.R`, lines 57 to 58 in 92a8741
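The redundancy can be illustrated with a toy model in plain Python. This is a minimal sketch, not LightGBM's code: `ToyDataset` and `toy_train()` are hypothetical, and the setters only mimic the `set_{attr}()` pattern described above. It shows three routes (constructor keyword arguments, setter methods, and `train()`-level keyword arguments) ending in the same `Dataset` state.

```python
class ToyDataset:
    """Hypothetical stand-in for lightgbm.Dataset; only tracks feature metadata."""

    def __init__(self, data, feature_name="auto", categorical_feature="auto"):
        self.data = data
        self.feature_name = feature_name
        self.categorical_feature = categorical_feature

    # set_{attr}() methods mirroring the pattern described in the Summary.
    def set_feature_name(self, feature_name):
        self.feature_name = feature_name
        return self  # returning self allows chaining

    def set_categorical_feature(self, categorical_feature):
        self.categorical_feature = categorical_feature
        return self


def toy_train(params, train_set, feature_name="auto", categorical_feature="auto"):
    # The extra keyword arguments are only forwarded to the Dataset setters.
    if feature_name != "auto":
        train_set.set_feature_name(feature_name)
    if categorical_feature != "auto":
        train_set.set_categorical_feature(categorical_feature)
    return (train_set.feature_name, train_set.categorical_feature)


data = [[0, 1.5], [1, 2.5]]
names = ["color", "size"]

via_constructor = toy_train({}, ToyDataset(data, feature_name=names, categorical_feature=["color"]))
via_setters = toy_train({}, ToyDataset(data).set_feature_name(names).set_categorical_feature(["color"]))
via_train_kwargs = toy_train({}, ToyDataset(data), feature_name=names, categorical_feature=["color"])

# All three routes end in the same Dataset state, so the train()-level keywords add nothing.
print(via_constructor == via_setters == via_train_kwargs)  # True
```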
These keyword arguments aren't providing any value, in my opinion. Their values are just forwarded along to calls like this one (`LightGBM/python-package/lightgbm/engine.py`, lines 738 to 740 in 92a8741).

That forwarding is at best redundant with the `Dataset` class, and at worst could lead to runtime exceptions (if the `Dataset` has already been constructed).

Motivation
This would simplify the library's interface without any loss of functionality.
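The "at worst" case from the Summary (forwarding into a `Dataset` that has already been constructed) can be made concrete with another toy model. This sketch assumes, as a simplification, that constructing the toy `Dataset` discards its raw data, loosely analogous to LightGBM's `free_raw_data` option; `ToyDataset`, `construct()`, `toy_train()`, and the error message are hypothetical, and the real library's behavior and messages may differ.

```python
class ToyDataset:
    """Hypothetical stand-in for lightgbm.Dataset with lazy construction."""

    def __init__(self, data, categorical_feature="auto", free_raw_data=True):
        self.data = data
        self.categorical_feature = categorical_feature
        self.free_raw_data = free_raw_data
        self.constructed = False

    def construct(self):
        # Pretend to bin the raw data; the metadata is baked in at this point.
        self.constructed = True
        if self.free_raw_data:
            self.data = None  # raw data discarded to save memory
        return self

    def set_categorical_feature(self, categorical_feature):
        if categorical_feature == self.categorical_feature:
            return self  # nothing changes, nothing to rebuild
        if self.constructed and self.data is None:
            raise RuntimeError(
                "cannot change categorical_feature: Dataset already constructed "
                "and raw data was freed"
            )
        self.categorical_feature = categorical_feature
        self.constructed = False  # must be rebuilt with the new metadata
        return self


def toy_train(params, train_set, categorical_feature="auto"):
    # Forwarding a train()-level argument into an already-built Dataset is where things break.
    if categorical_feature != "auto":
        train_set.set_categorical_feature(categorical_feature)
    train_set.construct()
    return train_set.categorical_feature


ds = ToyDataset([[0, 1.5], [1, 2.5]], categorical_feature=[0]).construct()

print(toy_train({}, ds))  # [0]: metadata taken from the Dataset itself
try:
    toy_train({}, ds, categorical_feature=[1])
except RuntimeError as err:
    print("error:", err)
```

When the metadata lives only on the `Dataset`, there is no later argument that can conflict with an already-built object.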
If this proposal is accepted, the `Dataset` class would be the only place that this information is provided to `train()` and `cv()`.

References
- Inspired by this post I noticed on Stack Overflow: https://stackoverflow.com/questions/78383840/in-lightgbm-why-do-the-train-and-the-cv-apis-accept-categorical-feature-argument/78405996#78405996
- `xgboost` does not expose such arguments in `train()` (code link) or `cv()` (code link).
- These arguments have been part of the API since September 2017: ef77806#diff-9bd633ead0bdfe9540c42a618efd9e559cca16c522ad844a09fcf4ffc7d6e84c.