Enhancing the Flexibility of Linear Models in Leaf Nodes of Boosted Linear Trees #6630

ToddMeng · 2024-08-29T10:26:21Z

Summary

Enhancing the Flexibility of Linear Models in Leaf Nodes of Boosted Linear Trees

Motivation

Linear trees are a practical technique: they can improve model performance, simplify model structure, and make the model easier to interpret. When working with linear models, users often need to impose custom constraints to improve interpretability and to encode prior knowledge. Examples include restricting all regression coefficients to be positive, fixing the monotonic direction of each variable's effect, and limiting the linear regression to a selected subset of features.

Description

As a regular user of this library, I am deeply grateful for the diligent efforts of all developers and maintainers, whose hard work has greatly facilitated our work.
After reviewing the documentation and the linear_tree_learner.cpp code (https://github.com/microsoft/LightGBM/blob/master/src/treelearner/linear_tree_learner.cpp), I found that, apart from the ridge regularization setting (linear_lambda), the linear model component supports no other options. In particular, it cannot constrain the signs of the regression coefficients or restrict the linear regression to a subset of features.

References

I suggest either following the extensions that sklearn offers for its linear models, or providing an interface that lets users supply a custom linear model. Either option would make linear tree models more flexible and practical.
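To illustrate the kind of sklearn behavior I mean, here is a minimal sketch of a sign-constrained linear regression on a chosen subset of features. It uses sklearn's `LinearRegression(positive=True)`; the data, the `selected` list, and the variable names are made up for illustration, and nothing here is an existing LightGBM option.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): column 1 has a negative true effect,
# column 3 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Hypothetical subset of features allowed in the linear model.
selected = [0, 1, 2]

# positive=True forces every fitted coefficient to be >= 0,
# so the negative effect of column 1 cannot be represented.
model = LinearRegression(positive=True).fit(X[:, selected], y)
coef = model.coef_
print(dict(zip(selected, coef)))
```

Something comparable for the per-leaf models (a sign constraint plus a feature whitelist) is what this request is about.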

jaguerrerod · 2024-09-06T07:45:01Z

Related to this, I think it is important to add an option to include some predictors in every leaf's linear model, in addition to the predictors used in the splits that lead to the leaf.
I have datasets containing data from several population segments, and I am not interested in including the variables that define the segments in the model itself. However, I would like to include an adjustment in the prediction using the segment flags in the linear model fitted to each leaf.
My leaves have more than 20K observations, so including this segment adjustment does not pose an overfitting problem.
This option could be exposed as a parameter, for example 'features_forced_to_leaf_linear_model', that takes an array of feature indices or feature names.
I don't think this would be complex to implement, but I don't have the necessary C++ skills to do it.
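As a rough sketch of the behavior I have in mind, the snippet below fits one leaf's model on the union of the features used on the path to the leaf and a list of forced features. It is plain NumPy ridge regression on raw targets, not LightGBM's actual fitting code, and `fit_leaf_linear_model`, `path_features`, and `forced_features` are hypothetical names.

```python
import numpy as np

def fit_leaf_linear_model(X, y, path_features, forced_features, ridge_lambda=0.0):
    """Ridge fit for one leaf on path features plus forced features (sketch)."""
    # Union of both lists, keeping order and dropping duplicates.
    features = list(dict.fromkeys(list(path_features) + list(forced_features)))
    # Design matrix with a trailing column of ones for the intercept.
    A = np.column_stack([X[:, features], np.ones(len(X))])
    penalty = ridge_lambda * np.eye(A.shape[1])
    penalty[-1, -1] = 0.0  # do not penalize the intercept
    beta = np.linalg.solve(A.T @ A + penalty, A.T @ y)
    return features, beta[:-1], beta[-1]

# Illustrative data for a single leaf: column 2 is a 0/1 segment flag
# that is never used as a split feature.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
X[:, 2] = rng.integers(0, 2, size=n)
y = 1.5 * X[:, 0] + 3.0 * X[:, 2] + rng.normal(scale=0.1, size=n)

features, coefs, intercept = fit_leaf_linear_model(
    X, y, path_features=[0], forced_features=[2], ridge_lambda=0.01
)
print(features, coefs, intercept)
```

Here the segment flag gets its own coefficient in the leaf model even though it did not appear in any split on the way to the leaf.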

jameslamb added the feature request label Sep 3, 2024
