Summary
I would like to use a different subset of features in each iteration. I'm aware that there is colsample_bytree, but instead of random feature selection I would like to write custom Python code to select the features.
Motivation
I hope that my LightGBM model will generalize better when trained on custom subsets of features.
Description
I was trying to use the Booster class with the following code, but it crashed. I got this output:
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.199411 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 25500
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 100
0
/home/piotr/sandbox/numerai_v4/v4/lib/python3.8/site-packages/lightgbm/basic.py:506: UserWarning: Usage of np.ndarray subset (sliced data) is not recommended due to it will double the peak memory cost in LightGBM.
_log_warning("Usage of np.ndarray subset (sliced data) is not recommended "
[LightGBM] [Info] Start training from score 0.505111
Segmentation fault (core dumped)