Following this quick tutorial, I was hoping to use XGBoost for multilabel classification by passing label_column as a list within XGBoostTrainer. Is there any plan to support this functionality? https://xgboost.readthedocs.io/en/stable/tutorials/multioutput.html
import ray
import pandas as pd
import xgboost as xgb
from ray.train.xgboost import XGBoostTrainer, XGBoostPredictor
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split

num_classes = 30
X, y = make_multilabel_classification(
    n_classes=num_classes, random_state=0, n_samples=1000
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_train = pd.DataFrame(X_train, columns=[f"x{i}" for i in range(X_train.shape[1])])
y_train = pd.DataFrame(y_train, columns=[f"y{i}" for i in range(y_train.shape[1])])
train_ds = ray.data.from_pandas(pd.concat([X_train, y_train], axis=1))

trainer = XGBoostTrainer(
    # label_column="y1",  # works
    label_column=["y1", "y2"],  # not supported
    params={
        "tree_method": "hist",
        "max_depth": 15,
        "n_estimators": 50,
    },
    num_boost_round=10,
    datasets={"train": train_ds},
)
result = trainer.fit()
The trace is:
Current time: 2022-11-08 20:11:24 (running for 00:00:02.47)
Memory usage on this node: 21.7/62.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/16 CPUs, 0/0 GPUs, 0.0/25.74 GiB heap, 0.0/12.87 GiB objects
Result logdir: /home/gcounihan/ray_results/XGBoostTrainer_2022-11-08_20-11-21
Number of trials: 1/1 (1 RUNNING)
+----------------------------+----------+---------------------+
| Trial name                 | status   | loc                 |
|----------------------------+----------+---------------------|
| XGBoostTrainer_8362b_00000 | RUNNING  | 10.50.101.142:64403 |
+----------------------------+----------+---------------------+

(XGBoostTrainer pid=64403) /home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/xgboost_ray/main.py:464: UserWarning: `num_actors` in `ray_params` is smaller than 2 (1). XGBoost will NOT be distributed!
(XGBoostTrainer pid=64403)   warnings.warn(
(XGBoostTrainer pid=64403) 2022-11-08 20:11:24,790 ERROR function_trainable.py:298 -- Runner Thread raised error.
(XGBoostTrainer pid=64403) Traceback (most recent call last):
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 289, in run
(XGBoostTrainer pid=64403)     self._entrypoint()
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 362, in entrypoint
(XGBoostTrainer pid=64403)     return self._trainable_func(
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 466, in _resume_span
(XGBoostTrainer pid=64403)     return method(self, *_args, **_kwargs)
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/train/base_trainer.py", line 460, in _trainable_func
(XGBoostTrainer pid=64403)     super()._trainable_func(self._merged_config, reporter, checkpoint_dir)
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 684, in _trainable_func
(XGBoostTrainer pid=64403)     output = fn()
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/train/base_trainer.py", line 375, in train_func
(XGBoostTrainer pid=64403)     trainer.training_loop()
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/train/gbdt_trainer.py", line 246, in training_loop
(XGBoostTrainer pid=64403)     model = self._train(
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/ray/train/xgboost/xgboost_trainer.py", line 77, in _train
(XGBoostTrainer pid=64403)     return xgboost_ray.train(**kwargs)
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/xgboost_ray/main.py", line 1482, in train
(XGBoostTrainer pid=64403)     bst, train_evals_result, train_additional_results = _train(
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/xgboost_ray/main.py", line 1041, in _train
(XGBoostTrainer pid=64403)     dtrain.assert_enough_shards_for_actors(num_actors=ray_params.num_actors)
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/xgboost_ray/matrix.py", line 788, in assert_enough_shards_for_actors
(XGBoostTrainer pid=64403)     self.loader.assert_enough_shards_for_actors(num_actors=num_actors)
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/xgboost_ray/matrix.py", line 486, in assert_enough_shards_for_actors
(XGBoostTrainer pid=64403)     data_source = self.get_data_source()
(XGBoostTrainer pid=64403)   File "/home/gcounihan/miniconda3/envs/ncf38/lib/python3.8/site-packages/xgboost_ray/matrix.py", line 448, in get_data_source
(XGBoostTrainer pid=64403)     raise ValueError(
(XGBoostTrainer pid=64403) ValueError: Invalid `label` value for distributed datasets: ['y1', 'y2']. Only strings are supported.
(XGBoostTrainer pid=64403) FIX THIS by passing a string indicating the label column of the dataset as the `label` argument.
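Since the error says only a single string label is accepted for distributed datasets, one possible stopgap is to train one model per label instead of a single multi-output model. The sketch below only covers the column bookkeeping for that approach: `columns_per_label` is a hypothetical helper (not part of Ray or xgboost_ray) that, for each label, lists the feature columns plus that one label so the other labels are not fed in as features. The column names are illustrative.

```python
from typing import Dict, List, Sequence


def columns_per_label(
    all_columns: Sequence[str], label_columns: Sequence[str]
) -> Dict[str, List[str]]:
    """Map each label to the columns a single-label trainer would need.

    Hypothetical helper: keeps every non-label column as a feature and
    appends exactly one label column per entry.
    """
    label_set = set(label_columns)
    missing = label_set - set(all_columns)
    if missing:
        raise KeyError(f"label columns not found: {sorted(missing)}")
    # Everything that is not a label is treated as a feature.
    features = [c for c in all_columns if c not in label_set]
    return {label: features + [label] for label in label_columns}


# Illustrative column layout: three features and two labels.
columns = [f"x{i}" for i in range(3)] + [f"y{i}" for i in range(2)]
plan = columns_per_label(columns, ["y0", "y1"])
for label, cols in plan.items():
    print(label, cols)
```

Each column list could then be used to subset the pandas frame before `ray.data.from_pandas`, with the matching name passed as `label_column` to a separate `XGBoostTrainer`. This is an assumption about usage, and it yields independent binary models rather than the joint multi-output training described in the linked tutorial.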
Thanks, will take a look at what it would take to support this!
Yard1