FLAML with mixed numerical and categorical features #1226
Sukantabasu asked this question in Q&A (Unanswered)
Replies: 1 comment
-
automl does some simple preprocessing before invoking the trained estimator. That could be the reason. Could you try applying automl.feature_transformer to the data before using bestMod for prediction?
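A minimal sketch of that suggestion, assuming feature_transformer exposes a scikit-learn-style transform() method (worth verifying on your FLAML version) and that df_X is the raw feature frame from the question below:

bestMod = automl.best_model_for_estimator('lgbm')

# Apply the same preprocessing that automl.predict() performs internally
# before the data reaches the underlying LightGBM model.
X_trans = automl.feature_transformer.transform(df_X)
Y2 = bestMod.predict(X_trans)

# Note: for classification tasks FLAML also encodes the labels internally,
# so the raw estimator's output may still be label-encoded.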
-
Hi Everyone,
I have been using FLAML for a while. Thus far, all my datasets have included only numerical features. Recently, I started working with a new dataset that has both numerical and categorical features. I am using lightgbm and noticed an interesting behavior. Let us assume I have a dataframe df_X that has a few categorical columns; df_X.dtypes are: float64, float64, float64, float64, category, category, float64, etc.
After training, if I do the following, I get good results.
Y1 = automl.predict(df_X)
However, if I do the following, I get quite erroneous results.
bestMod = automl.best_model_for_estimator('lgbm')
Y2 = bestMod.predict(df_X)
Typically, with numerical data, I use multiple estimators: 'lgbm', 'xgboost', 'rf', etc. Since I am using mixed features (without one-hot encoding), I was testing my code with lgbm only. My understanding is that lgbm handles categorical data very efficiently with integer encoding.
For my trial run, I was expecting Y1 and Y2 to be identical. Why is there a difference? I did not see any such difference for purely numerical dataframes.
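A minimal, hypothetical sketch of the setup described above (column names, toy data, task, estimator list, and time budget are placeholders rather than the actual dataset):

import pandas as pd
from flaml import AutoML

# Hypothetical frame mixing float64 and pandas 'category' columns,
# mirroring the dtypes listed above.
df_X = pd.DataFrame({
    "x1": [0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9],
    "x2": [2.0, 1.0, 0.0, 3.0, 2.0, 1.0, 0.0, 3.0],
    "c1": pd.Categorical(["a", "b", "a", "b", "a", "b", "a", "b"]),
})
y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

automl = AutoML()
automl.fit(df_X, y, task="regression", estimator_list=["lgbm"], time_budget=10)

Y1 = automl.predict(df_X)                        # goes through FLAML's internal preprocessing
bestMod = automl.best_model_for_estimator("lgbm")
Y2 = bestMod.predict(df_X)                       # raw estimator, no preprocessing applied
# Y1 and Y2 can differ because the raw estimator expects the transformed features.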
Best regards,
Sukanta