
Sequential User History Features Not Utilized in Model Input Pipeline in train_amazon_inttower.py #13

Open
CHETAN1KUKREJA opened this issue Nov 25, 2024 · 4 comments

Comments

@CHETAN1KUKREJA

Brief

@archersama
In IntTower/train_amazon_inttower.py, at line 141, the user history is computed, but the feature is never used when making predictions. Why?

Current Behavior

The implementation processes user historical interaction data through get_user_feature() and get_var_feature(), generating train_user_hist, but this processed sequential data is not included in the final model input. Currently, only sparse and dense features are passed to the model:

train_model_input = {name: train[name] for name in sparse_features + dense_features}

Expected Behavior

Sequential user history should be incorporated into the model input to leverage historical user-item interactions for better recommendations. The train_user_hist generated from get_var_feature() should be included in the model's input features.
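As a rough illustration of what including the sequence could look like, here is a minimal sketch that pads each row's variable-length history of item IDs to a fixed length and adds it to the input dictionary next to the sparse and dense features. The helper pad_histories, the key hist_asin, the toy data, and max_len are all hypothetical; the actual output format of get_var_feature() and the feature names the model expects may differ. It also assumes item IDs are encoded starting from 1, so that 0 can be reserved for padding.

```python
import numpy as np


def pad_histories(histories, max_len, pad_id=0):
    """Pad each history with pad_id, or keep only its most recent max_len items.

    histories: list of lists of integer item IDs, oldest first.
    Returns an int64 array of shape (len(histories), max_len).
    Assumes pad_id (0) is never a real item ID.
    """
    out = np.full((len(histories), max_len), pad_id, dtype=np.int64)
    for row, hist in enumerate(histories):
        recent = hist[-max_len:]  # most recent items only
        out[row, :len(recent)] = recent
    return out


# Toy stand-in for the training frame (hypothetical data, not from the repo).
train = {
    "reviewerID": np.array([1, 2, 3]),
    "asin": np.array([10, 11, 12]),
    "price": np.array([9.99, 4.50, 20.0]),
}
sparse_features = ["reviewerID", "asin"]
dense_features = ["price"]
# Hypothetical per-row histories: items each user rated before that row.
train_user_hist = [[5, 7], [], [3, 4, 6, 8]]

train_model_input = {name: train[name] for name in sparse_features + dense_features}
# Hypothetical key; the model would also need a matching variable-length feature definition.
train_model_input["hist_asin"] = pad_histories(train_user_hist, max_len=3)
```

Adding the array to the dictionary is only half the change: the model's feature columns must also declare the sequence feature so that it gets embedded and pooled, otherwise the extra key is simply ignored.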

Impact

This oversight means the model is currently making predictions based only on:

  • Static user features (reviewerID, user_mean_rating)
  • Static item features (asin, categories, item_mean_rating, price)

It misses the valuable sequential patterns in user behavior, even though that data has already been processed and is simply not being used.

bug enhancement feature-engineering

@archersama
Owner

We found that this feature could cause data leakage, so we removed it. As for the results being below AutoInt, could you report both the AutoInt and IntTower results?

@CHETAN1KUKREJA
Author

A data leak? How exactly?
Without that feature, wouldn't the recommendations be based only on the user's average rating and the item's categories, item_mean_rating, and price?
That would not be very personalized, right?

@archersama
Owner

"User history is important. However, constructing an appropriate user history is crucial. You might try creating an example of user history yourself."
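The thread does not say what the leak was. One common way a history feature leaks in rating prediction (an assumption here, not confirmed by the maintainer) is building each user's history from all of their interactions. A row's history then contains the very item being predicted, plus items the user rated later, possibly ones in the test split. Below is a minimal sketch of a leakage-free construction that sorts each user's interactions by time and gives every row only the items rated strictly earlier, with the leaky variant included for contrast. The row format (user, item, ts) and both function names are hypothetical, and the sketch assumes each review carries a timestamp.

```python
from collections import defaultdict


def build_histories(rows):
    """For each row, return the user's items from strictly earlier timestamps.

    rows: list of dicts with hypothetical keys "user", "item", "ts".
    Output is aligned with the input order. Interactions that share a
    timestamp are kept out of each other's histories, so neither the target
    item nor any same-time or later item can leak in.
    """
    by_user = defaultdict(list)
    for idx, r in enumerate(rows):
        by_user[r["user"]].append(idx)

    histories = [None] * len(rows)
    for idxs in by_user.values():
        idxs.sort(key=lambda i: rows[i]["ts"])
        for pos, i in enumerate(idxs):
            histories[i] = [
                rows[j]["item"] for j in idxs[:pos] if rows[j]["ts"] < rows[i]["ts"]
            ]
    return histories


def build_histories_leaky(rows):
    """Anti-pattern: every row gets the user's full item list, including the
    target item itself and items rated later (possibly in the test split)."""
    full = defaultdict(list)
    for r in sorted(rows, key=lambda r: r["ts"]):
        full[r["user"]].append(r["item"])
    return [list(full[r["user"]]) for r in rows]


# Hypothetical toy interactions.
rows = [
    {"user": "u1", "item": "A", "ts": 1},
    {"user": "u1", "item": "B", "ts": 2},
    {"user": "u2", "item": "A", "ts": 1},
    {"user": "u1", "item": "C", "ts": 3},
]
safe = build_histories(rows)
leaky = build_histories_leaky(rows)
```

With the leaky version, every training row already "sees" its own target item in the history, so offline metrics look good but cannot be reproduced at serving time. The time-ordered version only looks backward, which keeps training consistent with what is known at prediction time, provided the train/test split is also done by time.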

@CHETAN1KUKREJA
Author

OK, I will try. Could you guide me on this? It would be much easier if you told me what to do, since then I wouldn't have to go through the entire codebase.

Meanwhile, could you please explain the data leak you mentioned?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants