
Evaluation is incorrect because it can see the label. [BUG] #738

Closed
korchi opened this issue Aug 8, 2023 · 1 comment
Labels
bug Something isn't working status/needs-triage

Comments

korchi commented Aug 8, 2023

Bug description

When `trainer.evaluate()` is called, the model can see all the inputs, including the targets, whose embeddings influence all of the latent embeddings. I believe the targets should be truncated to simulate the production environment.

Steps/Code to reproduce bug

  1. Take any model and any sequence from a dataset.
  2. To evaluate the model in a production-like environment, split the sequence into `input, target = sequence[:-1], sequence[-1]`, run `pred = trainer.evaluate(input_dataset).predictions[0]` (on `input_dataset` created from the `input` sequence), and compute `recall_simulated = recall(target, pred)`.
  3. Evaluate the model directly: `recall_eval = trainer.evaluate(sequence)`.
  4. The result `recall_eval.recall` differs from `recall_simulated`, which it shouldn't.
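For clarity, the `recall` metric used in the comparison above can be sketched as a plain next-item recall@k: 1 if the held-out target appears in the model's top-k predictions, 0 otherwise. This is an illustrative helper, not the library's metric implementation; the item ids are made up.

```python
def recall_at_k(target, ranked_predictions, k=10):
    """Recall@k for a single next-item prediction: 1.0 if the held-out
    target appears among the top-k predicted item ids, else 0.0."""
    return 1.0 if target in ranked_predictions[:k] else 0.0

# Toy example: the held-out target item id is 42.
preds = [7, 42, 13, 99, 5]          # model's ranked predictions
print(recall_at_k(42, preds, k=3))  # 1.0 -- target is in the top 3
print(recall_at_k(42, preds, k=1))  # 0.0 -- target is not the top-1 item
```

The discrepancy reported here is that this score changes depending on whether the target item was visible to the model when the predictions were produced.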

Expected behavior

`recall_eval.recall` should return the same recall as `recall_simulated`.

Environment details

  • Transformers4Rec version: 23.6.0
  • Platform: Linux + Docker image (nvcr.io/nvidia/merlin/merlin-pytorch:23.06)
  • Python version: 3.8.10
  • Huggingface Transformers version: 4.12.0
  • PyTorch version (GPU?): torch==2.0.1, pytorch-lightning==2.0.4
  • Tensorflow version (GPU?): --

Additional context

Attached is a `masking.patch` file, which fixed the result discrepancy for me.

@korchi korchi added bug Something isn't working status/needs-triage labels Aug 8, 2023
@korchi korchi changed the title Evaluation are incorrect because it can see the label. [BUG] Evaluation is incorrect because it can see the label. [BUG] Aug 15, 2023
rnyak (Contributor) commented Aug 29, 2023

@korchi if you are truncating the target, you should use `trainer.predict()`, which uses the first n-1 inputs to predict the nth item. We do not mask anything when `.predict()` is used.

However, if you use `trainer.evaluate()`, we automatically mask the last item under the hood, so that we generate a prediction for the last item in the given input. So you don't need to truncate the input sequence if you use `.evaluate()`.
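The distinction described above can be sketched schematically. This is a toy mock, not the Transformers4Rec API: `toy_model`, `predict_next`, and `evaluate_last` are invented names standing in for the two code paths, to show that manual truncation plus `predict()` should see the same inputs that `evaluate()` uses after masking the last item internally.

```python
def predict_next(model_score, sequence):
    """Mock of predict(): score the next item from the full given input."""
    return model_score(sequence)

def evaluate_last(model_score, sequence):
    """Mock of evaluate(): mask the last item internally, predict it from
    the first n-1 items, and return (prediction, held-out target)."""
    inputs, target = sequence[:-1], sequence[-1]
    return model_score(inputs), target

# Trivial stand-in model: "predicts" the last seen item id plus one.
toy_model = lambda seq: seq[-1] + 1
seq = [3, 5, 8]

# Manual truncation + predict() ...
pred_manual = predict_next(toy_model, seq[:-1])
# ... should match what evaluate() does under the hood:
pred_eval, target = evaluate_last(toy_model, seq)
assert pred_manual == pred_eval  # both scored from [3, 5], never seeing 8
```

The bug report amounts to the claim that, before the patch, the `evaluate()` path still let the scoring step see the masked target.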
