Bug description
When trainer.evaluate() is called, the model can see all the inputs, including the targets, whose embeddings influence all the latent embeddings. I believe that targets should be truncated to simulate the production environment.
Steps/Code to reproduce bug
1. Take any model and any sequence from a dataset.
2. To evaluate the model in a production-like environment, split the sequence into input, target = sequence[:-1], sequence[-1], run pred = trainer.evaluate(input_dataset).predictions[0] (on an input_dataset created from the input sequence), and compute recall_simulated = recall(target, pred).
3. Evaluate the model on the full sequence: recall_eval = trainer.evaluate(sequence).
4. The resulting recall_eval.recall differs from recall_simulated, which should not happen (see the sketch below).
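A minimal sketch of these steps, assuming an already-fitted Transformers4Rec trainer; make_dataset is a hypothetical helper standing in for however the evaluation dataset is built, and the .predictions[0] access is copied verbatim from the steps above:

```python
# Minimal reproduction sketch. `trainer` is an already-fitted
# Transformers4Rec Trainer; `make_dataset` is a hypothetical helper
# that wraps raw item-id sequences into an evaluation dataset.

def recall(target, topk_items):
    # Recall for a single held-out target item.
    return float(target in topk_items)

sequence = [12, 7, 42, 3, 19]  # any item-id sequence from the dataset

# Production-like setup: hide the target from the model.
input_seq, target = sequence[:-1], sequence[-1]
input_dataset = make_dataset([input_seq])

# Predictions computed from the truncated sequence only.
pred = trainer.evaluate(input_dataset).predictions[0]
recall_simulated = recall(target, pred)

# Built-in evaluation on the full, untruncated sequence.
full_dataset = make_dataset([sequence])
recall_eval = trainer.evaluate(full_dataset)

# Per this report, the two values disagree, although they should match.
print(recall_simulated, recall_eval)
```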
Expected behavior
recall_eval.recall should return the same recall as recall_simulated.
Environment details
Transformers4Rec version: 23.6.0
Platform: Linux + Docker image (nvcr.io/nvidia/merlin/merlin-pytorch:23.06)
Python version: 3.8.10
Huggingface Transformers version: 4.12.0
PyTorch version (GPU?): torch==2.0.1, pytorch-lightning==2.0.4
Tensorflow version (GPU?): --
Additional context
Attached is a masking.patch file, which fixed the result discrepancy for me.
@korchi if you are truncating the target, you should use trainer.predict(), which uses the first n-1 inputs to predict the n-th item; nothing is masked when .predict() is used.
However, if you use trainer.evaluate(), we automatically mask the last item under the hood, so that the prediction result is generated for the last item of the given input. So you don't need to truncate the input sequence if you use .evaluate().
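A short sketch contrasting the two paths described above, assuming an already-fitted Transformers4Rec trainer; truncated_dataset and full_dataset are illustrative names, not library objects:

```python
# Path 1: trainer.predict() applies no masking, so the caller truncates
# the sequences themselves (first n-1 items) and reads the scores for
# the n-th item from the returned predictions.
pred_out = trainer.predict(truncated_dataset)   # dataset of n-1 item inputs
next_item_scores = pred_out.predictions

# Path 2: trainer.evaluate() masks the last item of each sequence under
# the hood, so the full, untruncated sequences are passed and the
# returned metrics already refer to that held-out last item.
metrics = trainer.evaluate(full_dataset)        # dataset of full sequences
```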