[BUGFIX] fix bug of top-p sampling #1503
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1503 +/- ##
==========================================
- Coverage 85.87% 83.31% -2.56%
==========================================
Files 52 55 +3
Lines 6909 7517 +608
==========================================
+ Hits 5933 6263 +330
- Misses 976 1254 +278
Continue to review full report at Codecov.
@@ -43,7 +44,7 @@ Some metrics for the unconditional generated text
| topk=40 | 0.4291 | 0.9666 | 0.0 |
I think we previously had results for t=0.9; we should remove that row.
mx.np.zeros_like(probs)
)
# choose the borderline prob
p_prob = mx.np.min(masked_probs, axis=2, keepdims=True)
Is it possible to use exactly the same implementation as https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317?
I'm referring to the part in which they choose not to mask the top-1 probability:
sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
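For reference, here is a minimal sketch of the filtering pattern used in that gist, paraphrased for a 1-D logits tensor; the function name, defaults, and shapes here are illustrative assumptions, not the gist's exact code:

```python
import torch
import torch.nn.functional as F

def top_p_filtering(logits, top_p=0.9, filter_value=-float("inf")):
    """Nucleus (top-p) filtering sketch for a 1-D logits tensor."""
    # sort logits in descending order and accumulate their probabilities
    sorted_logits, sorted_indices = torch.sort(logits, descending=True)
    cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)

    # mark tokens whose cumulative probability exceeds top_p
    sorted_indices_to_remove = cumulative_probs > top_p
    # shift the mask one position to the right so the first token that crosses
    # the threshold is still kept; this is why the top-1 probability is never masked
    sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
    sorted_indices_to_remove[..., 0] = 0

    # map the mask back to the original vocabulary order and mask out those logits
    indices_to_remove = sorted_indices[sorted_indices_to_remove]
    logits[indices_to_remove] = filter_value
    return logits
```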
Sorry for the confusion. I see that both sort and argsort are implemented, but I don't see a way to get both the values and the indices in one call. The usage of topk(k=-1), which assumes the return values are sorted, seems to be undocumented, which is a bit of a concern.
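As an illustration of the workaround being discussed, here is a hedged sketch of obtaining both sorted values and indices with two separate calls in mx.np; the array contents are made up and this is not the PR's code:

```python
import mxnet as mx

probs = mx.np.array([[0.1, 0.6, 0.3]])  # made-up example distribution

# mx.np.sort and mx.np.argsort each return a single output, so two calls
# are needed to obtain both the descending values and their indices
sorted_probs = -mx.np.sort(-probs, axis=-1)      # values in descending order
sorted_indices = mx.np.argsort(-probs, axis=-1)  # matching original indices
```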
@szha Would you help review?
@szha Would you take a look?
probs >= p_prob,
probs,
mx.np.zeros_like(probs)
)
The major difference between the current implementation and the original PyTorch-based implementation is that when sampling_topp < max(probs), it is not clear which probability will be picked. The PyTorch-based implementation always keeps the most probable token.
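To illustrate with made-up numbers: when top_p is smaller than the largest probability, the shift-by-one trick in the PyTorch-style snippet above guarantees that the top-1 token survives:

```python
import numpy as np

probs = np.array([0.6, 0.3, 0.1])  # made-up distribution, already sorted descending
top_p = 0.5                        # smaller than max(probs)

cumulative = np.cumsum(probs)      # [0.6, 0.9, 1.0]
remove = cumulative > top_p        # [True, True, True] -- everything would be dropped
remove[1:] = remove[:-1].copy()    # shift right, mirroring the PyTorch snippet
remove[0] = False                  # the most probable token is always kept
# remove is now [False, True, True]: only the top-1 token can be sampled
```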
Description
Fix a bug in top-p sampling mentioned in the issue.
Checklist
Essentials
cc @dmlc/gluon-nlp-team