This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[BUGFIX] Fix bug of top-p sampling #1503

Open · wants to merge 3 commits into base: master

Conversation

hutao965
Contributor

Description

Fix the top-p sampling bug mentioned in the linked issue.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

cc @dmlc/gluon-nlp-team

@hutao965 hutao965 requested a review from a team as a code owner January 26, 2021 02:32
@codecov

codecov bot commented Jan 26, 2021

Codecov Report

Merging #1503 (f20c838) into master (16146c2) will decrease coverage by 2.55%.
The diff coverage is 93.36%.


@@            Coverage Diff             @@
##           master    #1503      +/-   ##
==========================================
- Coverage   85.87%   83.31%   -2.56%     
==========================================
  Files          52       55       +3     
  Lines        6909     7517     +608     
==========================================
+ Hits         5933     6263     +330     
- Misses        976     1254     +278     
Impacted Files Coverage Δ
setup.py 0.00% <ø> (ø)
src/gluonnlp/op.py 95.16% <ø> (ø)
src/gluonnlp/utils/testing.py 58.33% <ø> (-35.84%) ⬇️
src/gluonnlp/models/transformer.py 98.52% <57.14%> (-0.41%) ⬇️
src/gluonnlp/layers.py 86.72% <83.87%> (-0.40%) ⬇️
src/gluonnlp/utils/tvm_utils.py 92.30% <92.30%> (ø)
src/gluonnlp/models/t5.py 93.67% <93.67%> (ø)
src/gluonnlp/models/mt5.py 95.49% <95.49%> (ø)
src/gluonnlp/attention_cell.py 88.39% <100.00%> (+1.47%) ⬆️
src/gluonnlp/models/__init__.py 100.00% <100.00%> (ø)
... and 24 more

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16146c2...f766c01.

@@ -43,7 +44,7 @@ Some metrics for the unconditional generated text
| topk=40 | 0.4291 | 0.9666 | 0.0 |
Member

I think we previously had results for t=0.9; we should remove that row.

mx.np.zeros_like(probs)
)
# choose the borderline prob
p_prob = mx.np.min(masked_probs, axis=2, keepdims=True)
Member

Is it possible to use exactly the same implementation as https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317?

Member

I'm referring to the part in which they choose not to mask the top-1 probability:

sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
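For reference, here is a minimal NumPy sketch of that approach (the gist itself operates on PyTorch logits; the function name and values below are illustrative, not from the PR):

```python
import numpy as np

def top_p_filter(probs, top_p):
    """Zero out tokens outside the nucleus, keeping at least the top-1."""
    order = np.argsort(probs)[::-1]        # token indices, most probable first
    cumulative = np.cumsum(probs[order])   # running probability mass
    remove = cumulative > top_p            # tokens past the threshold
    # Shift the mask right by one (the clone-and-assign line quoted above):
    # the first token to cross the threshold is kept, so the most probable
    # token can never be masked out.
    remove[1:] = remove[:-1].copy()
    remove[0] = False
    filtered = probs.copy()
    filtered[order[remove]] = 0.0
    return filtered
```

With probs = [0.5, 0.3, 0.15, 0.05] and top_p = 0.7 this keeps the first two tokens; with top_p = 0.3 (below max(probs)) it still keeps the single most probable token.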

Member

Sorry for the confusion. I see that both sort and argsort are implemented but I don't see a way to get both values and indices in one call. The usage of topk(k=-1) that assumes the return values to be sorted seems to be undocumented, which is a bit of a concern.
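As an aside, a single argsort followed by a gather gives both sorted values and indices with documented behavior; sketched here in plain NumPy (whether mx.np exposes take_along_axis with the same signature is an assumption I have not checked):

```python
import numpy as np

probs = np.array([[0.1, 0.7, 0.2]])
# One argsort yields the indices; gathering with them yields the values,
# avoiding any reliance on topk(k=-1) returning sorted output.
idx = np.argsort(probs, axis=-1)[:, ::-1]       # indices, descending prob
vals = np.take_along_axis(probs, idx, axis=-1)  # values in the same order
```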

@sxjscience
Member

@szha Would you help review?

@sxjscience
Member

@szha Would you take a look?

probs >= p_prob,
probs,
mx.np.zeros_like(probs)
)
Member

@sxjscience Feb 9, 2021

The major difference between the current implementation and the original PyTorch-based implementation is that when sampling_topp < max(probs), it is not clear which probability will be picked.

The PyTorch-based implementation will always keep the most probable token.
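To make that concrete, a small hypothetical example: with probs = [0.6, 0.3, 0.1] and sampling_topp = 0.5, every cumulative probability already exceeds the threshold, yet the shift in the PyTorch-based code guarantees that the top-1 token survives:

```python
import numpy as np

probs = np.array([0.6, 0.3, 0.1])    # already sorted, most probable first
top_p = 0.5                          # smaller than max(probs)
remove = np.cumsum(probs) > top_p    # [True, True, True]: all flagged
remove[1:] = remove[:-1].copy()      # shift right by one ...
remove[0] = False                    # ... so the top-1 token is retained
kept = np.where(remove, 0.0, probs)
```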
