
Update torchtune generation to be more flexible #1970

Merged 1 commit into pytorch:main on Nov 8, 2024

Conversation

@RylanC24 (Contributor) commented Nov 8, 2024

Summary:
The existing softmax sampling trick implementation in the torchtune generator is not flexible enough to handle vocab-pruned models (i.e., when the number of logits produced does not match the size of the embedding layer).

This is an unnecessary limitation that is easy to fix: simply create the `q` tensor to match the size of the logits tensor instead of the embedding layer.

Differential Revision: D65480353
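
For context, here is a minimal, self-contained sketch of the exponential ("softmax trick") sampling this change targets; the function name and shapes are illustrative, not the exact torchtune code:

    import torch

    def sample_one(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
        """Sample one token id per row via the exponential (Gumbel-style) softmax trick."""
        probs = torch.nn.functional.softmax(logits / temperature, dim=-1)
        # q is created to match the logits, so a vocab-pruned output head
        # (fewer logits than embedding rows) works without any shape mismatch.
        q = torch.empty_like(probs).exponential_(1)
        # argmax(probs / q) draws a sample from the categorical distribution over probs.
        return torch.argmax(probs / q, dim=-1, keepdim=True).to(dtype=torch.int)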

pytorch-bot (bot) commented Nov 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1970

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 344e99f with merge base 7bfb333:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Nov 8, 2024
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D65480353

RylanC24 added a commit to RylanC24/torchtune that referenced this pull request Nov 8, 2024
Summary:

The existing softmax sampling trick implementation in the torchtune generator is not flexible enough to deal with vocab pruned models (when the number of logits produced does not match the size of the embedding layer). 

This is an unnecessary limitation and is easy to fix if we simply create the `q` tensor to match the size of the logits tensor instead of the embedding layer.

NOTE: this is just a draft diff to get feedback on possible changes to the OSS torchtune package before submitting a proper pull request

Differential Revision: D65480353
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D65480353

@SalmanMohammadi (Collaborator) commented

Hey @RylanC24! Thanks for opening this : )

It looks like the main change is to make the default path sample `q` using:

    probs = torch.nn.functional.softmax(logits, dim=-1)

    # if q is None, we use the default softmax sampling trick
    if q is None: # <---- q is now None by default
        q = torch.empty_like(probs).exponential_(1)

Is that right? If so, that makes sense to me at a high level.

Out of curiosity, what's your use case here? Are you adding this change to use with the generate.py recipe? FWIW, we'll eventually be deprecating this recipe (I think) in favour of the dev/generate_v2.py recipe, which is significantly neater and uses this proposed behaviour by default, since it calls sample directly without going through generate_next_token. I think this change makes sense to fix the existing generation utils, though.

cc @joecummings

@RylanC24 (Contributor, Author) commented Nov 8, 2024

@SalmanMohammadi yes, that's right. The use-case is a subtle one but comes up anytime you want to trim the embedding and/or output layers to remove unnecessary tokens (e.g., if the output space is constrained and we don't want to keep 128k x 2048 dimensional vectors in our model). The issue comes up when you want to map this trimmed output space back to the original (so we can still use the same tokenizer). In this situation the dimension of the output logits will not match the dimension of the embedding layer, leading to an error when we try to divide the logits by q (which was previously set to the size of the embedding layer).
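
To make the failure mode concrete, a small sketch follows; the layer sizes and variable names are hypothetical and only meant to illustrate the shape mismatch:

    import torch
    import torch.nn as nn

    full_vocab, pruned_vocab, dim = 128_256, 4_096, 2_048   # illustrative sizes

    # The tokenizer (and embedding table) still covers full_vocab ids,
    # but the output head has been trimmed to pruned_vocab logits.
    output_proj = nn.Linear(dim, pruned_vocab, bias=False)

    hidden = torch.randn(1, dim)
    logits = output_proj(hidden)                 # shape: [1, pruned_vocab]
    probs = torch.softmax(logits, dim=-1)

    # Old behaviour: q sized from the embedding table, so probs / q fails
    # because [1, pruned_vocab] and [1, full_vocab] do not broadcast.
    q_old = torch.empty(1, full_vocab).exponential_(1)
    # probs / q_old  # RuntimeError: mismatched sizes

    # With this change: q is created from the logits themselves.
    q_new = torch.empty_like(probs).exponential_(1)
    token = torch.argmax(probs / q_new, dim=-1)  # works for any output width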

@RylanC24 (Contributor, Author) commented Nov 8, 2024

@SalmanMohammadi forgot to add that yes, the new generator shouldn't have this issue, but this fix will allow us to patch the old one in the meantime :-)

@RdoubleA requested a review from joecummings on November 8, 2024 at 14:00
@@ -67,7 +67,7 @@ def generate_next_token(
     model: TransformerDecoder,
     input_pos: torch.Tensor,
     x: torch.Tensor,
-    q: torch.Tensor,
+    q: Optional[torch.Tensor] = None,
Contributor

Need to update the docstring to reflect the new typing; lint is failing.

@SalmanMohammadi (Collaborator) commented Nov 8, 2024

> @SalmanMohammadi yes, that's right. The use-case is a subtle one but comes up anytime you want to trim the embedding and/or output layers to remove unnecessary tokens (e.g., if the output space is constrained and we don't want to keep 128k x 2048 dimensional vectors in our model). The issue comes up when you want to map this trimmed output space back to the original (so we can still use the same tokenizer). In this situation the dimension of the output logits will not match the dimension of the embedding layer, leading to an error when we try to divide the logits by q (which was previously set to the size of the embedding layer).

Thanks! I have a couple of points:

  1. This fix won't actually work when we have an rng, right? I'm not sure I see an immediate neat solution here, though; is there a way to infer the size of the output space here? rng is just used for PPO, so it'd be a very rare interaction.

  2. How annoying would it be to add a test for this? We have some tests in tests/torchtune/generation/test_generation.py which build some dummy models. Would it be simple enough to create another dummy model fixture which has the embedding replaced with a trimmed embedding, and ensures that we can correctly generate without any issues? (See the sketch below.)
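
A rough sketch of what such a test might look like; the fixture name, model attributes, and the exact generate signature below are assumptions for illustration rather than the real test_generation.py code:

    import pytest
    import torch

    from torchtune import generation


    @pytest.fixture
    def vocab_pruned_model(generation_model):
        # Hypothetical: reuse an existing dummy-model fixture and swap in an
        # output head that is narrower than the embedding table.
        pruned_vocab = generation_model.tok_embeddings.num_embeddings // 2
        generation_model.output = torch.nn.Linear(
            generation_model.output.in_features, pruned_vocab, bias=False
        )
        return generation_model


    def test_generate_with_vocab_pruned_model(vocab_pruned_model):
        prompt = torch.arange(0, 10).unsqueeze(0)
        # Should run without a shape error now that q is sized from the logits.
        tokens, _ = generation.generate(
            vocab_pruned_model,
            prompt,
            max_generated_tokens=5,
            temperature=1.0,
            top_k=None,
        )
        assert tokens.ndim == 2

Whether this can simply wrap one of the existing dummy decoders depends on how those fixtures expose their output projection.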

@codecov-commenter commented
Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 25.18%. Comparing base (9eced21) to head (6cae056).
Report is 9 commits behind head on main.

Files with missing lines               Patch %   Lines
torchtune/generation/_generation.py    0.00%     6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1970       +/-   ##
===========================================
- Coverage   68.40%   25.18%   -43.22%     
===========================================
  Files         311      311               
  Lines       16973    17038       +65     
===========================================
- Hits        11610     4291     -7319     
- Misses       5363    12747     +7384     

☔ View full report in Codecov by Sentry.

@RylanC24 (Contributor, Author) commented Nov 8, 2024

> Thanks! I have a couple points:
>
> 1. This fix won't actually work for when we have rng right? I'm not sure I see an immediate neat solution here though, is there a way to infer the size of the output space here? rng is just used for PPO so it'd be a very rare interaction.

Yes, it won't work when an rng is used, but I figured these were both pretty niche use-cases that are unlikely to clash. Since there's already a plan to migrate to the new generator, where this won't be an issue, I think the risk of ignoring this corner case for the time being is pretty minimal. wdyt?

> 2. How annoying would it be to add a test for this? We have some tests in tests/torchtune/generation/test_generation.py which build some dummy models. Would it be simple enough to create another dummy model fixture which has the embedding replaced with a trimmed embedding, and ensures that we can correctly generate without any issues?

This is doable but would be a bit annoying, since the vocab-pruned model types are not defined in the torchtune repo. The existing tests should validate that the normal generation use-cases are not affected, and I've verified with our vocab-pruned model definitions that it works as expected. Again, since this is really just a stopgap fix until the new generator is released, maybe we can forgo the additional tests?

Summary:

The existing softmax sampling trick implementation in the torchtune generator is not flexible enough to deal with vocab pruned models (when the number of logits produced does not match the size of the embedding layer). 

This is an unnecessary limitation and is easy to fix if we simply create the `q` tensor to match the size of the logits tensor instead of the embedding layer.

Differential Revision: D65480353
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D65480353

@SalmanMohammadi (Collaborator) commented

Yeah, makes sense to me. I'll verify it works OK with compile in a follow-up :)

@facebook-github-bot merged commit eb67cc5 into pytorch:main on Nov 8, 2024
18 of 19 checks passed
Labels: CLA Signed, fb-exported

5 participants