Make tokenize tests readable #1868
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1868
Note: Links to docs will display an error until the docs builds have been completed.
❗ 2 Active SEVs: There are 2 currently active SEVs.
✅ No Failures: As of commit cc60b25 with merge base d5c54f3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc: @RdoubleA @joecummings What do you think? With the current lint formatting, working with these tests is really awful. This is a pretty minor fix.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1868       +/-   ##
===========================================
- Coverage   67.27%   24.77%   -42.50%
===========================================
  Files         318      318
  Lines       17648    17633      -15
===========================================
- Hits        11873     4369    -7504
- Misses       5775    13264    +7489
☔ View full report in Codecov by Sentry.
The lint CI should be changed at this point; otherwise the formatting of expected_tokens will still be really bad.
Looks good - I will look into the linter issue. My naive assumption was that noqa should work.
A couple of questions about possibly unintended formatting changes:
messages = [
Message(
role="user",
content="Below is an instruction that describes a task. Write a response "
"that appropriately completes the request.\n\n### Instruction:\nGenerate "
"a realistic dating profile bio.\n\n### Response:\n",
"that appropriately completes the request.\n\n### Instruction:\nGenerate "
What are these changes?
Oh, something probably changed because of my local linter changes. Will fix.
@@ -311,21 +147,17 @@ def test_tokenizer_vocab_size(self, tokenizer):
assert tokenizer.vocab_size == 128257

def test_tokenize_text_messages(
self, tokenizer, user_text_message, assistant_text_message
self, tokenizer, user_text_message, assistant_text_message
?
Same here
I assume that's fixed now.
Grr, still failing. Mind if I take a look?
Isn't it about the lines with # noqa?
Ah, I see:
Fixed
One more...
Fixed with flake
@joecummings Sorry for all these lint failures, but I was not able to run pre-commit run --all-files because of the current fixes.
@joecummings I think I found a solution: we need to use both # noqa and # fmt: skip. But I really don't like it.
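For readers following the thread, here is a minimal sketch of what combining the two pragmas could look like on a long expected_tokens list. It assumes a Black-compatible formatter (which honors # fmt: skip) and flake8 (which honors # noqa); the token values and variable names are hypothetical and not taken from the PR. Whether both pragmas can share one trailing comment may depend on the tool versions in use.

```python
# Hypothetical token IDs and names, for illustration only (not from the PR).

# Without any pragma, a Black-style formatter typically rewrites a list that
# exceeds the line length as one element per line, which is hard to scan:
exploded_tokens = [
    128000,
    128006,
    882,
    128007,
    271,
    15339,
    1917,
    128009,
    128006,
    78191,
    128007,
    271,
]

# Assumed workaround from the discussion: "# noqa" asks flake8 to ignore the
# over-long line, and "# fmt: skip" asks the formatter to leave it untouched.
expected_tokens = [128000, 128006, 882, 128007, 271, 15339, 1917, 128009, 128006, 78191, 128007, 271]  # noqa  # fmt: skip

# Both spellings describe the same list; only the layout differs.
assert expected_tokens == exploded_tokens
```

The single-line form keeps each expected sequence visually aligned with the message it encodes, at the cost of carrying two pragmas on every such line.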
Oh, I broke something...
I don't know what this is; these tests pass locally and the branch is up to date.
Isn't it related to #1886? Some weird failure with torchao.
Can we restart CI here? Otherwise I'm not sure how to fix this unrelated torchao stuff.
@felipemello1 @RdoubleA Maybe you can comment on how to fix this torchao thing? It's really strange, and a CI rerun alone probably won't help.
Fixed
Can someone restart CI?
Resolved
bdf3be3 to f8b93f4
@RdoubleA Can we fix the Qwen2 and Qwen2.5 tests in a separate PR? I will open it immediately after we merge this one without the other models.
@krammnic Sorry for the delay, that sounds good to me. Looks like we just need to resolve merge conflicts and we can land this.
@krammnic Went ahead and did the merge with main, thanks again for your help!
Context
What is the purpose of this PR? Is it to
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
pre-commit install
pytest tests
pytest tests -m integration_test
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example
Will require changes in CI (pre-commit run makes the expected_tokens lists unreadable).