
Remove unused FSDP components #2016

Merged: 7 commits merged into pytorch:main on Nov 16, 2024

Conversation

ebsmothers (Contributor)

I was attempting to patch the changes from #1933 so we can merge them, but I'm unable to push to the remote branch (likely because the PR was opened off of the fork's main branch, which has not yet been synced with the latest changes from upstream). So I've just patched everything into this new PR instead.

Tagging @krammnic as the author of the original PR. The only changes I've made are merging the latest changes from main and updating test_validate_missing_and_unexpected_for_lora to get it to pass (and adding back some of the test cases from validate_state_dict_for_lora that were deleted).

pytorch-bot (bot) commented Nov 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2016

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit f772fbd with merge base ac14e96:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Nov 16, 2024
codecov-commenter commented Nov 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.47%. Comparing base (f15ba77) to head (ba9c60e).
Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2016       +/-   ##
===========================================
+ Coverage   24.74%   64.47%   +39.72%     
===========================================
  Files         317      317               
  Lines       17669    17535      -134     
===========================================
+ Hits         4373    11305     +6932     
+ Misses      13296     6230     -7066     

☔ View full report in Codecov by Sentry.

SalmanMohammadi (Collaborator) commented Nov 16, 2024

One small thing: is it worth deprecating the public APIs now and deleting them next release? I'm not sure if people are really using these, so your call.

@@ -77,7 +77,6 @@ PEFT Components
 peft.set_trainable_params
 peft.get_adapter_state_dict
 peft.validate_missing_and_unexpected_for_lora
-peft.validate_state_dict_for_lora
SalmanMohammadi (Collaborator) commented on Nov 16, 2024

Line 209 in the LoRA finetune tutorial:

.. note::
    Whenever loading weights with :code:`strict=False`, you should verify that any missing or extra keys in
    the loaded :code:`state_dict` are as expected. torchtune's LoRA recipes do this by default via e.g.
    :func:`validate_state_dict_for_lora() <torchtune.modules.peft.validate_state_dict_for_lora>` or
    :func:`validate_missing_and_unexpected_for_lora() <torchtune.modules.peft.validate_missing_and_unexpected_for_lora>`.

Needs to be updated
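
For context, the pattern that note describes looks roughly like the following. A minimal sketch, assuming the validate_missing_and_unexpected_for_lora signature as of this PR; the lora_llama2_7b builder and the LoRA config here are chosen purely for illustration:

    from torchtune.models.llama2 import lora_llama2_7b
    from torchtune.modules.peft import validate_missing_and_unexpected_for_lora

    model = lora_llama2_7b(lora_attn_modules=["q_proj", "v_proj"])

    # Simulate a base-model checkpoint by dropping the LoRA params; in a real
    # recipe this state dict would come from the checkpointer.
    base_sd = {k: v for k, v in model.state_dict().items() if "lora" not in k}

    # strict=False tolerates the (still uninitialized) LoRA params...
    missing, unexpected = model.load_state_dict(base_sd, strict=False)

    # ...so we verify the gaps are exactly the LoRA params and nothing else.
    validate_missing_and_unexpected_for_lora(
        lora_attn_modules=["q_proj", "v_proj"],
        apply_lora_to_mlp=False,
        apply_lora_to_output=False,
        base_missing=missing,
        base_unexpected=unexpected,
    )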

init_distributed
is_distributed
get_world_size_and_rank
get_full_finetune_fsdp_wrap_policy
SalmanMohammadi (Collaborator) commented on Nov 16, 2024

There's a reference to this in the QAT tutorial too.

get_full_optimizer_state_dict,
get_shard_conditions,
get_world_size_and_rank,
init_distributed,
is_distributed,
load_from_full_model_state_dict,
load_from_full_optimizer_state_dict,
lora_fsdp_wrap_policy,
prepare_model_for_fsdp_with_meta_device,
SalmanMohammadi (Collaborator) commented on Nov 16, 2024

There's a reference to this in a comment here

SalmanMohammadi (Collaborator)

Some big ol' nits due to some floating references; thanks for picking this up :)

ebsmothers (Contributor, Author)

> One small thing: is it worth deprecating the public APIs now and deleting them next release? I'm not sure if people are really using these, so your call.

Yeah, this is a fair question. In this case I'm gonna yolo a bit, because we haven't supported FSDP1 for multiple releases at this point. And I think most of these APIs should never really have been public to begin with; we only made them public out of necessity, because many were used directly in our recipes.

@ebsmothers merged commit 0c31907 into pytorch:main on Nov 16, 2024 (17 checks passed).
@ebsmothers mentioned this pull request on Nov 26, 2024.