
fix: remove lm_head post processing #333

Merged
3 commits merged into foundation-model-stack:main from remove_lm_head on Sep 13, 2024

Conversation

@Abhishek-TAMU (Collaborator) commented Sep 5, 2024

Description of the change

Removes the lm_head hack, which was added to work around an lm_head issue that is now fixed in newer vLLM versions (as of v0.5.4).
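For context, here is a minimal, hypothetical sketch of the kind of checkpoint post-processing being removed: dropping the lm_head.weight entry from a saved safetensors file. The helper name and file path are placeholders; this is not the repository's actual implementation.

```python
# Hypothetical sketch only -- not the actual fms-hf-tuning post-processing code.
from safetensors.torch import load_file, save_file


def strip_lm_head(checkpoint_path: str) -> None:
    """Drop the lm_head weight from a safetensors checkpoint, if present."""
    state_dict = load_file(checkpoint_path)      # tensor name -> tensor
    if "lm_head.weight" in state_dict:
        del state_dict["lm_head.weight"]
        save_file(state_dict, checkpoint_path)   # rewrite the file in place


# Example usage (path is a placeholder):
# strip_lm_head("output_dir/model.safetensors")
```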

Related issue number

#1166

How to verify the PR

Ran LoRA and full fine-tuning of granite-3b and llama-8b without the lm_head removal and was able to run inference on the resulting checkpoints (one way to script that check is sketched below).
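A sketch of the inference check, assuming vLLM >= 0.5.4 and a merged-LoRA or fully fine-tuned checkpoint on disk; the model path and prompt below are placeholders:

```python
# Sketch of the verification step; requires vllm >= 0.5.4. Path/prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/tuned-granite-3b")        # merged-LoRA or full fine-tune output
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["### Input:\nHello\n\n### Response:"], params)
print(outputs[0].outputs[0].text)                  # should load and generate without lm_head errors
```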

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

@Abhishek-TAMU Abhishek-TAMU marked this pull request as ready for review September 5, 2024 20:07
@anhuong anhuong changed the title from "fix: Removal of lm head hack" to "fix: remove lm_head post processing with new accelerate version" Sep 5, 2024
build/Dockerfile Outdated
@@ -105,7 +105,7 @@ FROM cuda-devel AS python-installations
 ARG WHEEL_VERSION
 ARG USER
 ARG USER_UID
-ARG ENABLE_FMS_ACCELERATION=false
+ARG ENABLE_FMS_ACCELERATION=true
Collaborator comment on this change:
Note that with this change, fms-acceleration is installed by default, which enables QLoRA support by default. The lm_head removal was blocking QLoRA enablement.

@anhuong (Collaborator) left a comment

Change looks good to me, waiting on test results from Abhishek

@anhuong (Collaborator) commented Sep 10, 2024

After testing, we found that the new accelerate version does not work as expected.

The new logic introduced in get_state_dict also removes the top-level FSDP wrapper from the model. Since FSDP keeps flattened params, all parameters managed by the top-level wrapper remain flattened when model.state_dict is called. The child FSDP wrappers still protect their parameters: when the state_dict call recurses into them, they use the FSDP version of state_dict to unwrap the wrappers.

This results in the following error:

size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([62915840]) from checkpoint, the shape in current model is torch.Size([49152, 2560]).
size mismatch for model.norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([2560]). 
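As a rough illustration of that failure mode (plain PyTorch only, no FSDP or distributed setup, with sizes shrunk), a flattened 1-D parameter cannot be loaded back into a module that expects structured weights:

```python
# Minimal sketch: a flattened 1-D parameter vs. the structured weight the model expects.
# This mirrors the "size mismatch ... torch.Size([62915840]) vs torch.Size([49152, 2560])"
# error above, just with tiny tensors.
import torch

embed = torch.nn.Embedding(16, 8)                               # expects weight of shape (16, 8)
flat = torch.nn.utils.parameters_to_vector(embed.parameters())  # 1-D view, shape (128,)

try:
    embed.load_state_dict({"weight": flat})                     # analogous to loading a flattened checkpoint
except RuntimeError as err:
    print(err)                                                  # size mismatch for weight: ...
```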

@anhuong anhuong closed this Sep 10, 2024
@anhuong anhuong reopened this Sep 11, 2024
@anhuong anhuong changed the title from "fix: remove lm_head post processing with new accelerate version" to "fix: remove lm_head post processing" Sep 11, 2024
@anhuong anhuong merged commit c40ae7f into foundation-model-stack:main Sep 13, 2024
7 checks passed
aluu317 pushed a commit to aluu317/fms-hf-tuning that referenced this pull request Sep 13, 2024
* fix: Removal of lm head hack

Signed-off-by: Abhishek <[email protected]>

* set fms_accelerate to true by default

Signed-off-by: Anh Uong <[email protected]>

---------

Signed-off-by: Abhishek <[email protected]>
Signed-off-by: Anh Uong <[email protected]>
Co-authored-by: Anh Uong <[email protected]>
Signed-off-by: Angel Luu <[email protected]>
@Abhishek-TAMU Abhishek-TAMU deleted the remove_lm_head branch September 19, 2024 21:16