fix: remove lm_head post processing #333
Conversation
Signed-off-by: Abhishek <[email protected]>
build/Dockerfile (Outdated)
```diff
@@ -105,7 +105,7 @@ FROM cuda-devel AS python-installations
 ARG WHEEL_VERSION
 ARG USER
 ARG USER_UID
-ARG ENABLE_FMS_ACCELERATION=false
+ARG ENABLE_FMS_ACCELERATION=true
```
Note that with this change, fms-acceleration is installed by default, thus enabling QLoRA support by default. The lm_head removal hack was blocking QLoRA enablement.
Change looks good to me, waiting on test results from Abhishek
After testing, found that the new accelerate version is not working as expected. The new logic introduced in get_state_dict also removes the top-level FSDP wrapper from the model. Since FSDP keeps flattened params, all the parameters managed by the top-level wrapper then remain flattened when model.state_dict is called. The child FSDP wrappers still protect their parameters, since when the state_dict call recurses into them, they use the FSDP version of state_dict to unwrap the wrappers. This results in an error.
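For context, here is a minimal sketch of how a full, unflattened state dict is normally gathered from an FSDP-wrapped model using the standard PyTorch FSDP APIs. This is illustrative only (function name is made up) and is not the accelerate get_state_dict code path.

```python
# Illustrative only: gather a full, unflattened state dict from the FSDP root.
# If the top-level FSDP wrapper is stripped first and state_dict() is called on
# the inner module, the root-managed parameters stay as flattened FlatParameters,
# which matches the failure described above.
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

def gather_full_state_dict(model: FSDP) -> dict:
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    # Calling state_dict() inside this context lets FSDP's hooks unflatten and
    # all-gather every parameter managed by the root and child wrappers.
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        return model.state_dict()
```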
Signed-off-by: Anh Uong <[email protected]>
* fix: Removal of lm head hack
  Signed-off-by: Abhishek <[email protected]>
* set fms_accelerate to true by default
  Signed-off-by: Anh Uong <[email protected]>
---------
Signed-off-by: Abhishek <[email protected]>
Signed-off-by: Anh Uong <[email protected]>
Co-authored-by: Anh Uong <[email protected]>
Signed-off-by: Angel Luu <[email protected]>
Description of the change
Removes the lm_head hack that was added to work around an lm_head issue. That issue is now fixed in newer vllm versions, with the fix available as of v0.5.4.
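For readers unfamiliar with the hack, below is a hypothetical sketch of the kind of post-processing being removed: dropping the lm_head.weight tensor from a saved safetensors checkpoint. The file name, function name, and condition are illustrative assumptions, not the repository's actual implementation.

```python
# Hypothetical illustration only; not the repository's actual code.
import os
from safetensors import safe_open
from safetensors.torch import save_file

def drop_lm_head(checkpoint_dir: str) -> None:
    """Rewrite model.safetensors without the lm_head.weight tensor (assumed tied to embeddings)."""
    path = os.path.join(checkpoint_dir, "model.safetensors")  # assumed single-shard checkpoint
    tensors = {}
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            if name != "lm_head.weight":  # the kind of post-processing this PR deletes
                tensors[name] = f.get_tensor(name)
    save_file(tensors, path)
```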
Related issue number
#1166
How to verify the PR
Ran LoRA and full fine tuning of granite-3b and llama-8b models without the lm_head removal, and was able to run inference on the resulting checkpoints.
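As a rough illustration of that inference check (the checkpoint path and prompt are placeholders, not the actual test setup), something along these lines confirms the saved checkpoint loads and generates without any lm_head post-processing:

```python
# Sketch of the inference sanity check; checkpoint path and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "/path/to/tuned/checkpoint"  # full fine-tune or merged LoRA output (assumed)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

inputs = tokenizer("### Input:\nhello\n\n### Response:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```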
Was the PR tested