
Bug Fix: 443 Bytes adapter_model.bin files #44

Merged
merged 1 commit into from
May 30, 2023

Conversation

KKcorps
Copy link
Contributor

@KKcorps KKcorps commented May 26, 2023

Aims to fix #38 and #41

Currently, checkpointing produces extremely small (~443-byte) adapter_model.bin files.

This appears to be caused by an issue in the PEFT library.

One working solution would be to pin an older version of PEFT, but that isn't possible here since older versions don't contain the QLoRA changes.

The following solution works on my setup with a 4080 card as well as on a Colab notebook.

It has been borrowed from alpaca-lora.
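For context, a minimal sketch of the idea behind this class of fix (hypothetical names, not the actual PR diff): when checkpointing, serialize only the LoRA adapter tensors rather than the Trainer's near-empty default state dict.

```python
# Hypothetical sketch, not the actual qlora/alpaca-lora code: filter a
# full model state dict down to just the LoRA adapter weights before
# writing adapter_model.bin.

def lora_only(state_dict):
    """Keep only LoRA adapter tensors from a full model state dict."""
    return {k: v for k, v in state_dict.items() if "lora_" in k}

# Toy example: one frozen base weight plus a LoRA A/B pair.
full = {
    "base_model.model.gpt_neox.layers.0.attention.query_key_value.weight": 0,
    "base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight": 1,
    "base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight": 2,
}
adapter = lora_only(full)
print(len(adapter))  # 2: only the lora_A / lora_B tensors survive
```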

@KKcorps
Copy link
Contributor Author

KKcorps commented May 26, 2023

I haven't yet tested the output of adapters trained after this change. There seems to be some debate on this issue in the linked alpaca-lora PR.

@artidoro do let us know whether this is the right approach or whether folks should wait for a fix on the PEFT end.


@KKcorps
Copy link
Contributor Author

KKcorps commented May 29, 2023

Did a bit of verification on the adapter_model.bin file saved with this fix, and it does seem to contain only LoRA layers.

>>> import torch
>>> state_dict = torch.load("output_redpajama3B_test_2/checkpoint-10/adapter_model/adapter_model.bin")
>>> state_dict.keys()
dict_keys(['base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.0.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.0.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_4h_to_h.lora_B.weight', ..., 'base_model.model.gpt_neox.layers.31.mlp.dense_4h_to_h.lora_B.weight'])

(Output truncated for readability: the same eight lora_A/lora_B keys repeat for every layer from 0 through 31.)
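As a quick sanity check on the listing above, the key count is consistent with what a LoRA-only state dict should contain for this model (32 GPT-NeoX layers, four targeted modules per layer, an A/B matrix pair per module):

```python
# Sanity-check arithmetic on the dumped key listing: layers run 0..31 and
# each layer has four LoRA-targeted modules (query_key_value, dense,
# dense_h_to_4h, dense_4h_to_h), each with a lora_A and a lora_B tensor.
layers = 32
modules_per_layer = 4
tensors_per_module = 2  # lora_A and lora_B
total = layers * modules_per_layer * tensors_per_module
print(total)  # 256 adapter tensors -- far more than a 443-byte file could hold
```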

I also checked out the peft repo, and it looks like get_peft_model_state_dict is now called directly inside save_pretrained, so it is no longer needed in this code: https://github.com/huggingface/peft/blame/3714aa2fff158fdfa637b2b65952580801d890b2/src/peft/peft_model.py#L125

@artidoro
Copy link
Owner

Thank you @KKcorps! I also just replicated your fix and it seems to properly store the adapter checkpoints.

@artidoro artidoro merged commit 7e1e814 into artidoro:main May 30, 2023
LagPixelLOL pushed a commit to LagPixelLOL/qlora that referenced this pull request Feb 8, 2024
Bug Fix: 443 Bytes `adapter_model.bin` files
Successfully merging this pull request may close these issues.

Adapter model is just 400 bytes when using finetune.py