
Bug Fix: 443 Bytes adapter_model.bin files #44

Merged
merged 1 commit into from
May 30, 2023

Conversation

KKcorps
Copy link
Contributor

@KKcorps KKcorps commented May 26, 2023

Aims to fix #38 and #41

Currently, checkpointing produces extremely small (~443-byte) adapter_model.bin files.

This appears to be caused by an issue in the PEFT library.

One working solution would be to pin an older version of PEFT, but that isn't possible here since older versions don't contain the QLoRA changes.

The following solution works on my setup with a 4080 card as well as on a Colab notebook.

It has been borrowed from alpaca-lora.
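For context, a minimal sketch of the idea behind this class of fix (hypothetical names, not the actual PR diff): when checkpointing, serialize only the LoRA adapter tensors rather than the Trainer's near-empty default state dict.

```python
# Hypothetical sketch, not the actual qlora/alpaca-lora code: filter a
# full model state dict down to just the LoRA adapter weights before
# writing adapter_model.bin.

def lora_only(state_dict):
    """Keep only LoRA adapter tensors from a full model state dict."""
    return {k: v for k, v in state_dict.items() if "lora_" in k}

# Toy example: one frozen base weight plus a LoRA A/B pair.
full = {
    "base_model.model.gpt_neox.layers.0.attention.query_key_value.weight": 0,
    "base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight": 1,
    "base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight": 2,
}
adapter = lora_only(full)
print(len(adapter))  # 2: only the lora_A / lora_B tensors survive
```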

@KKcorps
Copy link
Contributor Author

KKcorps commented May 26, 2023

I haven't yet tested the output of adapters trained after this change. There seems to be some debate on this issue in the linked alpaca-lora PR.

@artidoro do let us know whether this is the right approach or whether folks should wait for a fix on the PEFT end.


@KKcorps
Copy link
Contributor Author

KKcorps commented May 29, 2023

Did a bit of verification on the adapter_model.bin file saved with this fix, and it does seem to contain only LoRA layers.

>>> import torch
>>> state_dict = torch.load("output_redpajama3B_test_2/checkpoint-10/adapter_model/adapter_model.bin")
>>> state_dict.keys()
dict_keys(['base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.0.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.0.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_4h_to_h.lora_B.weight', ..., 'base_model.model.gpt_neox.layers.31.mlp.dense_4h_to_h.lora_B.weight'])

(Output truncated for readability: the same eight lora_A/lora_B keys repeat for every layer from 0 through 31.)
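As a quick sanity check on the listing above, the key count is consistent with what a LoRA-only state dict should contain for this model (32 GPT-NeoX layers, four targeted modules per layer, an A/B matrix pair per module):

```python
# Sanity-check arithmetic on the dumped key listing: layers run 0..31 and
# each layer has four LoRA-targeted modules (query_key_value, dense,
# dense_h_to_4h, dense_4h_to_h), each with a lora_A and a lora_B tensor.
layers = 32
modules_per_layer = 4
tensors_per_module = 2  # lora_A and lora_B
total = layers * modules_per_layer * tensors_per_module
print(total)  # 256 adapter tensors -- far more than a 443-byte file could hold
```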

I also checked out the peft repo, and it looks like get_peft_model_state_dict is now called directly inside save_pretrained, so it is no longer needed in this code: https://github.com/huggingface/peft/blame/3714aa2fff158fdfa637b2b65952580801d890b2/src/peft/peft_model.py#L125

@artidoro
Copy link
Owner

Thank you @KKcorps! I also just replicated your fix and it seems to properly store the adapter checkpoints.

@artidoro artidoro merged commit 7e1e814 into artidoro:main May 30, 2023
LagPixelLOL pushed a commit to LagPixelLOL/qlora that referenced this pull request Feb 8, 2024
Bug Fix: 443 Bytes `adapter_model.bin` files
Successfully merging this pull request may close these issues.

Adapter model is just 400 bytes when using finetune.py