How to use a finetuned LoRA adapter in a Hugging Face-like pipeline #1779
Hi @ryf1123, thanks for creating the issue. I think (2) is the way we intend for this to be done; unfortunately it looks like it doesn't currently work, so I'm glad you pointed this out. I think the problem is that for the Llama 3.2 Vision model the config is structured a bit differently, so the ideal case you described of loading it in directly doesn't work yet.
Hey @ryf1123, thanks for raising this issue! We put up this PR updating the configs to use the HF checkpointer. It still saves the adapter in torchtune format, not PEFT format. We still need to work on that, since this model is not text-only and we have to see how the vision part works. Regarding your error `KeyError: 'num_attention_heads'`, it happens here:
The correct way should be `num_heads=self._config["text_config"]["num_attention_heads"]`. Check what we do here:
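In other words, for Llama 3.2 Vision the text hyperparameters sit under a nested `text_config` block in the HF `config.json`, rather than at the top level. A minimal sketch of the difference (the config path here is illustrative):

```python
# Minimal sketch: the multimodal Llama 3.2 Vision HF config nests the text
# parameters under "text_config", so a flat lookup raises the KeyError above.
import json

with open("config.json") as f:  # path to the HF checkpoint's config (illustrative)
    config = json.load(f)

# config["num_attention_heads"]  # -> KeyError: 'num_attention_heads'
num_heads = config["text_config"]["num_attention_heads"]     # correct nested lookup
num_kv_heads = config["text_config"]["num_key_value_heads"]  # same pattern for other fields
print(num_heads, num_kv_heads)
```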
If you wanna give it a stab and see if you can make it PEFT compatible, we would love the PR :)
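For anyone who does pick this up, a rough sketch of what "PEFT compatible" would involve is below: writing an `adapter_config.json` and remapping the adapter keys to PEFT's naming convention. The key mapping, config values, and file names here are illustrative assumptions, not what the recipe (or the eventual fix) actually does:

```python
# Illustrative sketch only: convert a torchtune LoRA adapter checkpoint into
# something PEFT could load. The exact key mapping (especially for the vision
# layers) is the open question in this issue.
import json
import torch

adapter_sd = torch.load("adapter_0.pt", map_location="cpu")  # torchtune adapter weights

def to_peft_key(tune_key: str) -> str:
    # Assumed mapping: torchtune uses lowercase "lora_a"/"lora_b" factor names,
    # while PEFT expects "lora_A"/"lora_B" under a "base_model.model." prefix.
    key = tune_key.replace("lora_a", "lora_A").replace("lora_b", "lora_B")
    return f"base_model.model.{key}"

peft_sd = {to_peft_key(k): v for k, v in adapter_sd.items()}

# Minimal adapter_config.json; the values below are placeholders, not the
# recipe's defaults.
adapter_config = {
    "peft_type": "LORA",
    "r": 8,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj"],
    "base_model_name_or_path": "meta-llama/Llama-3.2-11B-Vision-Instruct",
}
with open("adapter_config.json", "w") as f:
    json.dump(adapter_config, f, indent=2)
torch.save(peft_sd, "adapter_model.bin")
```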
Update: @pbontrager is working on this :)
Hi, thanks for this amazing project. I was trying to finetune the LoRA model for Llama 3.2 Vision, which works fine and saved an `adapter_0.pt`. Then I wanted to use this adapter checkpoint for inference in a Hugging Face pipeline, where I ran into some issues. I tried the two approaches below; thanks in advance!
(1) Weight format conversion: there is a file to convert meta (`meta_model_0.pt`) to tune, then to HF, as in this `_convert_weights.py`. I tried to use it to convert a full model (non-LoRA) and it works for inference, but it does not work for the adapter checkpoint, as the parameter names are different.
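A quick way to see the mismatch (illustrative snippet; the file names are the ones the recipe saved for me):

```python
# Compare parameter names in the full-model and adapter checkpoints: the adapter
# only contains LoRA factor weights, which the full-model conversion mapping
# does not cover.
import torch

full_sd = torch.load("meta_model_0.pt", map_location="cpu")   # full (non-LoRA) checkpoint
adapter_sd = torch.load("adapter_0.pt", map_location="cpu")   # LoRA adapter checkpoint

print(list(full_sd)[:3])     # regular model parameter names
print(list(adapter_sd)[:3])  # LoRA-specific names (different naming scheme)
```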
(2) Checkpointer component: then I tried to set the `_component_` parameter to `torchtune.training.FullModelHFCheckpointer`, hoping to get a Hugging Face compatible model directly, but I got the error `KeyError: 'num_attention_heads'`.

What would be ideal is a way to use the finetuned model similar to the sketch below. Note that this code does not currently work, as there is no `adapter_config.json` under the `checkpointer/output_dir` path.
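Something along these lines (model ID, paths, and classes are illustrative; this is exactly what fails today because the adapter is not saved in PEFT format):

```python
# Desired usage: load the base model from the Hub and attach the finetuned
# adapter from the torchtune output directory, as one would with a PEFT adapter.
import torch
from transformers import AutoProcessor, MllamaForConditionalGeneration
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed base checkpoint
adapter_dir = "./checkpointer/output_dir"             # where adapter_config.json would need to live

model = MllamaForConditionalGeneration.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_dir)  # currently fails: no adapter_config.json
processor = AutoProcessor.from_pretrained(base_id)
```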
To reproduce what I observed:
I am using the following command, with a slightly modified config file:
`tune run --nnodes 1 --nproc_per_node 1 lora_finetune_distributed --config ./finetune-llama32/11B_lora_debug.yaml`
Main modification: changing the checkpointer `_component_` from `torchtune.training.FullModelMetaCheckpointer` (the default, which saves `meta_model_0.pt`) to `torchtune.training.FullModelHFCheckpointer`, as described in (2) above.
Full config file: