How to use finetuned lora adapter in a huggingface-like pipeline #1779

ryf1123 · 2024-10-09T11:14:49Z

Hi, thanks for this amazing project. I was trying to finetune the lora model for Llama3.2 Vision which works fine and saved a adapter_0.pt; Then I wanted to use this adapter checkpoint for inference in a huggingface pipeline where i see some issues. I tried two ways as following and thanks in advance!

Weight format conversion: there is a file to convert meta (meta_model_0.pt) to tune, then to HF as in this _convert_weights.py. I tried to use it to convert a full model (non-lora), and it works for inference, but it does not work the adapter checkpoint as the parameters names are different.
Then I tried to set the _component_ parameter to torchtune.training.FullModelHFCheckpointer hoping to get a huggingface compatible model directly. Then I got the error: *** KeyError: 'num_attention_heads'.

  File ".../lora_finetune_distributed.py", line 725, in save_checkpoint
    self._checkpointer.save_checkpoint(
  File ".../lib/python3.10/site-packages/torchtune/training/checkpointing/_checkpointer.py", line 639, in save_checkpoint
    num_heads=self._config["num_attention_heads"],
KeyError: 'num_attention_heads'

What would be ideal is to have a way to use the finetuned model in a way similar to the following? Note the following code does not work as there is no adapter_config.json under the checkpointer/output_dir path.

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Replace the original model loading code with this:
# with torch.inference_mode():
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    low_cpu_mem_usage=True,
)

from peft import PeftModel
peft_model = PeftModel.from_pretrained(model, "checkpointer/output_dir")

To reproduce what I observed:

I am using the following command to run, with slightly modified config file:

tune run --nnodes 1 --nproc_per_node 1 lora_finetune_distributed --config ./finetune-llama32/11B _lora_debug.yaml

Main modification:

from

_component_: torchtune.training.FullModelMetaCheckpointer
originally: checkpoint_files: [consolidated.pth]

to

  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /data/home/xxx/models/Llama-3.2-11B-Vision-Instruct/
  checkpoint_files: [model-00001-of-00005.safetensors,
                      model-00002-of-00005.safetensors,
                      model-00003-of-00005.safetensors,
                      model-00004-of-00005.safetensors,
                      model-00005-of-00005.safetensors ]

Full config file:

# Config for multi-device LoRA finetuning in lora_finetune_distributed.py
# using a Llama3.2 11B Vision Instruct model
#
# This config assumes that you've run the following command before launching:
#   tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct
#
# To launch on 2 devices, run the following command from root:
#   tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_2_vision/11B_lora
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training:
#   tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_2_vision/11B_lora checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# For single device LoRA finetuning please use 11B_lora_single_device.yaml
# or 11B_qlora_single_device.yaml

# Model arguments
model:
  _component_: torchtune.models.llama3_2_vision.lora_llama3_2_vision_11b
  decoder_trainable: "frozen"
  encoder_trainable: "lora"
  fusion_trainable: "lora"
  lora_attn_modules: ['q_proj', 'v_proj']
  apply_lora_to_mlp: False
  apply_lora_to_output: False
  lora_rank: 8
  lora_alpha: 16
  lora_dropout: 0.0
  image_size: 560 # Make sure this matches the image_size in tokenizer

# Transform
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  path: /data/home/xxx/models/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model
  image_size: 560

# Checkpointer
checkpointer:
  # _component_: torchtune.training.FullModelMetaCheckpointer
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /data/home/xxx/models/Llama-3.2-11B-Vision-Instruct/
  checkpoint_files: [model-00001-of-00005.safetensors,
                      model-00002-of-00005.safetensors,
                      model-00003-of-00005.safetensors,
                      model-00004-of-00005.safetensors,
                      model-00005-of-00005.safetensors ]
  # originally: checkpoint_files: [consolidated.pth]
  recipe_checkpoint: null
  output_dir: /tmp/Llama-3.2-11B-Vision-Instruct/
  model_type: LLAMA3_VISION
resume_from_checkpoint: False

# Dataset
dataset:
  _component_: torchtune.datasets.multimodal.the_cauldron_dataset
  subset: ocrvqa
seed: null
shuffle: True
collate_fn: torchtune.data.padded_collate_tiled_images_and_mask

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 4
optimizer:
  _component_: torch.optim.AdamW
  fused: True
  weight_decay: 0.01
  lr: 2e-5
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
clip_grad_norm: 1.0
compile: False # set it to True for better memory and performance

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True
enable_activation_offloading: False
dtype: bf16

# Logging
output_dir: /tmp/full-llama3.2-vision-finetune
log_peak_memory_stats: False
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  project: llama3.2_lora_project
log_every_n_steps: 1

The text was updated successfully, but these errors were encountered:

ebsmothers · 2024-10-10T02:59:45Z

Hi @ryf1123 thanks for creating the issue. I think (2) is the way we intend for this to be done, unfortunately looks like it doesn't currently work! So glad you pointed this out. I think the problem is that for the Llama 3.2 Vision model the config is structured a bit differently.. so the num_attention_heads field is still there, but it's just under the text_config field.

The ideal case you described of loading in with PeftModel.from_pretrained is exactly what we want to have and it should currently work for our text models (see the test plan in #933). But I think we need to make some changes to save the adapter weights properly for the multimodal model. I am gonna take a closer look at this, and will also assign to @pbontrager for further investigation since he's quite familiar with the multimodal key mappings

felipemello1 · 2024-10-26T03:37:16Z

hey @ryf1123, thanks for raising this issue! We put up this PR, updating the configs to use HF ckpt. It still saves the adapter as torchtune, and not PeFT format. We need to work on it, since its not text-only. We have to see how the vision part works.

Regarding your error "KeyError: 'num_attention_heads'", it happens here:

torchtune/torchtune/training/checkpointing/_checkpointer.py

Line 639 in 33b8143

num_heads=self._config["num_attention_heads"],

The correct way should be: num_heads=self._config["text_config"]["num_attention_heads"]

Check what we do here:

torchtune/torchtune/training/checkpointing/_checkpointer.py

Line 478 in 33b8143

num_heads=text_config["num_attention_heads"],

If you wanna try to give it a stab, and see if you can make it PeFT compatible, we would love the PR :)

felipemello1 · 2024-10-28T14:52:40Z

update: @pbontrager is working on this :)

ebsmothers assigned pbontrager Oct 10, 2024

joecummings mentioned this issue Oct 15, 2024

v0.4.0 release tracker #1747

Closed

34 tasks

pbontrager mentioned this issue Nov 13, 2024

Llama Vision PEFT #1937

Open

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to use finetuned lora adapter in a huggingface-like pipeline #1779

How to use finetuned lora adapter in a huggingface-like pipeline #1779

ryf1123 commented Oct 9, 2024

ebsmothers commented Oct 10, 2024

felipemello1 commented Oct 26, 2024 •

edited

Loading

felipemello1 commented Oct 28, 2024

How to use finetuned lora adapter in a huggingface-like pipeline #1779

How to use finetuned lora adapter in a huggingface-like pipeline #1779

Comments

ryf1123 commented Oct 9, 2024

ebsmothers commented Oct 10, 2024

felipemello1 commented Oct 26, 2024 • edited Loading

felipemello1 commented Oct 28, 2024

felipemello1 commented Oct 26, 2024 •

edited

Loading