
Bug: lora_post_process_for_vllm has no effect #414

Open
achillefokoue opened this issue Dec 11, 2024 · 0 comments

Describe the bug

The option "lora_post_process_for_vllm" does not seem to have any effect. It is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration as "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
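
A quick way to confirm the problem is to inspect which tensors ended up in the saved adapter. The snippet below is a rough sketch; the adapter path is a placeholder for the LoRA output directory produced by the run:

# Sketch: list which tensors ended up in the saved LoRA adapter.
# "/path/to/output_dir" is a placeholder for the tuning output directory.
import os
from safetensors import safe_open

adapter_dir = "/path/to/output_dir"
adapter_file = os.path.join(adapter_dir, "adapter_model.safetensors")

with safe_open(adapter_file, framework="pt") as f:
    embedding_keys = [k for k in f.keys() if "lm_head" in k or "embed_tokens" in k]

print(embedding_keys)  # new-token embedding weights are still inside the adapter file
print(os.path.exists(os.path.join(adapter_dir, "new_embeddings.safetensors")))  # prints False in this run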

Platform

  • Interpreter version: Python 3.12.7
  • Library version: fms-hf-tuning main branch as of December 9, 2024, 4:27 p.m. ET

Sample Code

export SFT_TRAINER_CONFIG_JSON_PATH=config.json

accelerate launch --num_processes=5 --config_file fixtures/accelerate_fsdp_defaults.yaml tuning/sft_trainer.py

where the content of config.json is as follows:

{
    "config_file": "fixtures/accelerate_fsdp_defaults.yaml",
    "model_name_or_path": "mistralai/Mixtral-8x7B-v0.1",
    "training_data_path": $TRAINING_PATH,
    "output_dir": $OUTPUT_PATH,
    "num_train_epochs": 10.0,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "torch_dtype": "float16",
    "peft_method": "lora",
    "r": 8,
    "lora_dropout": 0.05,
    "target_modules": "all-linear",
    "lora_post_process_for_vllm": true
}
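
For completeness, the tuned adapter was then served with vLLM, which is where the error above surfaces. Something like the following sketch, using vLLM's offline Python API, should exercise the same adapter-loading path (the adapter path is a placeholder; the actual deployment used the vLLM server):

# Sketch: load the tuned LoRA adapter with vLLM's offline API.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", enable_lora=True)
outputs = llm.generate(
    ["Hello"],
    SamplingParams(max_tokens=16),
    lora_request=LoRARequest("mixtral-lora", 1, "/path/to/output_dir"),
)
# Loading the un-post-processed adapter fails with:
#   ValueError: base_model.model.lm_head.weight is unsupported LoRA weight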

Expected behavior

The expected behavior is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration: "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

Observed behavior

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
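
As a temporary workaround, the post-processing the README describes can be approximated by hand: move the new-token embedding weights out of adapter_model.safetensors into a separate new_embeddings.safetensors. The sketch below is based only on the README wording and the error message; the key patterns and the exact file layout vLLM expects are assumptions, not the library's actual implementation.

# Rough sketch of the post-processing described in build/README.md:
# split the new-token embedding weights out of the adapter file.
import os
from safetensors import safe_open
from safetensors.torch import save_file

adapter_dir = "/path/to/output_dir"  # placeholder for the tuning output directory
adapter_file = os.path.join(adapter_dir, "adapter_model.safetensors")

lora_weights, embedding_weights = {}, {}
with safe_open(adapter_file, framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        if "lm_head" in name or "embed_tokens" in name:
            embedding_weights[name] = tensor
        else:
            lora_weights[name] = tensor

if embedding_weights:
    save_file(embedding_weights, os.path.join(adapter_dir, "new_embeddings.safetensors"))
    save_file(lora_weights, adapter_file)  # rewrite the adapter without the embedding weights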

