
Bug: lora_post_process_for_vllm has no effect #414

Open
achillefokoue opened this issue Dec 11, 2024 · 0 comments

Describe the bug

The option "lora_post_process_for_vllm" does not seem to have any effect. It is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration as "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
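
A quick way to confirm the problem is to inspect which tensors ended up in the saved adapter. The snippet below is a rough sketch; the adapter path is a placeholder for the LoRA output directory produced by the run:

# Sketch: list which tensors ended up in the saved LoRA adapter.
# "/path/to/output_dir" is a placeholder for the tuning output directory.
import os
from safetensors import safe_open

adapter_dir = "/path/to/output_dir"
adapter_file = os.path.join(adapter_dir, "adapter_model.safetensors")

with safe_open(adapter_file, framework="pt") as f:
    embedding_keys = [k for k in f.keys() if "lm_head" in k or "embed_tokens" in k]

print(embedding_keys)  # new-token embedding weights are still inside the adapter file
print(os.path.exists(os.path.join(adapter_dir, "new_embeddings.safetensors")))  # prints False in this run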

Platform

  • Interpreter version: Python 3.12.7
  • Library version: fms-hf-tuning main branch as of December 9, 2024, 4:27 p.m. ET

Sample Code

export SFT_TRAINER_CONFIG_JSON_PATH=config.json

accelerate launch --num_processes=5 --config_file fixtures/accelerate_fsdp_defaults.yaml tuning/sft_trainer.py

where the content of config.json is as follows:

{
    "config_file": "fixtures/accelerate_fsdp_defaults.yaml",
    "model_name_or_path": "mistralai/Mixtral-8x7B-v0.1",
    "training_data_path": $TRAINING_PATH,
    "output_dir": $OUTPUT_PATH,
    "num_train_epochs": 10.0,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "torch_dtype": "float16",
    "peft_method": "lora",
    "r": 8,
    "lora_dropout": 0.05,
    "target_modules": "all-linear",
    "lora_post_process_for_vllm": true
}
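
For completeness, the tuned adapter was then served with vLLM, which is where the error above surfaces. Something like the following sketch, using vLLM's offline Python API, should exercise the same adapter-loading path (the adapter path is a placeholder; the actual deployment used the vLLM server):

# Sketch: load the tuned LoRA adapter with vLLM's offline API.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", enable_lora=True)
outputs = llm.generate(
    ["Hello"],
    SamplingParams(max_tokens=16),
    lora_request=LoRARequest("mixtral-lora", 1, "/path/to/output_dir"),
)
# Loading the un-post-processed adapter fails with:
#   ValueError: base_model.model.lm_head.weight is unsupported LoRA weight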

Expected behavior

The expected behavior is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration: "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

Observed behavior

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
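
As a temporary workaround, the post-processing the README describes can be approximated by hand: move the new-token embedding weights out of adapter_model.safetensors into a separate new_embeddings.safetensors. The sketch below is based only on the README wording and the error message; the key patterns and the exact file layout vLLM expects are assumptions, not the library's actual implementation.

# Rough sketch of the post-processing described in build/README.md:
# split the new-token embedding weights out of the adapter file.
import os
from safetensors import safe_open
from safetensors.torch import save_file

adapter_dir = "/path/to/output_dir"  # placeholder for the tuning output directory
adapter_file = os.path.join(adapter_dir, "adapter_model.safetensors")

lora_weights, embedding_weights = {}, {}
with safe_open(adapter_file, framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        if "lm_head" in name or "embed_tokens" in name:
            embedding_weights[name] = tensor
        else:
            lora_weights[name] = tensor

if embedding_weights:
    save_file(embedding_weights, os.path.join(adapter_dir, "new_embeddings.safetensors"))
    save_file(lora_weights, adapter_file)  # rewrite the adapter without the embedding weights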

