Add more details on qLORA
aluu317 committed Sep 11, 2024
1 parent d09fc8c commit fa0668e
Showing 1 changed file with 17 additions and 26 deletions.
README.md
@@ -9,7 +9,7 @@
- [Tips on Parameters to Set](#tips-on-parameters-to-set)
- [Tuning Techniques](#tuning-techniques)
- [LoRA Tuning Example](#lora-tuning-example)
- [GPTQ-LoRA with AutoGPTQ Tuning Example](#gptq-lora-with-autogptq-tuning-example)
- [Prompt Tuning](#prompt-tuning)
- [Fine Tuning](#fine-tuning)
- [FMS Acceleration](#fms-acceleration)
@@ -434,30 +434,16 @@ Example 3:
_________________________


### GPTQ-LoRA with AutoGPTQ Tuning Example

This method is similar to [LoRA tuning](#lora-tuning-example), but the base model is a quantized model. We currently support only GPTQ-LoRA, i.e. base models quantized with the 4-bit AutoGPTQ technique; bits-and-bytes (BNB) quantized LoRA is not yet enabled.
The GPTQ-LoRA tuning technique is enabled via the [fms-acceleration](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/README.md#fms-acceleration) package.
You can see details on a sample configuration of Accelerated GPTQ-LoRA [here](https://github.com/foundation-model-stack/fms-acceleration/blob/main/sample-configurations/accelerated-peft-autogptq-sample-configuration.yaml).

To use the GPTQ-LoRA technique, set the `quantized_lora_config` defined [here](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/acceleration_configs/quantized_lora_config.py). See the Notes section of the FMS Acceleration doc [below](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/README.md#fms-acceleration) for usage. The only kernel currently supported is `triton_v2`.

In addition, the LoRA tuning technique is required: set `peft_method` to `"lora"` and pass any arguments from [LoraConfig](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/peft_config.py#L21).
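
For orientation, a rough sketch of the dataclasses behind `quantized_lora_config` is shown below. This is illustrative only; the exact class and field definitions in the linked `quantized_lora_config.py` are authoritative (it also defines a `bnb_qlora` option for bits-and-bytes, which is not yet enabled), and the `Optional` typing and example instantiation here are assumptions made for the sketch.

```py
# Illustrative sketch only -- see tuning/config/acceleration_configs/quantized_lora_config.py
# for the authoritative definitions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoGPTQLoraConfig:
    # auto_gptq supports various kernels; this selects the kernel to use.
    # `triton_v2` is the only kernel currently supported.
    kernel: str = "triton_v2"

    # whether to start from an already-quantized model; letting auto_gptq
    # quantize the model before training commences is currently not allowed.
    from_quantized: bool = True


@dataclass
class QuantizedLoraConfig:
    # set this to use auto_gptq 4-bit LoRA base layers; on the command line
    # this corresponds to `--auto_gptq triton_v2`.
    auto_gptq: Optional[AutoGPTQLoraConfig] = None


# For example, the configuration exercised by the example command below:
gptq_lora = QuantizedLoraConfig(auto_gptq=AutoGPTQLoraConfig(kernel="triton_v2"))
```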

Example command to run:

@@ -469,18 +455,21 @@ python tuning/sft_trainer.py \
--output_dir $OUTPUT_PATH \
--num_train_epochs 40 \
--per_device_train_batch_size 4 \
--learning_rate 1e-4 \
--response_template "\n### Label:" \
--dataset_text_field "output" \
--peft_method "lora" \
--r 8 \
--lora_dropout 0.05 \
--lora_alpha 16 \
--target_modules c_attn c_proj \
--auto_gptq triton_v2 \
--torch_dtype float16 \
--fp16
```

Here `--auto_gptq triton_v2` selects the `quantized_lora_config` with the `triton_v2` kernel, and `--torch_dtype float16` and `--fp16` are needed for `triton_v2`.

Equally you can pass in a JSON configuration for running tuning. See [build doc](./build/README.md) for more details. The above can also be passed in as JSON:

```json
{
"model_name_or_path": $MODEL_PATH,
@@ -495,8 +484,10 @@
"r": 8,
"lora_dropout": 0.05,
"lora_alpha": 16,
"target_modules": ["c_attn", "c_proj"]
"auto_gptq": ["triton_v2"]
"target_modules": ["c_attn", "c_proj"],
"auto_gptq": ["triton_v2"], // setting quantized_lora_config
"torch_dtype": "float16", // need this for triton_v2
"fp16": true // need this for triton_v2
}
```

