Add more details on qLORA
aluu317 committed Sep 11, 2024
1 parent d09fc8c commit fa0668e
Showing 1 changed file with 17 additions and 26 deletions.
README.md
@@ -9,7 +9,7 @@
- [Tips on Parameters to Set](#tips-on-parameters-to-set)
- [Tuning Techniques](#tuning-techniques)
- [LoRA Tuning Example](#lora-tuning-example)
- [GPTQ-LoRA with AutoGPTQ Tuning Example](#gptq-lora-with-autogptq-tuning-example)
- [Prompt Tuning](#prompt-tuning)
- [Fine Tuning](#fine-tuning)
- [FMS Acceleration](#fms-acceleration)
@@ -434,30 +434,16 @@ Example 3:
_________________________


### GPTQ-LoRA with AutoGPTQ Tuning Example

This method is similar to [LoRA tuning](#lora-tuning-example), but the base model is a quantized model. We currently support only GPTQ-LoRA, i.e. base models quantized with the 4-bit AutoGPTQ technique; bits-and-bytes (BNB) quantized LoRA is not yet enabled.
The GPTQ-LoRA tuning technique is enabled via the [fms-acceleration](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/README.md#fms-acceleration) package.
You can see details on a sample configuration of Accelerated GPTQ-LoRA [here](https://github.com/foundation-model-stack/fms-acceleration/blob/main/sample-configurations/accelerated-peft-autogptq-sample-configuration.yaml).

To use the GPTQ-LoRA technique, set the `quantized_lora_config` defined [here](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/acceleration_configs/quantized_lora_config.py). See the Notes section of the FMS Acceleration doc [below](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/README.md#fms-acceleration) for usage. The only kernel currently supported is `triton_v2`.

In addition, the LoRA tuning technique is required: set `peft_method` to `"lora"` and pass any arguments from [LoraConfig](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/peft_config.py#L21).
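
For orientation, a rough sketch of the dataclasses behind `quantized_lora_config` is shown below. This is illustrative only; the exact class and field definitions in the linked `quantized_lora_config.py` are authoritative (it also defines a `bnb_qlora` option for bits-and-bytes, which is not yet enabled), and the `Optional` typing and example instantiation here are assumptions made for the sketch.

```py
# Illustrative sketch only -- see tuning/config/acceleration_configs/quantized_lora_config.py
# for the authoritative definitions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoGPTQLoraConfig:
    # auto_gptq supports various kernels; this selects the kernel to use.
    # `triton_v2` is the only kernel currently supported.
    kernel: str = "triton_v2"

    # whether to start from an already-quantized model; letting auto_gptq
    # quantize the model before training commences is currently not allowed.
    from_quantized: bool = True


@dataclass
class QuantizedLoraConfig:
    # set this to use auto_gptq 4-bit LoRA base layers; on the command line
    # this corresponds to `--auto_gptq triton_v2`.
    auto_gptq: Optional[AutoGPTQLoraConfig] = None


# For example, the configuration exercised by the example command below:
gptq_lora = QuantizedLoraConfig(auto_gptq=AutoGPTQLoraConfig(kernel="triton_v2"))
```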

Example command to run:

@@ -469,18 +455,21 @@ python tuning/sft_trainer.py \
--output_dir $OUTPUT_PATH \
--num_train_epochs 40 \
--per_device_train_batch_size 4 \
--learning_rate 1e-4 \
--response_template "\n### Label:" \
--dataset_text_field "output" \
--peft_method "lora" \
--r 8 \
--lora_dropout 0.05 \
--lora_alpha 16 \
--target_modules c_attn c_proj \
--auto_gptq triton_v2 \
--torch_dtype float16 \
--fp16
```

Here `--auto_gptq triton_v2` selects the `quantized_lora_config` with the `triton_v2` kernel, and `--torch_dtype float16` and `--fp16` are needed for `triton_v2`.

Equally you can pass in a JSON configuration for running tuning. See [build doc](./build/README.md) for more details. The above can also be passed in as JSON:

```json
{
"model_name_or_path": $MODEL_PATH,
@@ -495,8 +484,10 @@
"r": 8,
"lora_dropout": 0.05,
"lora_alpha": 16,
"target_modules": ["c_attn", "c_proj"]
"auto_gptq": ["triton_v2"]
"target_modules": ["c_attn", "c_proj"],
"auto_gptq": ["triton_v2"], // setting quantized_lora_config
"torch_dtype": "float16", // need this for triton_v2
"fp16": true // need this for triton_v2
}
```

