diff --git a/architecture_records/002-acceleration-framework.md b/architecture_records/002-acceleration-framework.md
index 84790e50d..b84f235ed 100644
--- a/architecture_records/002-acceleration-framework.md
+++ b/architecture_records/002-acceleration-framework.md
@@ -1,4 +1,4 @@
-# Training and FineTuning Acceleration Framework
+# Training Enhancements Framework
 
 **Deciders(s)**: Sukriti Sharma (sukriti.sharma4@ibm.com), Raghu Ganti (rganti@us.ibm.com), Laura Wynter (lwynter@sg.ibm.com), Fabian Lim (flim@sg.ibm.com), Aaron Chew (aaron.chew1@ibm.com)
 **Date (YYYY-MM-DD)**: 2024-04-11
@@ -31,14 +31,18 @@ Currently `sft_trainer.py` only can access those tools already integrated in HF.
 2. Prefix tuning from [PEFT](https://github.com/huggingface/peft).
 3. FSDP training from [accelerate](https://github.com/huggingface/accelerate).
 
-Below are various reasons for a framework to integrate custom training tools into [`sft_trainer.py`].
+Below are various reasons for a framework to integrate custom training tools into `sft_trainer.py`:
 * Enable quick integrations of open-source techniques that have yet to be integrated into Huggingface.
 * Enable integrations of custom techniques developed by IBM researchers, that are not planned be integrated into Huggingface.
 
-Recently, it has been observed that new training techniques are released with an incomplete "preview" version. These "preview" versions tend to be not be fully integrated into OSS. Therefore, using new techniques typically involve additional work. This framework aims to allow timely integrations of such techniques into `sft_trainer.py`. A short exampler list of powerful training techniques but are "preview"-only include:
+Recently, it has been observed that new training techniques are often released as incomplete "preview" versions. These "preview" versions tend not to be fully integrated into OSS, so using new techniques typically involves additional work. This framework aims to allow timely integration of such techniques into `sft_trainer.py`. A short list of powerful training techniques that are currently "preview"-only includes:
+- Huggingface integration of [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).
+  * 4-bit quantization kernels to reduce the memory footprint of the base weights.
 - [Unsloth](https://github.com/unslothai/unsloth).
+  * Fused operation kernels.
+  * Kernels for common model architectures (e.g., cross-entropy losses, RoPE embeddings and RMS norms).
 - [megablocks](https://github.com/databricks/megablocks).
-- [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).
+  * Acceleration package for distributed mixture-of-experts training that improves upon FSDP sharding.
 
-`acceleration.yaml`
-```
+In this section we demonstrate how to implement an `AutoGPTQPlugin` that enables an accelerated PEFT training mode with 4-bit GPTQ base weights.
+
+This is an example `acceleration.yaml`:
+```yaml
 quantization:
-  requires_quantization: True
-
   quant_num_bits: 4
-
-  quantize_cache: '/home/user/'
-
   quant_kernel: 'gptq-tritonv2'
-
-  unsloth: False
 ```
 
-```
-from functools import partial
+```python
 
-class Framework:
+# Acceleration plugin for AutoGPTQ with GPTQ kernels
+class AutoGPTQPlugin(TuningAccelerationPlugin):
 
     def __init__(self, acceleration_config_path:str) -> None:
-        self.acceleration_config = self.read_acceleration_config(acceleration_config_path)
-        self.num_bits = self.acceleration_config.quant_num_bits
-        self.requires_custom_loading = self.acceleration_config.requires_custom_loading
-        self.requires_quantization = self.acceleration_config.requires_quantization
-        self.quantize_cache = self.acceleration_config['quantize_cache']
-        self.kernel = self.acceleration_config['quant_kernel']
-
-    def read_acceleration_config(self, acceleration_config_path):
-        pass
+        # ... initialize config
 
-    def callbacks(self, *args, **kwargs):
-        pass
-
-    def model_loader(self):
+    def model_loader(self, model_path, **kwargs):
+
+        # ... maybe quantize if needed
         quantize_config = QuantizeConfig(
             bits=self.num_bits,
         )
-        if self.requires_quantization:
-            return partial(AutoGPTQForCausalLM.from_pretrained, quantize_config = quantize_config)
-        else:
-            return partial(AutoGPTQForCausalLM.from_quantized, quantize_config = quantize_config)
+        return AutoGPTQForCausalLM.from_quantized(model_path, quantize_config = quantize_config)
 
     def augmentation(
         self,
@@ -251,29 +277,12 @@ class Framework:
         train_args,
         peft_config,
     ):
-        '''
-        This function is used for any augmentation of the model before trainings
-        e.g. quantization, unsloth/PEFT installation and also MegaBlocks patching
-        '''
-
-        if self.requires_quantization:
-            model.quantize()
-            model.save_quantized(save_dir = self.quantize_cache)
-
-        if peft_config:
-            # PEFT Installation
-            if 'gptq' in self.kernel:
-                from auto_gptq.utils.peft_utils import get_gptq_peft_model
-                model = get_gptq_peft_model(
-                    model,
-                    peft_config = peft_config,
-                )
-            else:
-                from peft import get_peft_model
-                model = get_peft_model(
-                    model,
-                    peft_config
-                )
-
-        return model
-```
\ No newline at end of file
+        assert peft_config is not None, "need peft_config to install PEFT adapters"
+
+        # PEFT Installation
+        from auto_gptq.utils.peft_utils import get_gptq_peft_model
+        return get_gptq_peft_model(
+            model,
+            peft_config = peft_config,
+        )
+```
diff --git a/architecture_records/imgs/002-framework.png b/architecture_records/imgs/002-framework.png
index 0010185c1..52ec3071c 100644
Binary files a/architecture_records/imgs/002-framework.png and b/architecture_records/imgs/002-framework.png differ
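To make the plugin code in the diff above easier to follow, here is a minimal sketch of what the `TuningAccelerationPlugin` base class that `AutoGPTQPlugin` subclasses could look like. The base class is not shown in this diff, so everything below (the PyYAML-based config parsing, the exact method signatures, the `callbacks` hook) is an illustrative assumption, not the framework's actual API.

```python
# Hypothetical sketch of the assumed TuningAccelerationPlugin base class.
# Method names mirror the plugin code in the diff above; the rest is illustrative.
from typing import Any, Dict, List

import yaml  # assumes PyYAML is available for parsing acceleration.yaml


class TuningAccelerationPlugin:
    def __init__(self, acceleration_config_path: str) -> None:
        # parse acceleration.yaml (e.g. the `quantization` section shown above)
        with open(acceleration_config_path, "r", encoding="utf-8") as f:
            self.acceleration_config: Dict[str, Any] = yaml.safe_load(f)

    def model_loader(self, model_path: str, **kwargs):
        # subclasses override to customize model loading, e.g. loading 4-bit GPTQ base weights
        raise NotImplementedError

    def augmentation(self, model, train_args, peft_config=None):
        # subclasses override to modify the model before training, e.g. installing PEFT adapters
        return model

    def callbacks(self) -> List:
        # optional trainer callbacks contributed by the plugin
        return []
```

Under this assumption, `sft_trainer.py` would only need two touch points: call `model_loader(...)` instead of the stock Huggingface loading path when an acceleration config is supplied, and call `augmentation(...)` on the loaded model before constructing the trainer.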