Upstream Main: Linting, Benchmarking, HF QLoRA baseline, FSDP fixes for GPTQ-LoRA (#20)

* Add GitHub Workflow for Linting, Formatting and Testing. Activate Workflow for Framework (#7)

* add lint workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add pylintrc, update .tox, fix files

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* activate tests and minor fixes

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* lint benchmarks.py and add workflow to dev

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Improvements to Benchmark Scripts and Config Generation Workflow (#13)

* fix benches and add verify configs

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update readme and add workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add packaging dep

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update torch dep in framework and run-benches

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* take host env in run-benches

* add display bench results script

* rename summary.csv to raw_summary.csv and update run_benchmarks.sh

* export environment variables in shell command

* dump out pip requirements for repro, and add default FHT_branch

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Added support for running official HF baseline FSDP-QLoRA benchmark (#16)

* new baseline scenario

* rename variables

* added warning when plugin allows SFTTrainer to handle PEFT on single device

* Fix FSDP when performing GPTQ-LoRA with Triton V2 (#15)

* wrap in parameters and torch view to correct dtype

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* refactor to apply patch only on FSDP and simplify

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Provide Memory Benchmarking Feature to Benchmarking Code (#14)

* add gpu memory logging support

* made improvements to GPU reference and result collation

* Renamed memory logging argument to reflect its readings as reserved memory using nvidia-smi and changed aggregation function in result collation

* variable renames

* manual linting

* added memory logging functionality via HFTrainer

* added support to benchmark memory using HFTrainer and updated README with explanation of the 2 memory benchmarking options

* addressed changes requested in PR #14

* fix bug and simplify GPU logs aggregation logic

* fixes to calculation of HFTrainer Mem Logging values

* fix calculations

* more fixes

* fix to avoid including stage inside max calculation of alloc memory

* more comments and README updates

* added fix for KeyError due to empty output dict from OOM

* manual linting

* added benchmark results to refs

* remove unnecessary columns in results gathering

* made changes to results gathering

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Co-authored-by: achew010 <[email protected]>
fabianlim and achew010 authored May 27, 2024
1 parent a66432b commit e8e06c9
Showing 28 changed files with 1,648 additions and 332 deletions.
69 changes: 69 additions & 0 deletions .github/workflows/format.yml
@@ -0,0 +1,69 @@
# Copyright The FMS HF Tuning Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Format

on:
  push:
    branches: [ "main", "dev" ]
  pull_request:
    branches: [ "main", "dev" ]

jobs:
  lint:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        plugin_name:
          - "framework"
          # - "accelerated-peft" # enable later

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.9
        uses: actions/setup-python@v4
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install tox
      - name: Run linter
        run: |
          cd plugins/${{ matrix.plugin_name }}
          tox -e lint
      - name: Run formatter
        run: |
          cd plugins/${{ matrix.plugin_name }}
          tox -e fmt
      - name: Run pytest
        run: |
          cd plugins/${{ matrix.plugin_name }}
          tox -e py
  sample-config:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.9
        uses: actions/setup-python@v4
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install tox
      - name: Run Config Verification
        run: tox -e verify-configs
4 changes: 4 additions & 0 deletions plugins/accelerated-peft/configs/bnb.yaml
@@ -14,3 +14,7 @@ peft:
  # bitsandbytes:
  bitsandbytes:
    quant_type: nf4

    # If True, then no get_peft_model and prepare_model_for_kbit_training
    # will be called.
    no_peft_model: False
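Not part of the diff: a minimal Python sketch of what the no_peft_model flag controls, assuming the peft helpers named in the comment above (get_peft_model, prepare_model_for_kbit_training); the helper name maybe_prepare_peft is hypothetical and only for illustration.

from peft import get_peft_model, prepare_model_for_kbit_training

def maybe_prepare_peft(model, lora_config, no_peft_model: bool):
    # hypothetical illustration: when no_peft_model is True, the plugin
    # skips its own PEFT preparation and leaves it to SFTTrainer
    if no_peft_model:
        return model
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, lora_config)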
@@ -18,7 +18,7 @@
# Third Party
from peft import LoraConfig
from peft.tuners.lora.gptq import QuantLinear as LoraLinearGPTQ
from transformers.utils.import_utils import _is_package_available
from typing import List, Callable
import torch


@@ -54,3 +54,32 @@ def create_new_module_peft(

# if module cannot be found, return None which results in a raise in the call-stack
return new_module

# consider moving this somewhere more general
def patch_forward_to_view_attributes_before_call(
    old_forward: Callable,
    attribute_names: List[str], torch_dtype,
):
    # patch old_forward to view the attributes as torch_dtype
    # before the call

    def _forward(self, *args, **kwargs):
        # perform a view on all these attributes
        for attr_name in attribute_names:

            # the view should be a passthrough
            # if attr.dtype == torch_dtype
            attr = getattr(self, attr_name)

            # perform view
            attr = attr.view(torch_dtype)

            try:
                setattr(self, attr_name, attr)
            except TypeError:
                # this means attr_name is already a parameter, so
                # just assign it directly
                self.__dict__[attr_name] = attr

        return old_forward(*args, **kwargs)

    return _forward
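For context (not part of the diff), a minimal sketch of how this helper would typically be bound onto a patched module, assuming a QuantLinear layer mod whose qweight/qzeros were re-viewed as a float dtype for FSDP sharding; this mirrors the usage in the plugin change further down.

from types import MethodType
import torch

_forward = patch_forward_to_view_attributes_before_call(
    mod.forward, attribute_names=["qweight", "qzeros"], torch_dtype=torch.int32
)
mod.forward = MethodType(_forward, mod)  # views the attributes back to int32 on each call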
@@ -25,8 +25,10 @@
from fms_acceleration import AccelerationPlugin
from peft import LoraConfig, prepare_model_for_kbit_training
from peft.tuners.lora.model import LoraModel
import torch.distributed
from transformers import AutoModelForCausalLM, TrainingArguments
import torch
import os


class AutoGPTQAccelerationPlugin(AccelerationPlugin):
@@ -50,6 +52,8 @@ def model_loader(self, model_name: str, **kwargs):
# guarded imports
# Third Party
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from auto_gptq.nn_modules.qlinear.qlinear_tritonv2 import QuantLinear, QuantLinearFunction
from .autogptq_utils import patch_forward_to_view_attributes_before_call

# Currently we allow only a quantized checkpoint to be loaded, we do not
# implement the quantization process here.
@@ -121,6 +125,43 @@ def model_loader(self, model_name: str, **kwargs):
device_map=device_map,
)

# https://github.com/foundation-model-stack/fms-acceleration/pull/15
# if FSDP distributed, we need to convert the AutoGPTQ model's
# quantized attributes (plain tensors) to torch.nn.Parameters. Also need
# to store the int32 tensors viewed as a float type so FSDP can shard them.

try:
    world_size = torch.distributed.get_world_size()
except ValueError:
    world_size = 1  # pg not init

if (
    world_size > 1
    and os.environ.get("ACCELERATE_USE_FSDP", "false").lower() == "true"
):
    # these parameters are to be patched for triton v2
    # consider making a map if patching more kernels
    PATCH_FOR_FSDP_TRITON_V2 = ['qweight', 'qzeros']

    # patch all the QuantLinear base layers
    for mod in model.modules():
        if isinstance(mod, QuantLinear):

            # convert all patched attributes to Parameters of torch_dtype
            # so FSDP can shard them
            for attr_name in PATCH_FOR_FSDP_TRITON_V2:
                attr = getattr(mod, attr_name)
                attr = torch.nn.Parameter(attr.view(torch_dtype), requires_grad=False)
                setattr(mod, attr_name, attr)

            # this patches the forward to convert them back to the original
            # type (i.e. int32) before the function call into the kernels
            _forward = patch_forward_to_view_attributes_before_call(
                mod.forward, attribute_names=PATCH_FOR_FSDP_TRITON_V2,
                torch_dtype=torch.int32,  # patch it back to int32
            )
            mod.forward = MethodType(_forward, mod)

# replace
AutoModelForCausalLM.from_config = _old_from_config

@@ -96,6 +96,9 @@ def __init__(self, configurations: Dict[str, Dict]):
self._quant_type = self._check_config_and_maybe_check_values(
    key="peft.quantization.bitsandbytes.quant_type", values=["fp4", "nf4"]
)
self._no_peft_model = self._check_config_and_maybe_check_values(
    key="peft.quantization.bitsandbytes.no_peft_model", values=[True, False]
)

def model_loader(self, model_name: str, **kwargs):

@@ -121,6 +124,16 @@ def model_loader(self, model_name: str, **kwargs):
"If running in FSDP, this is probably because accelerate is not used. "
"This will most probably result in error."
)
elif (
world_size == 1
and self._no_peft_model == True
):
warnings.warn(
"""Running on single device and setting plugin config `no_peft_model` as `True`
PEFT preparation will be managed by SFTTrainer and will cause a slowdown in training speed
due to extraneous dtype casting when SFTTrainer prepares the model using
https://github.com/huggingface/trl/blob/e90e8d91d2265e484f229c45a5eb8982f94a2936/trl/trainer/sft_trainer.py#L210"""
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
@@ -147,7 +160,8 @@ def requires_custom_loading(self):

@property
def requires_agumentation(self):
return True
# will skip the augmentation if _no_peft_model == True
return not self._no_peft_model

def augmentation(
self,
14 changes: 14 additions & 0 deletions plugins/accelerated-peft/tests/test_peft_plugins.py
@@ -122,6 +122,20 @@ def test_configure_bnb_plugin():
assert framework.requires_agumentation
assert len(framework.get_callbacks_and_ready_for_train()) == 0

# test that setting no_peft_model = True skips plugin.augmentation
for key, correct_value in [
    ("peft.quantization.bitsandbytes.no_peft_model", True),
    ("peft.quantization.bitsandbytes.no_peft_model", False),
]:
    with instantiate_framework(
        update_configuration_contents(
            read_configuration(CONFIG_PATH_BNB), key, correct_value
        ),
        require_packages_check=False,
    ):
        # check flags and callbacks
        assert (not correct_value) == framework.requires_agumentation

# attempt to activate plugin with configuration pointing to wrong path
# - raise with message that no plugins can be configured
with pytest.raises(ValueError) as e:
8 changes: 8 additions & 0 deletions plugins/accelerated-peft/tox.ini
@@ -18,6 +18,13 @@ commands =

[testenv:lint]
description = run linters
deps =
    pylint>=2.16.2,<=3.1.0
commands = pylint src tests
allowlist_externals = pylint

[testenv:fmt]
description = format
skip_install = true
deps =
    black>=22.12
@@ -26,6 +33,7 @@ commands =
    black {posargs:.}
    isort {posargs:.}


# [testenv:build]
# description = build wheel
# deps =