Upstream Main: Linting, Benchmarking, HF QLoRA baseline, FSDP fixes for GPTQ-LoRA (#20)

* Add GitHub Workflow for Linting, Formatting and Testing. Activate Workflow for Framework (#7)

* add lint workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add pylintrc, update .tox, fix files

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* activate tests and minor fixes

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* lint benchmarks.py and add workflow to dev

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Improvements to Benchmark Scripts and Config Generation Workflow (#13)

* fix benches and add verify configs

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update readme and add workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add packaging dep

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update torch dep in framework and run-benches

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* take host env in run-benches

* add display bench results script

* rename summary.csv to raw_summary.csv and update run_benchmarks.sh

* export environment variables in shell command

* dump out pip requirements for repro, and add default FHT_branch

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Added support for running official HF baseline FSDP-QLoRA benchmark (#16)

* new baseline scenario

* rename variables

* added warning when plugin allows SFTTrainer to handle PEFT on single device

* Fix FSDP when performing GPTQ-LoRA with Triton V2 (#15)

* wrap in parameters and torch view to correct dtype

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* refactor to apply patch only on FSDP and simplify

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Provide Memory Benchmarking Feature to Benchmarking Code (#14)

* add gpu memory logging support

* made improvements to GPU reference and result collation

* Renamed memory logging argument to reflect its readings as reserved memory using nvidia-smi and changed aggregation function in result collation

* variable renames

* manual linting

* added memory logging functionality via HFTrainer

* added support to benchmark memory using HFTrainer and updated README with explanation of the 2 memory benchmarking options

* addressed changes requested in PR #14

* fix bug and simplify GPU logs aggregation logic

* fixes to calculation of HFTrainer Mem Logging values

* fix calculations

* more fixes

* fix to avoid including stage inside max calculation of alloc memory

* more comments and README updates

* added fix for KeyError due to empty output dict from OOM

* manual linting

* added benchmark results to refs

* remove unnecessary columns in results gathering

* made changes to results gathering

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Co-authored-by: achew010 <[email protected]>
fabianlim and achew010 authored May 27, 2024
1 parent a66432b commit e8e06c9
Showing 28 changed files with 1,648 additions and 332 deletions.
69 changes: 69 additions & 0 deletions .github/workflows/format.yml
@@ -0,0 +1,69 @@
# Copyright The FMS HF Tuning Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Format

on:
  push:
    branches: [ "main", "dev" ]
  pull_request:
    branches: [ "main", "dev" ]

jobs:
  lint:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        plugin_name:
          - "framework"
          # - "accelerated-peft" # enable later

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.9
        uses: actions/setup-python@v4
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install tox
      - name: Run linter
        run: |
          cd plugins/${{ matrix.plugin_name }}
          tox -e lint
      - name: Run formatter
        run: |
          cd plugins/${{ matrix.plugin_name }}
          tox -e fmt
      - name: Run pytest
        run: |
          cd plugins/${{ matrix.plugin_name }}
          tox -e py
  sample-config:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.9
        uses: actions/setup-python@v4
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install tox
      - name: Run Config Verification
        run: tox -e verify-configs
4 changes: 4 additions & 0 deletions plugins/accelerated-peft/configs/bnb.yaml
@@ -14,3 +14,7 @@ peft:
  # bitsandbytes:
  bitsandbytes:
    quant_type: nf4

    # If True, then no get_peft_model and prepare_model_for_kbit_training
    # will be called.
    no_peft_model: False
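Not part of the diff: a minimal Python sketch of what the no_peft_model flag controls, assuming the peft helpers named in the comment above (get_peft_model, prepare_model_for_kbit_training); the helper name maybe_prepare_peft is hypothetical and only for illustration.

from peft import get_peft_model, prepare_model_for_kbit_training

def maybe_prepare_peft(model, lora_config, no_peft_model: bool):
    # hypothetical illustration: when no_peft_model is True, the plugin
    # skips its own PEFT preparation and leaves it to SFTTrainer
    if no_peft_model:
        return model
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, lora_config)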
@@ -18,7 +18,7 @@
# Third Party
from peft import LoraConfig
from peft.tuners.lora.gptq import QuantLinear as LoraLinearGPTQ
from transformers.utils.import_utils import _is_package_available
from typing import List, Callable
import torch


@@ -54,3 +54,32 @@ def create_new_module_peft(

# if module cannot be found, return None which results in a raise in the call-stack
return new_module

# consider moving this somewhere more general
def patch_forward_to_view_attributes_before_call(
    old_forward: Callable,
    attribute_names: List[str], torch_dtype,
):
    # patch old_forward to view the attributes as torch_dtype
    # before the call

    def _forward(self, *args, **kwargs):
        # perform a view on all these attributes
        for attr_name in attribute_names:

            # the view should be a passthrough
            # if attr.dtype == torch_dtype
            attr = getattr(self, attr_name)

            # perform view
            attr = attr.view(torch_dtype)

            try:
                setattr(self, attr_name, attr)
            except TypeError:
                # this means attr_name is already a parameter, so
                # just assign it directly
                self.__dict__[attr_name] = attr

        return old_forward(*args, **kwargs)

    return _forward
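For context (not part of the diff), a minimal sketch of how this helper would typically be bound onto a patched module, assuming a QuantLinear layer mod whose qweight/qzeros were re-viewed as a float dtype for FSDP sharding; this mirrors the usage in the plugin change further down.

from types import MethodType
import torch

_forward = patch_forward_to_view_attributes_before_call(
    mod.forward, attribute_names=["qweight", "qzeros"], torch_dtype=torch.int32
)
mod.forward = MethodType(_forward, mod)  # views the attributes back to int32 on each call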
@@ -25,8 +25,10 @@
from fms_acceleration import AccelerationPlugin
from peft import LoraConfig, prepare_model_for_kbit_training
from peft.tuners.lora.model import LoraModel
import torch.distributed
from transformers import AutoModelForCausalLM, TrainingArguments
import torch
import os


class AutoGPTQAccelerationPlugin(AccelerationPlugin):
@@ -50,6 +52,8 @@ def model_loader(self, model_name: str, **kwargs):
# guarded imports
# Third Party
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from auto_gptq.nn_modules.qlinear.qlinear_tritonv2 import QuantLinear, QuantLinearFunction
from .autogptq_utils import patch_forward_to_view_attributes_before_call

# Currently we allow only a quantized checkpoint to be loaded, we do not
# implement the quantization process here.
@@ -121,6 +125,43 @@ def model_loader(self, model_name: str, **kwargs):
device_map=device_map,
)

# https://github.com/foundation-model-stack/fms-acceleration/pull/15
# if FSDP distributed, we need to convert the AutoGPTQ model's
# quantized attributes (plain tensors) to torch.nn.Parameters. Also need
# to store the int32 tensors viewed as a float type so FSDP can shard them.

try:
    world_size = torch.distributed.get_world_size()
except ValueError:
    world_size = 1  # pg not init

if (
    world_size > 1
    and os.environ.get("ACCELERATE_USE_FSDP", "false").lower() == "true"
):
    # these parameters are to be patched for triton v2
    # consider making a map if patching more kernels
    PATCH_FOR_FSDP_TRITON_V2 = ['qweight', 'qzeros']

    # patch all the QuantLinear base layers
    for mod in model.modules():
        if isinstance(mod, QuantLinear):

            # convert all patched attributes to Parameters of torch_dtype
            # so FSDP can shard them
            for attr_name in PATCH_FOR_FSDP_TRITON_V2:
                attr = getattr(mod, attr_name)
                attr = torch.nn.Parameter(attr.view(torch_dtype), requires_grad=False)
                setattr(mod, attr_name, attr)

            # this patches the forward to convert them back to the original
            # type (i.e. int32) before the function call into the kernels
            _forward = patch_forward_to_view_attributes_before_call(
                mod.forward, attribute_names=PATCH_FOR_FSDP_TRITON_V2,
                torch_dtype=torch.int32,  # patch it back to int32
            )
            mod.forward = MethodType(_forward, mod)

# replace
AutoModelForCausalLM.from_config = _old_from_config

@@ -96,6 +96,9 @@ def __init__(self, configurations: Dict[str, Dict]):
self._quant_type = self._check_config_and_maybe_check_values(
    key="peft.quantization.bitsandbytes.quant_type", values=["fp4", "nf4"]
)
self._no_peft_model = self._check_config_and_maybe_check_values(
    key="peft.quantization.bitsandbytes.no_peft_model", values=[True, False]
)

def model_loader(self, model_name: str, **kwargs):

@@ -121,6 +124,16 @@ def model_loader(self, model_name: str, **kwargs):
"If running in FSDP, this is probably because accelerate is not used. "
"This will most probably result in error."
)
elif (
world_size == 1
and self._no_peft_model == True
):
warnings.warn(
"""Running on single device and setting plugin config `no_peft_model` as `True`
PEFT preparation will be managed by SFTTrainer and will cause a slowdown in training speed
due to extraneous dtype casting when SFTTrainer prepares the model using
https://github.com/huggingface/trl/blob/e90e8d91d2265e484f229c45a5eb8982f94a2936/trl/trainer/sft_trainer.py#L210"""
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
@@ -147,7 +160,8 @@ def requires_custom_loading(self):

@property
def requires_agumentation(self):
return True
# will skip the augmentation if _no_peft_model == True
return not self._no_peft_model

def augmentation(
self,
14 changes: 14 additions & 0 deletions plugins/accelerated-peft/tests/test_peft_plugins.py
@@ -122,6 +122,20 @@ def test_configure_bnb_plugin():
assert framework.requires_agumentation
assert len(framework.get_callbacks_and_ready_for_train()) == 0

# test that setting no_peft_model = True skips plugin.augmentation
for key, correct_value in [
    ("peft.quantization.bitsandbytes.no_peft_model", True),
    ("peft.quantization.bitsandbytes.no_peft_model", False),
]:
    with instantiate_framework(
        update_configuration_contents(
            read_configuration(CONFIG_PATH_BNB), key, correct_value
        ),
        require_packages_check=False,
    ):
        # check flags and callbacks
        assert (not correct_value) == framework.requires_agumentation

# attempt to activate plugin with configuration pointing to wrong path
# - raise with message that no plugins can be configured
with pytest.raises(ValueError) as e:
8 changes: 8 additions & 0 deletions plugins/accelerated-peft/tox.ini
@@ -18,6 +18,13 @@ commands =

[testenv:lint]
description = run linters
deps =
    pylint>=2.16.2,<=3.1.0
commands = pylint src tests
allowlist_externals = pylint

[testenv:fmt]
description = format
skip_install = true
deps =
    black>=22.12
@@ -26,6 +33,7 @@ commands =
    black {posargs:.}
    isort {posargs:.}


# [testenv:build]
# description = build wheel
# deps =