feat: update launch training with accelerate for multi-gpu (#98)

* add accelerate launch script

Signed-off-by: Anh-Uong <[email protected]>

* give ownership of fms-hf-tuning repo to tuning user

Signed-off-by: Anh-Uong <[email protected]>

* fix: training script param

Signed-off-by: Anh-Uong <[email protected]>

* format script, add logging, add fsdp defaults file

Signed-off-by: Anh-Uong <[email protected]>

* set default accelerate config and set num_processes if multi-gpu

Signed-off-by: Anh-Uong <[email protected]>

* refactor copy and chmod

Signed-off-by: Anh-Uong <[email protected]>

* run accelerate script by default, run fmt

Signed-off-by: Anh-Uong <[email protected]>

* allow for multiGPU to be empty, run lint

Signed-off-by: Anh-Uong <[email protected]>

* explicitly set single GPU

Signed-off-by: Anh-Uong <[email protected]>

* docs: build and run image with configs

Signed-off-by: Anh-Uong <[email protected]>

* fix building dockerfile

Signed-off-by: Anh-Uong <[email protected]>

* fixes based on review comments

- determine list of store action params
- only override num_processes if no config_file
- update json key multiGPU to accelerate_launch_args
- update docs

Signed-off-by: Anh-Uong <[email protected]>

* multiGPU interpreted outside of accelerate params

Signed-off-by: Anh-Uong <[email protected]>

* Add support for parsing more accelerate launch params (#1)

* Add support for parsing more accelerate launch params

Signed-off-by: Thara Palanivel <[email protected]>

* Formatting

Signed-off-by: Thara Palanivel <[email protected]>

* Addressing review comments

Signed-off-by: Thara Palanivel <[email protected]>

---------

Signed-off-by: Thara Palanivel <[email protected]>
Signed-off-by: Anh-Uong <[email protected]>

* fix logic for addt param parsing, docs

Signed-off-by: Anh-Uong <[email protected]>

* Use config only if multi_gpu (#2)

* Use fsdp config only if multi_gpu

Signed-off-by: Thara Palanivel <[email protected]>

* Simplifying multi-gpu logic

Signed-off-by: Thara Palanivel <[email protected]>

* Fixing typo

Signed-off-by: Thara Palanivel <[email protected]>

* Address review comments

Signed-off-by: Thara Palanivel <[email protected]>

* Fix typo

Signed-off-by: Thara Palanivel <[email protected]>

---------

Signed-off-by: Thara Palanivel <[email protected]>
Signed-off-by: Anh-Uong <[email protected]>

* doc and comment updates from feedback

Signed-off-by: Anh-Uong <[email protected]>

---------

Signed-off-by: Anh-Uong <[email protected]>
Signed-off-by: Thara Palanivel <[email protected]>
Co-authored-by: tharapalanivel <[email protected]>
anhuong and tharapalanivel authored Apr 2, 2024
1 parent 79b0fd3 commit 2df20ba
Showing 4 changed files with 290 additions and 7 deletions.
13 changes: 8 additions & 5 deletions build/Dockerfile
@@ -109,8 +109,11 @@ RUN git clone https://github.com/foundation-model-stack/fms-hf-tuning.git && \
 RUN mkdir -p /licenses
 COPY LICENSE /licenses/
 
-COPY launch_training.py /app
-RUN chmod +x /app/launch_training.py
+# Copy scripts and default configs
+COPY build/launch_training.py build/accelerate_launch.py fixtures/accelerate_fsdp_defaults.yaml /app/
+RUN chmod +x /app/launch_training.py /app/accelerate_launch.py
+
+ENV FSDP_DEFAULTS_FILE_PATH="/app/accelerate_fsdp_defaults.yaml"
 
 # Need a better way to address this hack
 RUN touch /.aim_profile && \
@@ -120,10 +123,10 @@ RUN touch /.aim_profile && \
 
 # create tuning user and give ownership to dirs
 RUN useradd -u $USER_UID tuning -m -g 0 --system && \
-    chown -R $USER:0 /app && \
-    chmod -R g+rwX /app
+    chown -R $USER:0 /app /tmp && \
+    chmod -R g+rwX /app /tmp
 
 WORKDIR /app
 USER ${USER}
 
-CMD [ "tail", "-f", "/dev/null" ]
+CMD [ "python", "/app/accelerate_launch.py" ]
165 changes: 165 additions & 0 deletions build/README.md
@@ -0,0 +1,165 @@
# Building fms-hf-tuning as an Image

The Dockerfile provides a way of running the fms-hf-tuning SFT Trainer. It installs the needed dependencies and adds two scripts that parse arguments to pass to SFT Trainer. The `accelerate_launch.py` script runs by default when the image starts; it parses the arguments and triggers SFT Trainer for single- or multi-GPU tuning by running `accelerate launch launch_training.py`.
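
For a two-GPU run such as the example config further below, the argument list the wrapper assembles and hands to accelerate's `launch_command` looks roughly like this (a sketch only; the exact flags depend on the JSON config):

```py
args = [
    "--num_processes", "2",
    "--config_file", "/app/accelerate_fsdp_defaults.yaml",  # added automatically for multi-GPU runs
    "/app/launch_training.py",  # the training script is always appended last
]
```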

## Configuration

The scripts accept a JSON-formatted config whose location or contents are provided via environment variables. Set `SFT_TRAINER_CONFIG_JSON_PATH` to the mounted path of the JSON config. Alternatively, set `SFT_TRAINER_CONFIG_JSON_ENV_VAR` to the base64-encoded JSON config, which can be produced with the function below:

```py
import base64

def encode_json(my_json_string):
    base64_bytes = base64.b64encode(my_json_string.encode("ascii"))
    txt = base64_bytes.decode("ascii")
    return txt

with open("test_config.json") as f:
    contents = f.read()

encode_json(contents)
```
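
To sanity-check the encoding locally, the string can be decoded the same way the launcher's `txt_to_obj` helper does (base64-decode, then parse as JSON). A minimal sketch, reusing `encode_json` from the snippet above:

```py
import base64
import json

def decode_json(encoded):
    # Mirrors txt_to_obj in build/accelerate_launch.py for the JSON case
    return json.loads(base64.b64decode(encoded.encode("ascii")))

with open("test_config.json") as f:
    original = json.loads(f.read())

# Round trip: encoding then decoding should give back the same config
assert decode_json(encode_json(json.dumps(original))) == original
```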

The keys for the JSON config are all of the flags available to use with [SFT Trainer](https://huggingface.co/docs/trl/sft_trainer#trl.SFTTrainer).

For configuring `accelerate launch`, use the key `accelerate_launch_args` and pass the set of flags accepted by [accelerate launch](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-launch). Since these flags are passed via the JSON config, each key must match the long-form flag name. For example, to enable the flag `--quiet`, use the JSON key `"quiet"`; the short form `"q"` will fail.
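
As an illustration of how these keys become CLI flags, the loop in `build/accelerate_launch.py` behaves roughly like the simplified sketch below (the real script inspects accelerate's argparse actions to decide which keys are value-less flags):

```py
config = {"num_processes": 2, "use_fsdp": True, "quiet": True}

args = []
for key, val in config.items():
    if isinstance(val, bool):
        # boolean true -> bare long-form flag, e.g. --quiet
        if val:
            args.append(f"--{key}")
    else:
        # everything else -> "--key value"
        args.extend([f"--{key}", str(val)])

print(args)  # ['--num_processes', '2', '--use_fsdp', '--quiet']
```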

For example, the config below runs fine tuning with two GPUs and FSDP:

```json
{
    "accelerate_launch_args": {
        "num_machines": 1,
        "main_process_port": 1234,
        "num_processes": 2,
        "use_fsdp": true,
        "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
        "fsdp_sharding_strategy": 1,
        "fsdp_state_dict_type": "FULL_STATE_DICT",
        "fsdp_cpu_ram_efficient_loading": true,
        "fsdp_sync_module_states": true
    },
    "model_name_or_path": "/llama/13B",
    "training_data_path": "/data/twitter_complaints.json",
    "output_dir": "/output/llama-7b-pt-multigpu",
    "num_train_epochs": 5.0,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "save_strategy": "epoch",
    "learning_rate": 0.03,
    "weight_decay": 0.0,
    "lr_scheduler_type": "cosine",
    "logging_steps": 1.0,
    "packing": false,
    "include_tokens_per_second": true,
    "response_template": "\n### Label:",
    "dataset_text_field": "output",
    "use_flash_attn": true,
    "torch_dtype": "bfloat16",
    "tokenizer_name_or_path": "/llama/13B"
}
```

Users should always set `num_processes` to be explicit about the number of processes to run tuning on. When `num_processes` is greater than 1, the default [FSDP config](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/fixtures/accelerate_fsdp_defaults.yaml) is used. You can also supply your own defaults by specifying a config file with the key `config_file`. Any of the values in these configs can be overridden by passing the corresponding flags via `accelerate_launch_args` in the JSON config.
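
For reference, the selection logic in `build/accelerate_launch.py` roughly reduces to the sketch below (`pick_config_file` is an illustrative helper, not a function in the script):

```py
import os

def pick_config_file(accelerate_config: dict):
    num_processes = accelerate_config.get("num_processes")
    if num_processes and num_processes > 1 and not accelerate_config.get("config_file"):
        default = os.getenv("FSDP_DEFAULTS_FILE_PATH", "/app/accelerate_fsdp_defaults.yaml")
        if os.path.exists(default):
            return default  # fall back to the bundled FSDP defaults
    return accelerate_config.get("config_file")  # may be None

print(pick_config_file({"num_processes": 2}))  # /app/accelerate_fsdp_defaults.yaml (if present)
```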

Note that `num_processes`, which is the total number of processes to be launched in parallel, should match the number of GPUs to run on. The number of GPUs used can also be set with the environment variable `CUDA_VISIBLE_DEVICES`. If `num_processes=1`, the script assumes a single GPU.
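
If you prefer to derive `num_processes` rather than hard-code it, a small helper along these lines can be run wherever training will execute (this assumes `torch` is installed there; it is not part of the image's launch flow):

```py
import os

import torch  # assumption: torch is available where this helper runs

def gpus_visible() -> int:
    # CUDA_VISIBLE_DEVICES, if set, limits which GPUs torch can see
    visible = os.getenv("CUDA_VISIBLE_DEVICES")
    if visible:
        return len([d for d in visible.split(",") if d])
    return torch.cuda.device_count()

print(f"Set accelerate_launch_args.num_processes to {gpus_visible()}")
```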


## Building the Image

With docker, build the image at the top level with:

```sh
docker build . -t sft-trainer:mytag -f build/Dockerfile
```

## Running the Image

Run the sft-trainer image with the JSON config env var and the volume mounts set up:

```sh
docker run -v $(pwd)/config.json:/app/config.json -v $MODEL_PATH:/model -v $TRAINING_DATA_PATH:/data/twitter_complaints.json --env SFT_TRAINER_CONFIG_JSON_PATH=/app/config.json sft-trainer:mytag
```

This will run `accelerate_launch.py` with the JSON config passed.

An example Kubernetes Pod for deploying sft-trainer, which requires creating PVCs with the model and input dataset as well as any mounts needed for the output tuned model:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sft-trainer-config
data:
  config.json: |
    {
      "accelerate_launch_args": {
        "num_machines": 1,
        "main_process_port": 1234,
        "num_processes": 2,
        "use_fsdp": true,
        "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
        "fsdp_sharding_strategy": 1,
        "fsdp_state_dict_type": "FULL_STATE_DICT",
        "fsdp_cpu_ram_efficient_loading": true,
        "fsdp_sync_module_states": true
      },
      "model_name_or_path": "/llama/13B",
      "training_data_path": "/data/twitter_complaints.json",
      "output_dir": "/output/llama-7b-pt-multigpu",
      "num_train_epochs": 5.0,
      "per_device_train_batch_size": 4,
      "per_device_eval_batch_size": 4,
      "gradient_accumulation_steps": 4,
      "save_strategy": "epoch",
      "learning_rate": 0.03,
      "weight_decay": 0.0,
      "lr_scheduler_type": "cosine",
      "logging_steps": 1.0,
      "packing": false,
      "include_tokens_per_second": true,
      "response_template": "\n### Label:",
      "dataset_text_field": "output",
      "use_flash_attn": true,
      "torch_dtype": "bfloat16",
      "tokenizer_name_or_path": "/llama/13B"
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: sft-trainer-test
spec:
  containers:
    - env:
        - name: SFT_TRAINER_CONFIG_JSON_PATH
          value: /config/config.json
      image: sft-trainer:mytag
      imagePullPolicy: IfNotPresent
      name: tuning-test
      resources:
        limits:
          nvidia.com/gpu: "2"
        requests:
          nvidia.com/gpu: "2"
      volumeMounts:
        - mountPath: /data/input
          name: input-data
        - mountPath: /data/output
          name: output-data
        - mountPath: /config
          name: sft-trainer-config
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  volumes:
    - name: input-data
      persistentVolumeClaim:
        claimName: input-pvc
    - name: output-data
      persistentVolumeClaim:
        claimName: output-pvc
    - name: sft-trainer-config
      configMap:
        name: sft-trainer-config
```
114 changes: 114 additions & 0 deletions build/accelerate_launch.py
@@ -0,0 +1,114 @@
# Copyright The FMS HF Tuning Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Script wraps launch_training to run with accelerate for multi and single GPU cases.
Read accelerate_launch_args configuration via environment variable `SFT_TRAINER_CONFIG_JSON_PATH`
for the path to the JSON config file with parameters or `SFT_TRAINER_CONFIG_JSON_ENV_VAR`
for the encoded config string to parse.
"""

# Standard
import json
import os
import base64
import pickle
import logging

# Third Party
from accelerate.commands.launch import launch_command_parser, launch_command


def txt_to_obj(txt):
    base64_bytes = txt.encode("ascii")
    message_bytes = base64.b64decode(base64_bytes)
    try:
        # If the bytes represent a JSON string
        return json.loads(message_bytes)
    except UnicodeDecodeError:
        # Otherwise the bytes are a pickled python dictionary
        return pickle.loads(message_bytes)


def main():
    LOGLEVEL = os.environ.get("LOG_LEVEL", "WARNING").upper()
    logging.basicConfig(level=LOGLEVEL)

    json_configs = {}
    json_path = os.getenv("SFT_TRAINER_CONFIG_JSON_PATH")
    json_env_var = os.getenv("SFT_TRAINER_CONFIG_JSON_ENV_VAR")

    if json_path:
        with open(json_path, "r", encoding="utf-8") as f:
            json_configs = json.load(f)

    elif json_env_var:
        json_configs = txt_to_obj(json_env_var)

    parser = launch_command_parser()
    # Map to determine which flags don't require a value to be set
    actions_type_map = {
        action.dest: type(action).__name__ for action in parser._actions
    }

    # Parse accelerate_launch_args
    accelerate_launch_args = []
    accelerate_config = json_configs.get("accelerate_launch_args", {})
    if accelerate_config:
        logging.info("Using accelerate_launch_args configs: %s", accelerate_config)
        for key, val in accelerate_config.items():
            if actions_type_map.get(key) == "_AppendAction":
                for param_val in val:
                    accelerate_launch_args.extend([f"--{key}", str(param_val)])
            elif (actions_type_map.get(key) == "_StoreTrueAction" and val) or (
                actions_type_map.get(key) == "_StoreFalseAction" and not val
            ):
                accelerate_launch_args.append(f"--{key}")
            else:
                accelerate_launch_args.append(f"--{key}")
                # Only need to add the value for params that aren't flags, i.e. --quiet takes no value
                if actions_type_map.get(key) == "_StoreAction":
                    accelerate_launch_args.append(str(val))

    num_processes = accelerate_config.get("num_processes")
    if num_processes:
        # if multi GPU setting and accelerate config_file not passed by user,
        # use the default config for default set of parameters
        if num_processes > 1 and not accelerate_config.get("config_file"):
            # Add default FSDP config
            fsdp_filepath = os.getenv(
                "FSDP_DEFAULTS_FILE_PATH", "/app/accelerate_fsdp_defaults.yaml"
            )
            if os.path.exists(fsdp_filepath):
                logging.info("Using accelerate config file: %s", fsdp_filepath)
                accelerate_launch_args.extend(["--config_file", fsdp_filepath])

        elif num_processes == 1:
            logging.info("num_processes=1 so setting env var CUDA_VISIBLE_DEVICES=0")
            os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    else:
        logging.warning(
            "num_processes param was not passed in. Value from config file (if available) will \
            be used or accelerate launch will determine number of processes automatically"
        )

    # Add training_script
    accelerate_launch_args.append("/app/launch_training.py")

    logging.debug("accelerate_launch_args: %s", accelerate_launch_args)
    args = parser.parse_args(args=accelerate_launch_args)
    logging.debug("accelerate launch parsed args: %s", args)
    launch_command(args)


if __name__ == "__main__":
    main()
5 changes: 3 additions & 2 deletions build/launch_training.py
@@ -66,7 +66,8 @@ def main():
     LOGLEVEL = os.environ.get("LOG_LEVEL", "WARNING").upper()
     logging.basicConfig(level=LOGLEVEL)
 
-    logging.info("Attempting to launch training script")
+    logging.info("Initializing launch training script")
+
     parser = transformers.HfArgumentParser(
         dataclass_types=(
             configs.ModelArguments,
@@ -122,7 +123,7 @@ def main():
     elif peft_method_parsed == "pt":
         tune_config = prompt_tuning_config
 
-    logging.debug(
+    logging.info(
         "Parameters used to launch training: \
         model_args %s, data_args %s, training_args %s, tune_config %s",
         model_args,
