
Commit

Merge branch 'main' into default_params
Signed-off-by: Thara Palanivel <[email protected]>
tharapalanivel committed Apr 2, 2024
2 parents 4cf6e5a + 2df20ba commit 0f806b9
Showing 3 changed files with 287 additions and 5 deletions.
13 changes: 8 additions & 5 deletions build/Dockerfile
@@ -109,8 +109,11 @@ RUN git clone https://github.com/foundation-model-stack/fms-hf-tuning.git && \
RUN mkdir -p /licenses
COPY LICENSE /licenses/

-COPY launch_training.py /app
-RUN chmod +x /app/launch_training.py
+# Copy scripts and default configs
+COPY build/launch_training.py build/accelerate_launch.py fixtures/accelerate_fsdp_defaults.yaml /app/
+RUN chmod +x /app/launch_training.py /app/accelerate_launch.py
+
+ENV FSDP_DEFAULTS_FILE_PATH="/app/accelerate_fsdp_defaults.yaml"

# Need a better way to address this hack
RUN touch /.aim_profile && \
@@ -120,10 +123,10 @@ RUN touch /.aim_profile && \

# create tuning user and give ownership to dirs
RUN useradd -u $USER_UID tuning -m -g 0 --system && \
-    chown -R $USER:0 /app && \
-    chmod -R g+rwX /app
+    chown -R $USER:0 /app /tmp && \
+    chmod -R g+rwX /app /tmp

WORKDIR /app
USER ${USER}

-CMD [ "tail", "-f", "/dev/null" ]
+CMD [ "python", "/app/accelerate_launch.py" ]
165 changes: 165 additions & 0 deletions build/README.md
@@ -0,0 +1,165 @@
# Building fms-hf-tuning as an Image

The Dockerfile provides a way of running the fms-hf-tuning SFT Trainer. It installs the required dependencies and adds two scripts that parse arguments to pass to the SFT Trainer. When the image runs, the `accelerate_launch.py` script is executed by default; it parses the arguments and runs `accelerate launch launch_training.py` to trigger the SFT Trainer on a single GPU or multiple GPUs.

## Configuration

The scripts accept a JSON-formatted config, which is passed in via environment variables. `SFT_TRAINER_CONFIG_JSON_PATH` can be set to the mounted path of the JSON config file. Alternatively, `SFT_TRAINER_CONFIG_JSON_ENV_VAR` can be set to the JSON config encoded with the function below:

```py
import base64

def encode_json(my_json_string):
    base64_bytes = base64.b64encode(my_json_string.encode("ascii"))
    txt = base64_bytes.decode("ascii")
    return txt

with open("test_config.json") as f:
    contents = f.read()

encode_json(contents)
```
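
The encoded string can then be supplied to the container as `SFT_TRAINER_CONFIG_JSON_ENV_VAR` (for example via `docker run --env`). A minimal sketch, assuming `test_config.json` holds a valid SFT Trainer config:

```py
import base64
import json

with open("test_config.json") as f:
    config = json.load(f)  # also validates that the file is proper JSON

encoded = base64.b64encode(json.dumps(config).encode("ascii")).decode("ascii")
print(encoded)  # paste this value into SFT_TRAINER_CONFIG_JSON_ENV_VAR
```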

The keys for the JSON config are all of the flags available to use with [SFT Trainer](https://huggingface.co/docs/trl/sft_trainer#trl.SFTTrainer).

For configuring `accelerate launch`, use the key `accelerate_launch_args` and pass the set of flags accepted by [accelerate launch](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-launch). Since these flags are passed via the JSON config, each key must match the long-form flag name. For example, to enable the `--quiet` flag, use the JSON key `"quiet"`; using the short form `"q"` will fail.

For example, the config below runs fine tuning on two GPUs with FSDP:

```json
{
    "accelerate_launch_args": {
        "num_machines": 1,
        "main_process_port": 1234,
        "num_processes": 2,
        "use_fsdp": true,
        "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
        "fsdp_sharding_strategy": 1,
        "fsdp_state_dict_type": "FULL_STATE_DICT",
        "fsdp_cpu_ram_efficient_loading": true,
        "fsdp_sync_module_states": true
    },
    "model_name_or_path": "/llama/13B",
    "training_data_path": "/data/twitter_complaints.json",
    "output_dir": "/output/llama-7b-pt-multigpu",
    "num_train_epochs": 5.0,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "save_strategy": "epoch",
    "learning_rate": 0.03,
    "weight_decay": 0.0,
    "lr_scheduler_type": "cosine",
    "logging_steps": 1.0,
    "packing": false,
    "include_tokens_per_second": true,
    "response_template": "\n### Label:",
    "dataset_text_field": "output",
    "use_flash_attn": true,
    "torch_dtype": "bfloat16",
    "tokenizer_name_or_path": "/llama/13B"
}
```
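
To see how the long-form keys under `accelerate_launch_args` become CLI flags, here is a minimal sketch (not part of this commit) of the mapping that `accelerate_launch.py` performs with accelerate's argument parser; the sample dict is a hypothetical subset of the config above:

```py
from accelerate.commands.launch import launch_command_parser

parser = launch_command_parser()
# Map each known flag name to its argparse action type, as the launcher does
actions_type_map = {a.dest: type(a).__name__ for a in parser._actions}

sample = {"num_machines": 1, "num_processes": 2, "use_fsdp": True}
flags = []
for key, val in sample.items():
    kind = actions_type_map.get(key)
    if kind == "_StoreTrueAction" and val:
        flags.append(f"--{key}")        # boolean flag: key only
    else:
        flags.append(f"--{key}")
        if kind == "_StoreAction":
            flags.append(str(val))      # value-taking flag: key then value

print(flags)  # e.g. ['--num_machines', '1', '--num_processes', '2', '--use_fsdp']
```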

Users should always set `num_processes` to be explicit about the number of processes to run tuning on. When `num_processes` is greater than 1, the default [FSDP config](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/fixtures/accelerate_fsdp_defaults.yaml) is used. You can also supply your own defaults by specifying a config file with the `config_file` key. Any of the values in these configs can be overridden by passing the corresponding flags via `accelerate_launch_args` in the JSON config.

Note that `num_processes`, which is the total number of processes to be launched in parallel, should match the number of GPUs to run on. The number of GPUs used can also be set via the environment variable `CUDA_VISIBLE_DEVICES`. If `num_processes=1`, the script assumes a single-GPU run.
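
As a rough illustration of the behaviour described above, the handling of `num_processes` in `accelerate_launch.py` (shown in full later in this commit) boils down to roughly the following; `resolve_launch_defaults` is a hypothetical helper used only for this sketch:

```py
import os

def resolve_launch_defaults(accelerate_config: dict, launch_args: list) -> list:
    num_processes = accelerate_config.get("num_processes")
    if num_processes and num_processes > 1 and not accelerate_config.get("config_file"):
        # Multi-GPU run with no user-supplied accelerate config: fall back to
        # the FSDP defaults packaged into the image.
        fsdp = os.getenv("FSDP_DEFAULTS_FILE_PATH", "/app/accelerate_fsdp_defaults.yaml")
        if os.path.exists(fsdp):
            launch_args.extend(["--config_file", fsdp])
    elif num_processes == 1:
        # Single process: pin training to one GPU.
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    return launch_args
```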


## Building the Image

With Docker, build the image from the repository root with:

```sh
docker build . -t sft-trainer:mytag -f build/Dockerfile
```

## Running the Image

Run the sft-trainer image with the JSON config environment variable and the required volume mounts set up:

```sh
docker run -v config.json:/app/config.json -v $MODEL_PATH:/model -v $TRAINING_DATA_PATH:/data/twitter_complaints.json --env SFT_TRAINER_CONFIG_JSON_PATH=/app/config.json sft-trainer:mytag
```

This will run `accelerate_launch.py` with the JSON config passed.

Below is an example Kubernetes Pod for deploying sft-trainer. It requires creating PVCs that hold the model and the input dataset, plus any mounts needed for the output tuned model:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sft-trainer-config
data:
  config.json: |
    {
      "accelerate_launch_args": {
        "num_machines": 1,
        "main_process_port": 1234,
        "num_processes": 2,
        "use_fsdp": true,
        "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
        "fsdp_sharding_strategy": 1,
        "fsdp_state_dict_type": "FULL_STATE_DICT",
        "fsdp_cpu_ram_efficient_loading": true,
        "fsdp_sync_module_states": true
      },
      "model_name_or_path": "/llama/13B",
      "training_data_path": "/data/twitter_complaints.json",
      "output_dir": "/output/llama-7b-pt-multigpu",
      "num_train_epochs": 5.0,
      "per_device_train_batch_size": 4,
      "per_device_eval_batch_size": 4,
      "gradient_accumulation_steps": 4,
      "save_strategy": "epoch",
      "learning_rate": 0.03,
      "weight_decay": 0.0,
      "lr_scheduler_type": "cosine",
      "logging_steps": 1.0,
      "packing": false,
      "include_tokens_per_second": true,
      "response_template": "\n### Label:",
      "dataset_text_field": "output",
      "use_flash_attn": true,
      "torch_dtype": "bfloat16",
      "tokenizer_name_or_path": "/llama/13B"
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: sft-trainer-test
spec:
  containers:
  - env:
    - name: SFT_TRAINER_CONFIG_JSON_PATH
      value: /config/config.json
    image: sft-trainer:mytag
    imagePullPolicy: IfNotPresent
    name: tuning-test
    resources:
      limits:
        nvidia.com/gpu: "2"
      requests:
        nvidia.com/gpu: "2"
    volumeMounts:
    - mountPath: /data/input
      name: input-data
    - mountPath: /data/output
      name: output-data
    - mountPath: /config
      name: sft-trainer-config
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  volumes:
  - name: input-data
    persistentVolumeClaim:
      claimName: input-pvc
  - name: output-data
    persistentVolumeClaim:
      claimName: output-pvc
  - name: sft-trainer-config
    configMap:
      name: sft-trainer-config
```
114 changes: 114 additions & 0 deletions build/accelerate_launch.py
@@ -0,0 +1,114 @@
# Copyright The FMS HF Tuning Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Script wraps launch_training to run with accelerate for multi and single GPU cases.
Read accelerate_launch_args configuration via environment variable `SFT_TRAINER_CONFIG_JSON_PATH`
for the path to the JSON config file with parameters or `SFT_TRAINER_CONFIG_JSON_ENV_VAR`
for the encoded config string to parse.
"""

# Standard
import json
import os
import base64
import pickle
import logging

# Third Party
from accelerate.commands.launch import launch_command_parser, launch_command


def txt_to_obj(txt):
    base64_bytes = txt.encode("ascii")
    message_bytes = base64.b64decode(base64_bytes)
    try:
        # If the bytes represent JSON string
        return json.loads(message_bytes)
    except UnicodeDecodeError:
        # Otherwise the bytes are a pickled python dictionary
        return pickle.loads(message_bytes)


def main():
    LOGLEVEL = os.environ.get("LOG_LEVEL", "WARNING").upper()
    logging.basicConfig(level=LOGLEVEL)

    json_configs = {}
    json_path = os.getenv("SFT_TRAINER_CONFIG_JSON_PATH")
    json_env_var = os.getenv("SFT_TRAINER_CONFIG_JSON_ENV_VAR")

    if json_path:
        with open(json_path, "r", encoding="utf-8") as f:
            json_configs = json.load(f)

    elif json_env_var:
        json_configs = txt_to_obj(json_env_var)

    parser = launch_command_parser()
    # Map to determine which flags don't require a value to be set
    actions_type_map = {
        action.dest: type(action).__name__ for action in parser._actions
    }

    # Parse accelerate_launch_args
    accelerate_launch_args = []
    accelerate_config = json_configs.get("accelerate_launch_args", {})
    if accelerate_config:
        logging.info("Using accelerate_launch_args configs: %s", accelerate_config)
        for key, val in accelerate_config.items():
            if actions_type_map.get(key) == "_AppendAction":
                for param_val in val:
                    accelerate_launch_args.extend([f"--{key}", str(param_val)])
            elif (actions_type_map.get(key) == "_StoreTrueAction" and val) or (
                actions_type_map.get(key) == "_StoreFalseAction" and not val
            ):
                accelerate_launch_args.append(f"--{key}")
            else:
                accelerate_launch_args.append(f"--{key}")
                # Only need to add key for params that aren't flags ie. --quiet
                if actions_type_map.get(key) == "_StoreAction":
                    accelerate_launch_args.append(str(val))

    num_processes = accelerate_config.get("num_processes")
    if num_processes:
        # if multi GPU setting and accelerate config_file not passed by user,
        # use the default config for default set of parameters
        if num_processes > 1 and not accelerate_config.get("config_file"):
            # Add default FSDP config
            fsdp_filepath = os.getenv(
                "FSDP_DEFAULTS_FILE_PATH", "/app/accelerate_fsdp_defaults.yaml"
            )
            if os.path.exists(fsdp_filepath):
                logging.info("Using accelerate config file: %s", fsdp_filepath)
                accelerate_launch_args.extend(["--config_file", fsdp_filepath])

        elif num_processes == 1:
            logging.info("num_processes=1 so setting env var CUDA_VISIBLE_DEVICES=0")
            os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    else:
        logging.warning(
            "num_processes param was not passed in. Value from config file (if available) will \
            be used or accelerate launch will determine number of processes automatically"
        )

    # Add training_script
    accelerate_launch_args.append("/app/launch_training.py")

    logging.debug("accelerate_launch_args: %s", accelerate_launch_args)
    args = parser.parse_args(args=accelerate_launch_args)
    logging.debug("accelerate launch parsed args: %s", args)
    launch_command(args)


if __name__ == "__main__":
    main()
