# Building fms-hf-tuning as an Image

The Dockerfile provides a way of running the fms-hf-tuning SFT Trainer. It installs the needed dependencies and adds two scripts that help parse arguments to pass to SFT Trainer. The `accelerate_launch.py` script runs by default when the image starts; it parses the arguments and launches SFT Trainer for single- or multi-GPU tuning by running `accelerate launch launch_training.py`.

## Configuration

The scripts accept a JSON-formatted config, which is provided through environment variables. `SFT_TRAINER_CONFIG_JSON_PATH` can be set to the mounted path of the JSON config. Alternatively, `SFT_TRAINER_CONFIG_JSON_ENV_VAR` can be set to the base64-encoded JSON config, produced with the function below:

```py
import base64

def encode_json(my_json_string):
    # base64-encode the JSON string so it can be passed through an env var
    base64_bytes = base64.b64encode(my_json_string.encode("ascii"))
    txt = base64_bytes.decode("ascii")
    return txt
```
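
For instance, the encoded value could be exported straight from the shell (a minimal sketch; it assumes the config is saved as `config.json` in the current directory):

```sh
# base64-encode config.json and export it for the container to pick up
export SFT_TRAINER_CONFIG_JSON_ENV_VAR=$(python -c "import base64; print(base64.b64encode(open('config.json','rb').read()).decode('ascii'))")
```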

The keys for the JSON config are all of the flags available to use with [SFT Trainer](https://huggingface.co/docs/trl/sft_trainer#trl.SFTTrainer).

For configuring `accelerate launch`, use key `multiGPU` and pass the set of flags accepted by [accelerate launch](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-launch).

For example, the below config is used for running PEFT tuning with two GPUs and FSDP.

Note that `num_processes`, which is the total number of processes to be launched in parallel, should match the number of GPUs to run on:

```json
{
  "multiGPU": {
    "num_machines": 1,
    "main_process_port": 1234,
    "num_processes": 2,
    "use_fsdp": true,
    "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
    "fsdp_sharding_strategy": 1,
    "fsdp_state_dict_type": "FULL_STATE_DICT",
    "fsdp_cpu_ram_efficient_loading": true,
    "fsdp_sync_module_states": true
  },
  "model_name_or_path": "/llama/7B",
  "training_data_path": "/data/twitter_complaints.json",
  "output_dir": "/output/llama-7b-pt-multigpu",
  "num_train_epochs": 5.0,
  "per_device_train_batch_size": 4,
  "per_device_eval_batch_size": 4,
  "gradient_accumulation_steps": 4,
  "save_strategy": "epoch",
  "learning_rate": 0.03,
  "weight_decay": 0.0,
  "lr_scheduler_type": "cosine",
  "logging_steps": 1.0,
  "packing": false,
  "include_tokens_per_second": true,
  "response_template": "\n### Label:",
  "dataset_text_field": "output",
  "use_flash_attn": false,
  "torch_dtype": "bfloat16",
  "tokenizer_name_or_path": "/llama/7B"
}
```

When `multiGPU` is set, the [FSDP config](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/fixtures/accelerate_fsdp_defaults.yaml) is used by default. Any of these values can be overridden by passing flags via the JSON config, or by supplying your own config file using the key `config_file`.
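
As an illustration, overriding the defaults with your own accelerate config file might look like the following (a sketch; it assumes `config_file` sits alongside the other `accelerate launch` flags under `multiGPU`, and the path is a placeholder for a file mounted into the container):

```json
{
  "multiGPU": {
    "config_file": "/app/accelerate_config.yaml",
    "num_processes": 2
  }
}
```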

If `multiGPU` is set and `num_processes` is not explicitly set, the number of processes/GPUs will be determined by the number of GPUs available via `torch.cuda.device_count()`.

If `multiGPU` is not set, the script assumes single-GPU and runs with `num_processes=1`. The GPUs visible to the script can also be limited by setting the environment variable `CUDA_VISIBLE_DEVICES`.
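
A minimal single-GPU config, then, simply omits the `multiGPU` key (a sketch reusing keys from the example above; paths are placeholders):

```json
{
  "model_name_or_path": "/llama/7B",
  "training_data_path": "/data/twitter_complaints.json",
  "output_dir": "/output/llama-7b-pt",
  "num_train_epochs": 5.0,
  "per_device_train_batch_size": 4,
  "response_template": "\n### Label:",
  "dataset_text_field": "output"
}
```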


## Building the Image

With Docker, build the image:

```sh
docker build . -t sft-trainer:mytag
```
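
To confirm the build succeeded, list the tagged image (standard Docker, nothing repo-specific):

```sh
docker images sft-trainer
```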

## Running the Image

Run the sft-trainer image with the JSON config env var and volume mounts set up. Note that Docker bind mounts require absolute host paths, hence `$(pwd)` for the config file:

```sh
docker run \
  -v $(pwd)/config.json:/app/config.json \
  -v $MODEL_PATH:/model \
  -v $TRAINING_DATA_PATH:/data/twitter_complaints.json \
  --env SFT_TRAINER_CONFIG_JSON_PATH=/app/config.json \
  sft-trainer:mytag
```

This will run `accelerate_launch.py` with the JSON config passed.
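
If you instead encoded the config as shown earlier, the same run can pass it through the env var without a config mount (a sketch, assuming `SFT_TRAINER_CONFIG_JSON_ENV_VAR` was exported as above):

```sh
docker run \
  -v $MODEL_PATH:/model \
  -v $TRAINING_DATA_PATH:/data/twitter_complaints.json \
  --env SFT_TRAINER_CONFIG_JSON_ENV_VAR=$SFT_TRAINER_CONFIG_JSON_ENV_VAR \
  sft-trainer:mytag
```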

Below is an example Kubernetes Pod for deploying sft-trainer. It requires PVCs holding the model and input dataset, plus any mounts needed for the tuned model output:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sft-trainer-config
data:
  config.json: |
    {
      "multiGPU": {
        "num_machines": 1,
        "main_process_port": 1234,
        "num_processes": 2,
        "use_fsdp": true,
        "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
        "fsdp_sharding_strategy": 1,
        "fsdp_state_dict_type": "FULL_STATE_DICT",
        "fsdp_cpu_ram_efficient_loading": true,
        "fsdp_sync_module_states": true
      },
      "model_name_or_path": "/llama/7B",
      "training_data_path": "/data/twitter_complaints.json",
      "output_dir": "/output/llama-7b-pt-multigpu",
      "num_train_epochs": 5.0,
      "per_device_train_batch_size": 4,
      "per_device_eval_batch_size": 4,
      "gradient_accumulation_steps": 4,
      "save_strategy": "epoch",
      "learning_rate": 0.03,
      "weight_decay": 0.0,
      "lr_scheduler_type": "cosine",
      "logging_steps": 1.0,
      "packing": false,
      "include_tokens_per_second": true,
      "response_template": "\n### Label:",
      "dataset_text_field": "output",
      "use_flash_attn": false,
      "torch_dtype": "bfloat16",
      "tokenizer_name_or_path": "/llama/7B"
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: sft-trainer-test
spec:
  containers:
    - env:
        - name: SFT_TRAINER_CONFIG_JSON_PATH
          value: /config/config.json
      image: sft-trainer:mytag
      imagePullPolicy: IfNotPresent
      name: tuning-test
      resources:
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"
      volumeMounts:
        - mountPath: /data/input
          name: input-data
        - mountPath: /data/output
          name: output-data
        - mountPath: /config
          name: sft-trainer-config
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  volumes:
    - name: input-data
      persistentVolumeClaim:
        claimName: input-pvc
    - name: output-data
      persistentVolumeClaim:
        claimName: output-pvc
    - name: sft-trainer-config
      configMap:
        name: sft-trainer-config
```
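
Once the PVCs exist, apply the manifest and follow the run with standard kubectl commands (assuming the manifest above is saved as `sft-trainer-pod.yaml`):

```sh
kubectl apply -f sft-trainer-pod.yaml
# follow training logs until the pod completes
kubectl logs -f sft-trainer-test
```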
