diff --git a/build/README.md b/build/README.md
new file mode 100644
index 000000000..f4f6643b7
--- /dev/null
+++ b/build/README.md
@@ -0,0 +1,164 @@

# Building fms-hf-tuning as an Image

The Dockerfile provides a way of running the fms-hf-tuning SFT Trainer. It installs the needed dependencies and adds two scripts that help parse arguments and pass them on to the SFT Trainer. By default, running the image executes the `accelerate_launch.py` script, which parses the arguments and triggers single- or multi-GPU tuning by running `accelerate launch launch_training.py`.

## Configuration

The scripts accept a JSON-formatted config, located via environment variables. Set `SFT_TRAINER_CONFIG_JSON_PATH` to the mounted path of the JSON config. Alternatively, set `SFT_TRAINER_CONFIG_JSON_ENV_VAR` to the JSON config encoded with the function below:

```py
import base64


def encode_json(my_json_string):
    base64_bytes = base64.b64encode(my_json_string.encode("ascii"))
    txt = base64_bytes.decode("ascii")
    return txt
```
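The encoding can also be done straight from the shell; a minimal sketch, assuming the JSON config has been saved to a local file named `config.json` (a hypothetical path):

```sh
# Base64-encode config.json the same way encode_json above does,
# and export it for the container to pick up
export SFT_TRAINER_CONFIG_JSON_ENV_VAR=$(python -c 'import base64, sys; print(base64.b64encode(sys.stdin.read().encode("ascii")).decode("ascii"))' < config.json)
```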
The keys for the JSON config are all of the flags available to use with [SFT Trainer](https://huggingface.co/docs/trl/sft_trainer#trl.SFTTrainer).

For configuring `accelerate launch`, use the key `multiGPU` and pass the set of flags accepted by [accelerate launch](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-launch).

For example, the config below runs PEFT tuning with FSDP across two GPUs. Note that `num_processes`, the total number of processes to be launched in parallel, should match the number of GPUs to run on:

```json
{
    "multiGPU": {
        "num_machines": 1,
        "main_process_port": 1234,
        "num_processes": 2,
        "use_fsdp": true,
        "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
        "fsdp_sharding_strategy": 1,
        "fsdp_state_dict_type": "FULL_STATE_DICT",
        "fsdp_cpu_ram_efficient_loading": true,
        "fsdp_sync_module_states": true
    },
    "model_name_or_path": "/llama/7B",
    "training_data_path": "/data/twitter_complaints.json",
    "output_dir": "/output/llama-7b-pt-multigpu",
    "num_train_epochs": 5.0,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "save_strategy": "epoch",
    "learning_rate": 0.03,
    "weight_decay": 0.0,
    "lr_scheduler_type": "cosine",
    "logging_steps": 1.0,
    "packing": false,
    "include_tokens_per_second": true,
    "response_template": "\n### Label:",
    "dataset_text_field": "output",
    "use_flash_attn": false,
    "torch_dtype": "bfloat16",
    "tokenizer_name_or_path": "/llama/7B"
}
```

When `multiGPU` is set, the [FSDP config](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/fixtures/accelerate_fsdp_defaults.yaml) is used by default. Any of these values can be overridden by passing flags via the JSON config, or by passing in your own config file using the key `config_file`.

If `multiGPU` is set and `num_processes` is not explicitly set, the number of processes/GPUs will be determined by the number of GPUs available via `torch.cuda.device_count()`.

If `multiGPU` is not set, the script assumes single-GPU and runs with `num_processes=1`. The GPUs used can also be restricted by setting the environment variable `CUDA_VISIBLE_DEVICES`.


## Building the Image

With Docker, build the image with:

```sh
docker build . -t sft-trainer:mytag
```

## Running the Image

Run the sft-trainer image with the JSON config mounted and the env var set:

```sh
docker run \
  -v $(pwd)/config.json:/app/config.json \
  -v $MODEL_PATH:/model \
  -v $TRAINING_DATA_PATH:/data/twitter_complaints.json \
  --env SFT_TRAINER_CONFIG_JSON_PATH=/app/config.json \
  sft-trainer:mytag
```

This will run `accelerate_launch.py` with the JSON config passed.

Below is an example Kubernetes Pod for deploying sft-trainer. It requires creating PVCs with the model and input dataset, as well as any mounts needed for the tuned model output:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sft-trainer-config
data:
  config.json: |
    {
      "multiGPU": {
        "num_machines": 1,
        "main_process_port": 1234,
        "num_processes": 2,
        "use_fsdp": true,
        "fsdp_backward_prefetch_policy": "TRANSFORMER_BASED_WRAP",
        "fsdp_sharding_strategy": 1,
        "fsdp_state_dict_type": "FULL_STATE_DICT",
        "fsdp_cpu_ram_efficient_loading": true,
        "fsdp_sync_module_states": true
      },
      "model_name_or_path": "/llama/7B",
      "training_data_path": "/data/twitter_complaints.json",
      "output_dir": "/output/llama-7b-pt-multigpu",
      "num_train_epochs": 5.0,
      "per_device_train_batch_size": 4,
      "per_device_eval_batch_size": 4,
      "gradient_accumulation_steps": 4,
      "save_strategy": "epoch",
      "learning_rate": 0.03,
      "weight_decay": 0.0,
      "lr_scheduler_type": "cosine",
      "logging_steps": 1.0,
      "packing": false,
      "include_tokens_per_second": true,
      "response_template": "\n### Label:",
      "dataset_text_field": "output",
      "use_flash_attn": false,
      "torch_dtype": "bfloat16",
      "tokenizer_name_or_path": "/llama/7B"
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: sft-trainer-test
spec:
  containers:
    - env:
        - name: SFT_TRAINER_CONFIG_JSON_PATH
          value: /config/config.json
      image: sft-trainer:mytag
      imagePullPolicy: IfNotPresent
      name: tuning-test
      resources:
        limits:
          nvidia.com/gpu: "2"
        requests:
          nvidia.com/gpu: "2"
      volumeMounts:
        - mountPath: /data/input
          name: input-data
        - mountPath: /data/output
          name: output-data
        - mountPath: /config
          name: sft-trainer-config
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  volumes:
    - name: input-data
      persistentVolumeClaim:
        claimName: input-pvc
    - name: output-data
      persistentVolumeClaim:
        claimName: output-pvc
    - name: sft-trainer-config
      configMap:
        name: sft-trainer-config
```
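The manifest can then be applied and the run monitored with standard `kubectl` commands; a minimal sketch, assuming it was saved as `sft-trainer-pod.yaml` (a hypothetical filename):

```sh
# Create the ConfigMap and Pod, then follow the training logs
kubectl apply -f sft-trainer-pod.yaml
kubectl logs -f sft-trainer-test
```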