adding initial code drop for llm finetune #698
Merged

Commits (17):
- ab55445 adding initial code drop for llm finetune (itayhubara)
- 6a6ca47 (a) fixing padding issue; (b) masking input tokens for eval dataset; … (itayhubara)
- 87992ca fix masking bug (itayhubara)
- 11e47c4 adding more logger support (itayhubara)
- 8f791c7 bug fix (itayhubara)
- efd899b fix logging bug and update HP (itayhubara)
- 8a9668f adding patch for memmory issue and fused model enablement (itayhubara)
- 2165163 fixing dataset and model links and updating bash script and readme (itayhubara)
- efdcd18 Fix eval batch size, add Dockerfile, improve logging, remove unused code (michal2409)
- 7491573 Fix eval batch size, add Dockerfile, improve logging, remove unused code (michal2409)
- 4074852 Remove training_step (michal2409)
- a102e34 Merge pull request #1 from michal2409/llama_v2_finetuning (itayhubara)
- ac0eb0d renaming directory and adding more HP values to logger (itayhubara)
- aa8415d adding weight decay to TrainingArguments and BLOCK_START BLOCK_STOP (itayhubara)
- a8efc51 editing logging to resolve all checker issues (itayhubara)
- 552c046 fix issue in steps_num logging (itayhubara)
- 5970ae9 updating bash script for GBS=8 (itayhubara)
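To inspect these commits locally, one option is to fetch the PR head ref (assuming `origin` points at the repository hosting this pull request; the local branch name below is arbitrary):

```bash
# Fetch the head of PR #698 into a throwaway local branch and check it out.
git fetch origin pull/698/head:llm-finetune-pr698
git checkout llm-finetune-pr698
```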
Dockerfile (new file, +8 lines):

ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:24.01-py3
FROM ${FROM_IMAGE_NAME}

WORKDIR /workspace/ft-llm
ADD . /workspace/ft-llm

RUN pip install -r requirements.txt
RUN pip install flash-attn==2.4.1 --no-build-isolation
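The README below drives everything through `run_docker.sh`, but as a rough sketch of how this Dockerfile could be built and entered by hand (the image tag and mount path are placeholders, not taken from the PR):

```bash
# Build the image from the Dockerfile above (the tag "ft-llm" is arbitrary).
docker build -t ft-llm .

# Start an interactive container with GPU access; the mount/workdir flags mirror the ones
# mentioned in the README and should be adapted to your own paths.
docker run --gpus all -it --rm \
  -v $HOME/workspace:/root/workspace --workdir /root/workspace \
  ft-llm
```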
README.md (new file, +100 lines):
# LoRA benchmark

LoRA benchmark on GPU (Nvidia A100 80GB). Inspired by [this blog post](https://medium.com/@sourabmangrulkar/falcon-180b-finetuning-using-peft-and-deepspeed-b92643091d99) and [this script](https://github.com/pacman100/DHS-LLM-Workshop/blob/main/chat_assistant/training/train.py).

## Setup

Run the following:
```bash
sudo ./run_docker.sh
cd lora
pip install -r requirements.txt
```

> The Docker run command contains `-v /home/regis_huggingface_co/workspace:/root/workspace --workdir /root/workspace`. Feel free to change these flags at your convenience.

You will also need to run the following to install flash attention:
```
pip install flash-attn --no-build-isolation
```

> For flash attention, make sure that the following command returns 0:
> ```
> ninja --version >/dev/null && echo $?
> ```
> If not, run
> ```
> pip uninstall -y ninja && pip install ninja
> ```
> and install `flash-attn` again.
> More information [here](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features).
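As an optional sanity check that is not part of the original instructions, you can confirm the wheel imports cleanly before launching a long run:

```bash
# Optional: verify that flash-attn is importable and print its version.
python -c "import flash_attn; print(flash_attn.__version__)"
```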

Make sure to have requested permission for downloading the Llama2 weights on the Hugging Face Hub: https://huggingface.co/meta-llama/Llama-2-7b-hf
Then, you will need to be connected to your Hugging Face account with a read token by running:
```
huggingface-cli login
```
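If you prefer a non-interactive setup (for example inside the container), the token can also be passed directly on the command line; `$HF_TOKEN` is a placeholder for your own read token:

```bash
# Non-interactive alternative to the prompt above (HF_TOKEN is a placeholder).
huggingface-cli login --token $HF_TOKEN
```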
Finally, please install the MLPerf logger:
```
git clone https://github.com/mlperf/logging.git mlperf-logging
pip install -e mlperf-logging
```
## Download Data and Model

Data can be downloaded from:
[mlperf drive - train data](https://drive.google.com/file/d/1-JgY1mEafcJ7qhggt6UR3OEKAciIPd5s/view?usp=sharing)
[mlperf drive - validation data](https://drive.google.com/file/d/1jrm6Lacrq49AYv0uB_Qy22xRmfPixQvs/view?usp=sharing)
[mlperf drive - llama-v2 model](https://drive.google.com/drive/folders/1sTeuxkPhwkNPKIPFnOLIYCcK53oB3Ypc?usp=sharing)

By default, the scripts assume the model is under `./llama-v2-fused-qkv` and that both the train and validation data are under the `dataset` folder.
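The links above are Google Drive shares; one possible way to fetch them from a headless machine is the `gdown` tool (not part of the repository's requirements). The IDs below come from the links above, and the target paths follow the defaults just described:

```bash
# Hypothetical download helper using gdown; target paths follow the script defaults.
pip install gdown
mkdir -p dataset
gdown "https://drive.google.com/uc?id=1-JgY1mEafcJ7qhggt6UR3OEKAciIPd5s" -O dataset/   # train data
gdown "https://drive.google.com/uc?id=1jrm6Lacrq49AYv0uB_Qy22xRmfPixQvs" -O dataset/   # validation data
gdown --folder "https://drive.google.com/drive/folders/1sTeuxkPhwkNPKIPFnOLIYCcK53oB3Ypc" -O llama-v2-fused-qkv   # model
```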

## Llama2-70B on 8 devices

Run:
```bash
accelerate launch --config_file configs/default_config.yaml scripts/train.py \
--model_name meta-llama/Llama-2-70b-hf \
--dataset_name "tau/scrolls" --dataset_config_name "gov_report" \
--max_seq_len 8192 \
--bf16 True \
--logging_steps 1 \
--eval_steps 22 \
--output_dir "/tmp/llama-70b" \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--dataset_text_field "input" \
--lr_scheduler_type "cosine" \
--learning_rate 1e-3 \
--warmup_ratio 0.03 \
--use_gradient_checkpointing True \
--use_peft_lora True \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.1 \
--max_steps 440 \
--use_flash_attn \
--lora_target_modules "q_proj,v_proj,k_proj,o_proj"
```
where the Accelerate config file is [this one](https://github.com/regisss/lora/blob/main/configs/default_config.yaml).

> Using flash attention with `--use_flash_attn` is necessary for training on 8k-token sequences.

Learning curves of such a run can be found here: https://huggingface.co/regisss/test_5/tensorboard

## Evaluation

To run evaluation for summarizing texts, you can run:
- Without LoRA adapter weights:
```
python scripts/eval.py --model_name meta-llama/Llama-2-70b-hf --max_new_tokens 900 --seq_length 8192 --do_sample --dataset_name "tau/scrolls" --dataset_config_name "gov_report"
```
- With LoRA adapter weights:
```
python scripts/eval.py --peft_model_name path_to_my_lora_model --max_new_tokens 900 --seq_length 8192 --do_sample --dataset_name "tau/scrolls" --dataset_config_name "gov_report"
```

## Expected outcome

A clean output (train and eval loss) of a single run with 440 steps can be found under
```
convergence_example.txt
```
configs/default_config.yaml (new file, +22 lines):
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
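For reference, the effective global batch size follows from this config together with the training flags above: with `num_processes: 8`, `per_device_train_batch_size 1`, and `gradient_accumulation_steps 1`, it works out to the GBS=8 mentioned in the commit history.

```bash
# Sanity check: global batch size = per-device batch size x grad-accum steps x num processes.
PER_DEVICE_BS=1; GRAD_ACCUM=1; NUM_PROCESSES=8
echo "global batch size = $((PER_DEVICE_BS * GRAD_ACCUM * NUM_PROCESSES))"   # prints 8
```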
Review comment:
@itayhubara could you please include a short description of what dataset is being used by the benchmark - training & eval and some text to capture size of it (e.g. number of samples or tokens, size in GB for storage)