
uneven distribution of GPU workload #262

Open
liatamax opened this issue Sep 6, 2023 · 1 comment

liatamax commented Sep 6, 2023

Hello,

Thanks so much for providing this resource so that we can all leverage the latest developments in AI on different platforms.

I was able to use your example to run a job that fine-tunes Llama-2 7B on an old server with 8 NVIDIA 1080 GPUs; the run is estimated to take 33 hours to finish. However, I noticed that not all of the GPUs are fully utilized, as you can see in the NVTOP screenshot below. Is there any configuration I can use to speed up the job?

python qlora/qlora.py \
    --model_name_or_path llama-2-7b-HF/ --use_auth --output_dir llama-2-guanaco-7b \
    --logging_steps 10 --save_strategy steps --data_seed 42 --save_steps 500 --save_total_limit 40 \
    --evaluation_strategy steps --eval_dataset_size 1024 --max_eval_samples 1000 \
    --per_device_eval_batch_size 1 --max_new_tokens 32 --dataloader_num_workers 1 \
    --group_by_length --logging_strategy steps --remove_unused_columns False --do_train --do_eval \
    --lora_r 64 --lora_alpha 16 --lora_modules all --double_quant --quant_type nf4 --fp16 --bits 4 \
    --warmup_ratio 0.03 --lr_scheduler_type constant --gradient_checkpointing \
    --dataset oasst1 --source_max_len 16 --target_max_len 512 \
    --per_device_train_batch_size 1 --gradient_accumulation_steps 16 --max_steps 1875 --eval_steps 187 \
    --learning_rate 0.0002 --adam_beta2 0.999 --max_grad_norm 0.3 --lora_dropout 0.1 \
    --weight_decay 0.0 --seed 0 --max_memory_MB 10000

[NVTOP screenshot: GPU utilization uneven across the 8 cards]
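As a hypothesis rather than a confirmed diagnosis: when qlora.py is launched as a single process with a per-GPU memory cap (--max_memory_MB), the model layers are typically spread across all visible cards, and each batch then flows through them one GPU at a time, so only one card is busy at any given moment. Alongside NVTOP, the same pattern can be logged in text form with standard nvidia-smi options:

nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5

If the memory column shows the model split across several cards while utilization hops from one card to the next, the run is limited by this sequential layer pipeline rather than by any single GPU.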

ichsan2895 commented Sep 18, 2023

Please check this post. Hopefully it solves the problem.
Multi-gpu training example?
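For reference, a minimal sketch of the kind of multi-GPU launch discussed in that thread, assuming qlora.py runs unchanged under Hugging Face Accelerate with one process per GPU; the --multi_gpu and --num_processes flags belong to accelerate launch, and whether the 4-bit device-map loading works in this mode is exactly what the linked thread covers, not something confirmed here:

accelerate launch --multi_gpu --num_processes 8 qlora/qlora.py \
    --model_name_or_path llama-2-7b-HF/ --bits 4 --max_memory_MB 10000 \
    --per_device_train_batch_size 1 --gradient_accumulation_steps 2
    # ...all remaining qlora.py flags exactly as in the original command above

With 8 data-parallel workers, dropping --gradient_accumulation_steps from 16 to 2 keeps the same effective batch size (1 × 2 × 8 = 16).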
