Hello,
Thanks so much for providing such a resource so that we can all leverage the latest developments in AI on different platforms.
I was able to use your example to run a job that fine-tunes Llama-2 7B on an old server with 8 NVIDIA 1080 GPUs. The job is estimated to take 33 hours to finish. However, I noticed that not all of the GPUs are fully utilized, as you can see in the NVTOP screenshot below. Is there any configuration I can use to speed up the work? This is the command I ran:
python qlora/qlora.py --model_name_or_path llama-2-7b-HF/ --use_auth --output_dir llama-2-guanaco-7b --logging_steps 10 --save_strategy steps --data_seed 42 --save_steps 500 --save_total_limit 40 --evaluation_strategy steps --eval_dataset_size 1024 --max_eval_samples 1000 --per_device_eval_batch_size 1 --max_new_tokens 32 --dataloader_num_workers 1 --group_by_length --logging_strategy steps --remove_unused_columns False --do_train --do_eval --lora_r 64 --lora_alpha 16 --lora_modules all --double_quant --quant_type nf4 --fp16 --bits 4 --warmup_ratio 0.03 --lr_scheduler_type constant --gradient_checkpointing --dataset oasst1 --source_max_len 16 --target_max_len 512 --per_device_train_batch_size 1 --gradient_accumulation_steps 16 --max_steps 1875 --eval_steps 187 --learning_rate 0.0002 --adam_beta2 0.999 --max_grad_norm 0.3 --lora_dropout 0.1 --weight_decay 0.0 --seed 0 --max_memory_MB 10000
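For context, here is a rough back-of-the-envelope sketch of what these flags imply. The values are copied from the command and the 33-hour estimate above. The `num_processes = 1` default is my assumption (the script was started with plain `python`, so I am treating it as a single training process), and the `training_budget` helper is just a hypothetical name for illustration, not part of qlora.

```python
def training_budget(per_device_batch, grad_accum, max_steps, est_hours, num_processes=1):
    """Derive rough throughput figures from the training flags.

    num_processes=1 is an assumption: the job was launched with plain
    `python`, so no data-parallel multiplier is applied here.
    """
    # Sequences consumed per optimizer step.
    effective_batch = per_device_batch * grad_accum * num_processes
    # Sequences consumed over the whole run.
    total_sequences = effective_batch * max_steps
    # Wall-clock time per optimizer step, based on the estimated duration.
    seconds_per_step = est_hours * 3600 / max_steps
    return {
        "effective_batch": effective_batch,
        "total_sequences": total_sequences,
        "seconds_per_step": seconds_per_step,
        "seconds_per_sequence": seconds_per_step / effective_batch,
    }


if __name__ == "__main__":
    # --per_device_train_batch_size 1 --gradient_accumulation_steps 16 --max_steps 1875
    budget = training_budget(per_device_batch=1, grad_accum=16, max_steps=1875, est_hours=33)
    for name, value in budget.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, each optimizer step covers 16 sequences and takes roughly a minute, which is the figure I would like to bring down.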