Finetune issue #142
Comments
Hi~ The default is to run with eight GPUs. If you use two GPUs, you need to set --nproc_per_node to 2.
The training command looks like this:
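(A minimal sketch of a two-GPU launch; the device indices, port, and GPUS value below are illustrative assumptions, and the training arguments are the same as in the script further down.)

GPUS=2                      # one process per GPU, passed to --nproc_per_node
MASTER_PORT=29500           # any free local port for the rendezvous
CUDA_VISIBLE_DEVICES=0,1 torchrun \
  --nnodes=1 \
  --node_rank=0 \
  --master_addr=0.0.0.0 \
  --nproc_per_node=${GPUS} \
  --master_port=${MASTER_PORT} \
  internvl/train/internvl_chat_finetune.py  # followed by the same training arguments as in the script below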
Do you encounter this issue when using zero_stage3_config.json?
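For reference, a DeepSpeed ZeRO stage 3 config is a JSON file along these lines; this is a minimal sketch, not necessarily the exact zero_stage3_config.json shipped with the repo:

{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto"
}

Compared with ZeRO stage 1 (which only partitions optimizer states), stage 3 also partitions gradients and model parameters across GPUs, so it typically uses less memory per GPU.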
@Yuliang-Liu Nice work! I ran into a finetuning issue, as follows.
Two NVIDIA A800 80G GPUs are used during training. The finetune script looks like this:
CUDA_VISIBLE_DEVICES=$gpu torchrun \
  --nnodes=1 \
  --node_rank=0 \
  --master_addr=0.0.0.0 \
  --nproc_per_node=${GPUS} \
  --master_port=${MASTER_PORT} \
  internvl/train/internvl_chat_finetune.py \
  --model_name_or_path "models/OpenGVLab/InternVL2-2B" \
  --conv_style "internlm2-chat" \
  --output_dir ${OUTPUT_DIR} \
  --meta_path "shell/data/train-finetune.json" \
  --overwrite_output_dir True \
  --force_image_size 448 \
  --max_dynamic_patch 6 \
  --down_sample_ratio 0.5 \
  --drop_path_rate 0.0 \
  --freeze_llm True \
  --freeze_mlp True \
  --freeze_backbone True \
  --use_llm_lora 16 \
  --vision_select_layer -1 \
  --dataloader_num_workers 4 \
  --bf16 True \
  --num_train_epochs 1 \
  --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
  --gradient_accumulation_steps ${GRADIENT_ACC} \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 200 \
  --save_total_limit 1 \
  --learning_rate 4e-6 \
  --weight_decay 0.01 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --max_seq_length 4096 \
  --do_train True \
  --grad_checkpoint True \
  --group_by_length True \
  --dynamic_image_size True \
  --use_thumbnail True \
  --ps_version 'v2' \
  --deepspeed "zero_stage1_config.json" \
  --report_to "tensorboard"
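The command above references several shell variables that are not shown in the report; a sketch of what they might be set to (the values are illustrative assumptions, not taken from the original script):

gpu=0,1                                        # value for CUDA_VISIBLE_DEVICES: the two A800 cards
GPUS=2                                         # passed to --nproc_per_node, one process per GPU
MASTER_PORT=29500                              # any free local port for the torchrun rendezvous
OUTPUT_DIR="work_dirs/internvl2_2b_finetune"   # where checkpoints and logs are written
PER_DEVICE_BATCH_SIZE=4                        # micro-batch size per GPU
GRADIENT_ACC=4                                 # gradient accumulation steps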
Could you please give me some clues to fix it? Thanks a lot.