internlm-sft: training loss stays at 0 #178

C-myu · 2024-06-22T15:36:51Z

CUDA_VISIBLE_DEVICES=0,1,2,3 train_sft.py
--deepspeed ds_zero2_no_offload.json
--model_name_or_path internlm-7b
--use_lora true
--use_deepspeed true
--data_path hz_sft_data_test
--bf16 true
--fp16 false
--output_dir output_refuse_test
--num_train_epochs 5
--per_device_train_batch_size 3
--per_device_eval_batch_size 1
--gradient_accumulation_steps 8
--evaluation_strategy "no"
--save_strategy "epoch"
--save_total_limit 3
--learning_rate 4e-4
--logging_steps 10
--tf32 False
--model_max_length 2048

After running this, I found that the training loss stays at 0. Is that because DeepSpeed was not actually used?
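
For anyone reproducing this, the batch-related flags above can be combined into an effective batch size per optimizer step. This is only a rough sketch: it assumes one data-parallel process per GPU listed in `CUDA_VISIBLE_DEVICES` (the command as posted shows no launcher, so `num_gpus = 4` is an assumption), and that `train_sft.py` gives these flags the usual Hugging Face `Trainer` meaning. The helper name `effective_batch_size` is hypothetical.

```python
# Values copied from the flags in the command above.
per_device_train_batch_size = 3
gradient_accumulation_steps = 8
logging_steps = 10

# Assumption: one training process per GPU in CUDA_VISIBLE_DEVICES=0,1,2,3.
num_gpus = 4


def effective_batch_size(per_device: int, accum_steps: int, world_size: int) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_device * accum_steps * world_size


effective = effective_batch_size(
    per_device_train_batch_size, gradient_accumulation_steps, num_gpus
)
# Samples seen between two logged loss values, if logging counts optimizer steps.
samples_per_log = effective * logging_steps

print(f"effective batch size: {effective}")
print(f"samples per logged loss value: {samples_per_log}")
```

If only one process is actually running, the same helper with `world_size=1` gives 24 samples per optimizer step instead.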
