finetune problem #22

Open
tumanshu opened this issue Nov 6, 2023 · 8 comments
Comments

@tumanshu commented Nov 6, 2023

Questions:
I trained with my own data and found that the lr is always 0:
[screenshot: training log showing lr = 0]
Train script:

export EXPERIMENT_NAME=instruct_BLIP_deepSpeed_t5xxl_unfreeze_Projection_LLM_QV_weight_without_instruct_qformer_fshi
export DATASET_NAME=flickr
export CUDA_VISIBLE_DEVICES=0,1,2,3,4
export MODEL_DIR=models/
model_name_or_path=/data2/tutu/model/MMICL-Instructblip-T5-xxl
processor_path=/data2/tutu/model/instructblip-flan-t5-xxl

bs=3 #3
eval_bs=4
lr=1e-4
dropout=0.1
epoch=10
seed=1111
do_train=True
do_test=False
do_valid=False
master_port=29504
model_type=instructblip
deepspeed --master_port $master_port run.py \
--experiment_name ${EXPERIMENT_NAME} \
--dataset_name ${DATASET_NAME} \
--dataset_config_name None \
--load_datatype json \
--max_seq_length 512 \
--overwrite_cache True \
--pad_to_max_length True \
--train_file /data2/tutu/MIC/Data/fushi_data/train \
--validation_file /data2/tutu/MIC/Data/fushi_data/test \
--test_file /data2/tutu/MIC/Data/fushi_data/test \
--do_train $do_train \
--do_eval $do_valid \
--do_predict $do_test \
--per_device_train_batch_size ${bs} \
--bf16 \
--model_type $model_type \
--save_total_limit 3 \
--per_device_eval_batch_size ${eval_bs} \
--gradient_accumulation_steps 6 \
--num_train_epochs ${epoch} \
--output_dir checkpoints/${EXPERIMENT_NAME} \
--overwrite_output_dir \
--learning_rate ${lr} \
--weight_decay 0.0005 \
--seed ${seed} \
--warmup_ratio 0 \
--evaluation_strategy steps \
--eval_steps 50 \
--remove_unused_columns False \
--model_name_or_path $model_name_or_path \
--use_fast_tokenizer True \
--processor_path $processor_path \
--model_type 'instructblip' \
--model_revision main \
--eval_type val \
--generation_max_length 64 \
--done_preprocess True \
--max_eval_samples 200 \
--max_predict_samples 200 \
--run_name ${EXPERIMENT_NAME} \
--using_instruct_qformer False \
--deepspeed config/deepspeed_config.json \
--save_steps 50 \
--load_best_model_at_end False \
--logging_steps 10 \
--plot_loss True \
--lr_scheduler_type cosine \
--multiple_choice True

Best wishes.

@tumanshu changed the title from "finetune promblem" to "finetune problem" on Nov 6, 2023
@HaozheZhao (Owner)

While I haven't encountered this exact issue before, I have seen similar problems in other projects: issue#1, issue#2, issue#3.

It appears that this could be due to the deepspeed optimizer settings, to the deepspeed config causing bf16 to overflow, or to the transformers and deepspeed versions you are using.

Could you share your specific deepspeed config, and let me know which transformers and deepspeed versions you're currently using?

Taking issue#1 into consideration, pinning transformers to 4.28.0 and deepspeed to 0.8.3 might rectify the issue. It might also help to remove the fp16 and lr scheduler sections from the deepspeed config and switch the optimizer to AdamW (see the sketch below).
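
A minimal sketch of what such a trimmed ZeRO-2 config could look like, assuming AdamW and no fp16/scheduler blocks (the values below are illustrative, not copied from this repo's config/deepspeed_config.json):

{
"bf16": {
    "enabled": "auto"
},
"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": "auto",
        "betas": "auto",
        "eps": "auto",
        "weight_decay": "auto"
    }
},
"zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    }
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto"
}

Without a scheduler block, the HuggingFace Trainer builds the LR schedule itself from --lr_scheduler_type and --warmup_ratio.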

@tumanshu (Author) commented Nov 7, 2023

deepspeed config:
{
"bf16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
},

"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": 1e-4,
        "betas": "auto",
        "eps": "auto",
        "weight_decay": 0.0005
    }
},

"scheduler": {
    "type": "WarmupLR", 
    "params": {
        "warmup_min_lr": 0, 
        "warmup_max_lr": 0.0001, 
        "warmup_num_steps": 0
    }
}, 

"zero_optimization": {
    "stage": 2,
    "offload_param": {
        "device": "cpu",
        "pin_memory": true
      },
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 6e7,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 6e7,
    "contiguous_gradients": true
},

"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto"

}
deepspeed version is 0.9.3

@HaozheZhao (Owner)

Perhaps consider setting the parameters in the optimizer and scheduler blocks to "auto" (a rough sketch follows below). As far as I know, the HuggingFace Trainer fills those values in from your training_args, and hard-coding them can otherwise cause complications. If that does not work, maybe try changing the deepspeed version. I am also trying to reproduce this problem in my own environment.
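
A rough sketch of those two blocks with "auto" values, mirroring the config you posted above (an illustration, not the repo's shipped config):

"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": "auto",
        "betas": "auto",
        "eps": "auto",
        "weight_decay": "auto"
    }
},

"scheduler": {
    "type": "WarmupLR",
    "params": {
        "warmup_min_lr": "auto",
        "warmup_max_lr": "auto",
        "warmup_num_steps": "auto"
    }
},

With these set to "auto", the values are taken from --learning_rate, --weight_decay, and --warmup_ratio in the training arguments instead of being hard-coded in the JSON.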

@tumanshu (Author) commented Nov 7, 2023

Thank you for your response.
In fact, after I removed the optimizer and scheduler from the deepspeed configuration, the LR is no longer zero.
However, I noticed that the loss isn't decreasing on my data.
[screenshot: training log showing the loss not decreasing]

I'm not sure if it's because my task is too challenging. If it's convenient, could I share my data with you?

@HaozheZhao (Owner)

Absolutely, feel free to share your data with me. You can get in touch with me via email. I will do my best to respond promptly.

P.S. Regarding your problem, I'm curious about the size of your dataset. Is it possible that the learning rate is declining, but the logged precision makes that hard to see? For example, the learning rate could be 0.000095.

@tumanshu (Author) commented Nov 7, 2023 via email

@HaozheZhao (Owner)

It appears that I did not receive the email. Perhaps you could contact me directly using the email address in my profile.

@tumanshu (Author) commented Nov 8, 2023

OK, I sent an email to [email protected].
Looking forward to your response.
