deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --quantization_bit 8 \
    ...
I am training on a V100 machine with 4 GPUs, adding --quantization_bit 8 to avoid OOM. After training for one epoch, inference with the resulting model gives very poor results. In addition, when I start the web service via web_demo2.py, the answer often stops after only a small amount of output, even though the inference process itself appears to be running normally.
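The deepspeed.json passed on the command line above is not included in the report. For context, a minimal ZeRO stage-2 fp16 configuration of the kind commonly used for this fine-tuning setup on V100s (which lack bf16 support) looks roughly like the sketch below; this is an assumption about its contents, not the poster's actual file:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "initial_scale_power": 16
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```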
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)