
Why is inference quality so poor after fine-tuning on the school_math_0.25M.json dataset? #22

Open
ivankxt opened this issue Nov 2, 2023 · 0 comments


ivankxt commented Nov 2, 2023

deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --quantization_bit 8 \
    ...
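For context, the `--deepspeed deepspeed.json` flag points at a DeepSpeed configuration file. The author's actual file is not shown; the fragment below is only a generic sketch of the kind of ZeRO-2 / fp16 config commonly paired with this launch command (all values are assumptions):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  }
}
```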
I trained on 4 GPUs on a V100 machine, adding --quantization_bit 8 to avoid OOM. After training for one epoch, inference with the resulting model is very poor. Also, when I start the web service with web_demo2.py, the answer often stops after only a little output, even though the inference process itself looks normal.

tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)
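Note that the snippet above loads the tokenizer from the int4 base model directory but the model from a full fine-tuning checkpoint. A minimal inference sketch that instead loads both from the same checkpoint, and quantizes to int8 at load time to mirror the 8-bit training setting, might look like the following (the paths come from the post; whether the checkpoint contains full-precision weights and tokenizer files is an assumption, and this cannot be verified without the checkpoint itself):

```python
# Sketch only: assumes checkpoint-15000 holds full (non-quantized) weights
# plus tokenizer files, and that inference should mirror 8-bit training.
from transformers import AutoTokenizer, AutoModel

ckpt = "/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000"

tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True)
model = model.quantize(8).cuda(1)  # ChatGLM's in-memory int8 quantization helper
model = model.eval()

# max_length bounds the total sequence; too small a value can truncate answers
response, history = model.chat(tokenizer, "1+1等于几?", history=[], max_length=2048)
print(response)
```

This does not run in an environment without the checkpoint and a GPU, so it is offered purely as a shape to compare against, not as a confirmed fix for the early-truncation behavior.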
