deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --quantization_bit 8 \
    ...
I am training on a V100 machine with 4 GPUs, adding --quantization_bit 8 to avoid OOM. After training for one epoch, inference with the resulting model gives very poor results. In addition, when I start the web service via web_demo2.py, the answer often stops after only a small amount of output, even though the inference process itself appears to be running normally.
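The deepspeed.json passed on the command line above is not included in the report. For context, a minimal ZeRO stage-2 fp16 configuration of the kind commonly used for this fine-tuning setup on V100s (which lack bf16 support) looks roughly like the sketch below; this is an assumption about its contents, not the poster's actual file:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "initial_scale_power": 16
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```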
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)