Effect of max-batch-size on GPU memory usage #2479
Unanswered · idontlikelongname asked this question in Q&A · Replies: 1 comment, 6 replies
-
It is not a linear relationship; GPU memory usage is independent of batch size.
-
I benchmarked the throughput of the W4A16-quantized Meta-Llama-3.1-70B-Instruct on an H800-80G using the script benchmark/profile_throughput.py. With concurrency set to 16 and to 128, there was no noticeable difference in GPU memory usage.
Why does this happen? My understanding was that memory usage grows roughly linearly with batch size, but the observed behavior does not seem to match.
What is the relationship between GPU memory usage and batch size for autoregressive models?