
Why launch 3 kernels for the prefill stage? #2501

Answered by lzhangzz
sleepwalker2017 asked this question in Q&A

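  1. Basically all correct.
  2. These operations are not a good fit for fusing with attention. The inner loop of attention is very dense, so fusing other operations into it tends to make it slower.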

Answer selected by sleepwalker2017

This discussion was converted from issue #2500 on September 23, 2024 10:49.