Why launch 3 kernels for the prefill stage? #2501
Answered by lzhangzz
sleepwalker2017 asked this question in Q&A
Answer selected by sleepwalker2017
Looking at the code, the invokeProcessKV_v2_ kernel handles RoPE and quantization, while invokeFlattenKV_v2_ seems to deal only with RoPE. dispatchAttention performs the attention computation itself. What confuses me:
Thanks!
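As a rough illustration of what "RoPE and quantization" means for a single K vector, here is a minimal Python sketch. It assumes the interleaved-pair RoPE convention and a symmetric per-vector int8 scheme. Both are illustrative choices that may differ from what the real invokeProcessKV_v2_ kernel does, and `process_kv_token` is a hypothetical name.

```python
import math
from typing import List, Tuple


def apply_rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[i], x[i+1]), i even, by angle pos * base**(-i/d).

    Interleaved-pair convention (assumed); other implementations pair x[i]
    with x[i + d/2]. Either way, a rotation preserves the vector norm.
    """
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out[i] = a * c - b * s
        out[i + 1] = a * s + b * c
    return out


def quantize_int8(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector int8 quantization (illustrative scheme only)."""
    amax = max(abs(v) for v in x)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


def process_kv_token(k: List[float], pos: int) -> Tuple[List[int], float]:
    """Hypothetical per-token step: rotate K by its position, then quantize."""
    return quantize_int8(apply_rope(k, pos))
```

For example, `q, scale = process_kv_token([0.5, -1.0, 0.25, 2.0], pos=3)` yields int8 codes whose dequantized values stay within `scale / 2` of the rotated vector.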
Addendum:
First kernel: load from contiguous memory -> preprocess -> write into pages
Second kernel: load from pages -> write to contiguous memory?
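The two directions in the addendum can be modeled with a toy paged cache. The page pool, `block_table`, and page size below are hypothetical stand-ins, not the actual cache layout used by these kernels. `scatter_to_pages` mirrors "contiguous -> pages" and `gather_from_pages` mirrors "pages -> contiguous".

```python
from typing import List, Optional

Vec = List[float]
Pool = List[List[Optional[Vec]]]


def make_pool(num_pages: int, page_size: int) -> Pool:
    """A toy KV pool: num_pages pages, each with page_size empty token slots."""
    return [[None] * page_size for _ in range(num_pages)]


def scatter_to_pages(tokens: List[Vec], block_table: List[int], pool: Pool, page_size: int) -> None:
    """Contiguous -> paged: token t goes to page block_table[t // page_size],
    slot t % page_size (the 'first kernel' direction)."""
    for t, vec in enumerate(tokens):
        pool[block_table[t // page_size]][t % page_size] = vec


def gather_from_pages(num_tokens: int, block_table: List[int], pool: Pool, page_size: int) -> List[Vec]:
    """Paged -> contiguous: rebuild a dense per-token buffer
    (the 'second kernel' direction)."""
    return [pool[block_table[t // page_size]][t % page_size] for t in range(num_tokens)]


# Usage: 10 tokens with page_size 4 need ceil(10 / 4) = 3 pages.
page_size = 4
pool = make_pool(num_pages=8, page_size=page_size)
tokens = [[float(t)] * 2 for t in range(10)]
block_table = [5, 2, 7]  # logical page i -> physical page block_table[i]
scatter_to_pages(tokens, block_table, pool, page_size)
restored = gather_from_pages(len(tokens), block_table, pool, page_size)
```

In a quantized cache, the gather step would also be the natural place to dequantize. One plausible reason for the round trip (an assumption, not confirmed here) is that the prefill attention kernel reads K/V for the whole sequence, cached history plus new tokens, from a single contiguous buffer.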