Fix kernel cache miss and add RDNA configs #246
base: develop
hyoon1 commented Oct 25, 2024
- Added Navi configurations (related PR: add RDNA Config, ROCm/triton#640)
- Resolved the kernel cache miss during flash attention calls by fixing max_seqlen_q/k to 0
@@ -795,8 +880,8 @@ def forward(
 HQ=nheads_q,
 HK=nheads_k,
 ACTUAL_BLOCK_DMODEL=head_size,
-MAX_SEQLENS_Q=max_seqlens_q,
-MAX_SEQLENS_K=max_seqlens_k,
+MAX_SEQLENS_Q=0,
What is the reason for zeroing the sequence lengths?
The attention forward kernel below is called when we run the model with vLLM:
attn_fwd[grid](
However, MAX_SEQLENS_Q/K differ at every step, so each step produces a different cache key and triggers a new compilation of the Triton kernel, which degrades performance.
https://github.com/triton-lang/triton/blob/cf34004b8a67d290a962da166f5aa2fc66751326/python/triton/runtime/jit.py#L620
https://github.com/triton-lang/triton/blob/cf34004b8a67d290a962da166f5aa2fc66751326/python/triton/runtime/jit.py#L660
Currently, VARLEN is always set, and if you look at the kernel in vLLM, MAX_SEQLENS_Q/K are not used in that case.
def attn_fwd(
Therefore, as a workaround, we simply pass a fixed value for MAX_SEQLENS_Q/K when we call the kernel.
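To illustrate the effect described above, here is a minimal sketch in plain Python. It is a toy model, not Triton's actual cache: `ToyJit`, `launch`, and `run_steps` are hypothetical names, and the only assumption carried over from the discussion is that the MAX_SEQLENS_Q/K values take part in the cache key.

```python
# Toy model of a JIT cache keyed on compile-time arguments.
# NOT Triton's implementation; ToyJit and run_steps are hypothetical.


class ToyJit:
    def __init__(self):
        self.cache = {}
        self.compile_count = 0

    def launch(self, **constexprs):
        # The cache key is built from the compile-time argument values.
        key = tuple(sorted(constexprs.items()))
        if key not in self.cache:
            # Stands in for an expensive kernel compilation.
            self.compile_count += 1
            self.cache[key] = f"binary#{self.compile_count}"
        return self.cache[key]


def run_steps(jit, seqlens, pin_max_seqlen):
    """Launch once per step and return how many compilations happened."""
    for max_seqlen in seqlens:
        # With the workaround, the value passed is always 0.
        value = 0 if pin_max_seqlen else max_seqlen
        jit.launch(VARLEN=True, MAX_SEQLENS_Q=value, MAX_SEQLENS_K=value)
    return jit.compile_count


steps = [17, 33, 64, 65]  # made-up per-step maximum sequence lengths
varying = run_steps(ToyJit(), steps, pin_max_seqlen=False)
pinned = run_steps(ToyJit(), steps, pin_max_seqlen=True)
print(varying, pinned)
```

In this toy model, every distinct value forces a new compilation, while the pinned value compiles once and hits the cache afterwards. Pinning is only safe because the kernel does not read MAX_SEQLENS_Q/K when VARLEN is set.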
@@ -207,103 +209,186 @@ def _attn_fwd_inner(
return acc, l_i, m_i


@triton.autotune(
configs=[
def get_gfx_version():
This seems to be unused, right?
).arch in ('gfx940', 'gfx941', 'gfx942', 'gfx90a', 'gfx908')


def is_rdna():
It is probably worth using:
Line 1620 in 8f3bf8b
def is_navi() -> bool:
return triton.runtime.driver.active.get_current_target().backend == "hip"


def is_cdna():
As far as I know, AMD has two hardware lines for vLLM: MI and Navi. So a "not Navi" check should work better for future generations of MI than an explicit list of architectures.
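A small sketch of the difference between the two checks, under stated assumptions. The helper names are hypothetical, the allowlist is the one from the diff under review, and the prefix rule for Navi (`gfx10`/`gfx11`) is a simplification for illustration that does not cover every RDNA generation. `FUTURE_MI` is a made-up placeholder, not a real target name.

```python
# Hypothetical helpers for illustration only; not vLLM or Triton API.

# Architectures listed in the diff under review.
CDNA_ALLOWLIST = ("gfx940", "gfx941", "gfx942", "gfx90a", "gfx908")


def is_navi_arch(arch: str) -> bool:
    # Assumption: Navi (RDNA) targets are named gfx10xx / gfx11xx.
    return arch.startswith(("gfx10", "gfx11"))


def is_cdna_allowlist(arch: str) -> bool:
    # Explicit list: must be updated for every new MI generation.
    return arch in CDNA_ALLOWLIST


def is_mi_by_exclusion(arch: str) -> bool:
    # "Not Navi": new MI generations are accepted without a code change.
    return not is_navi_arch(arch)


FUTURE_MI = "gfx9_future_placeholder"  # made-up name, not a real target
for arch in ("gfx942", "gfx1100", FUTURE_MI):
    print(arch, is_cdna_allowlist(arch), is_mi_by_exclusion(arch))
```

The trade-off is that the exclusion check also accepts any unrecognized architecture, so it relies on the caller having already established that the device is a supported AMD GPU.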
As per @gshtras, we need to merge into the develop branch instead of main for now. Please correct.
return None


def is_hip():
All this functionality is implemented in a cross-architecture fashion in the platform/rocm.py and its superclasses
- added Navi configurations (Related PR: ROCm/triton#640) - resolved cache miss issue during flash attention calls by fixing max_seqlen_q/k to 0
@maleksan85 @gshtras Secondly, our team is using the v0.6.2+rocm release, and I understand that functions like is_navi() are not supported in that version. Implementing them would require significant modifications. Therefore, maintaining backward compatibility is also a concern. Given these considerations, I would greatly appreciate your advice on how to proceed with the modifications.
As for your last point, whatever changes are made here will have no effect on previous tags, so v0.6.2+rocm will not be affected.