SAGE ATTENTION: 2.1 TIMES FASTER THAN FLASH ATTENTION 2 AND 2.7 TIMES FASTER THAN XFORMERS #9901
joseph777111 started this conversation in Ideas · 2 comments · 2 replies
-
@ggerganov @ikawrakow @kalomaze @slaren @JohannesGaessler @calvintwr @LostRuins @bartowski1182 Thoughts? 😋
-
This should be interesting: it performs attention in int8, so it should support most hardware.
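
For context on what int8 attention means in practice, here is a minimal NumPy sketch of the general idea: Q and K are quantized to int8 with symmetric per-tensor scales, the score matrix is computed with an integer matmul and then dequantized, and softmax plus the PV product run in fp16. This is only an illustration of the concept, not the SageAttention kernel itself (the paper additionally smooths K and uses finer-grained scales, among other details); the function names below are made up for the example.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: returns (int8 values, fp32 scale)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_attention(Q, K, V):
    """Toy single-head attention with int8 Q/K and an fp16 P @ V product."""
    d = Q.shape[-1]
    q8, sq = quantize_int8(Q)
    k8, sk = quantize_int8(K)
    # Integer score matmul accumulated in int32, then dequantized to fp32.
    scores = (q8.astype(np.int32) @ k8.astype(np.int32).T).astype(np.float32)
    scores = scores * (sq * sk) / np.sqrt(d)
    # Softmax in fp32 for numerical stability.
    p = np.exp(scores - scores.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    # Keep the P @ V product in fp16, as low-precision attention kernels typically do.
    return (p.astype(np.float16) @ V.astype(np.float16)).astype(np.float32)

# Quick accuracy check against full-precision attention.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 128)).astype(np.float32) for _ in range(3))
ref = Q @ K.T / np.sqrt(128)
ref = np.exp(ref - ref.max(axis=-1, keepdims=True))
ref /= ref.sum(axis=-1, keepdims=True)
print("max abs error vs fp32 attention:", np.abs(int8_attention(Q, K, V) - ref @ V).max())
```
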
-
https://arxiv.org/abs/2410.02367
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
ABSTRACT: