v2.6.2
## What's Changed
- Early exit of LLM inference by @boyi-liu in #85
- Add paper AdaKV by @FFY0 in #86
- Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance by @aharshms in #87
- 🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference by @DefTruth in #88
## New Contributors
- @boyi-liu made their first contribution in #85
- @FFY0 made their first contribution in #86
- @aharshms made their first contribution in #87
**Full Changelog**: v2.6.1...v2.6.2