v2.6.2
## What's Changed
- Early exit of LLM inference by @boyi-liu in #85
- Add paper AdaKV by @FFY0 in #86
- Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance by @aharshms in #87
- 🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference by @DefTruth in #88
## New Contributors
- @boyi-liu made their first contribution in #85
- @FFY0 made their first contribution in #86
- @aharshms made their first contribution in #87
**Full Changelog**: v2.6.1...v2.6.2