v2.6.2

@DefTruth DefTruth released this 28 Oct 02:38

What's Changed

  • early exit of LLM inference by @boyi-liu in #85
  • Add paper AdaKV by @FFY0 in #86
  • Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance by @aharshms in #87
  • 🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference by @DefTruth in #88

Full Changelog: v2.6.1...v2.6.2