v2.5

@DefTruth DefTruth released this 26 Sep 03:25
· 27 commits to main since this release
3e43647

What's Changed

  • 🔥[InstInfer] InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference by @DefTruth in #65
  • Update codebase of the paper "Parallel Speculative Decoding with Adaptive Draft Length" by @smart-lty in #66
  • move RetrievalAttention -> long context by @DefTruth in #67
  • 🔥🔥[CritiPrefill] CritiPrefill: A Segment-Wise Criticality-Based Approach for Prefilling Acceleration in LLMs by @DefTruth in #68
  • Bump up to v2.5 by @DefTruth in #69

Full Changelog: v2.4...v2.5