v2.5
What's Changed
- 🔥[InstInfer] InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference by @DefTruth in #65
- Update codebase of the paper "Parallel Speculative Decoding with Adaptive Draft Length" by @smart-lty in #66
- Move RetrievalAttention to the long-context section by @DefTruth in #67
- 🔥🔥[CritiPrefill] CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs by @DefTruth in #68
- Bump up to v2.5 by @DefTruth in #69
New Contributors
- @smart-lty made their first contribution in #66
Full Changelog: v2.4...v2.5