v2.5
What's Changed
- 🔥[InstInfer] InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference by @DefTruth in #65
- Update codebase of the paper "Parallel Speculative Decoding with Adaptive Draft Length" by @smart-lty in #66
- Move RetrievalAttention to the long-context section by @DefTruth in #67
- 🔥🔥[CritiPrefill] CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs by @DefTruth in #68
- Bump up to v2.5 by @DefTruth in #69
New Contributors
- @smart-lty made their first contribution in #66
Full Changelog: v2.4...v2.5