Releases: DefTruth/Awesome-LLM-Inference
v2.0
What's Changed
- 🔥🔥[LUT TENSOR CORE] Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration by @DefTruth in #33
- 🔥🔥[Eigen Attention] Attention in Low-Rank Space for KV Cache Compression by @DefTruth in #34
- KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning by @DefTruth in #35
- Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference by @DefTruth in #36
- 🔥[ABQ-LLM] Arbitrary-Bit Quantized Inference Acceleration for Large Language Models by @DefTruth in #37
- [Token Recycling] Turning Trash into Treasure: Accelerating Inference… by @DefTruth in #38
- Bump up to v2.0 by @DefTruth in #39
Full Changelog: v1.9...v2.0
v1.9
What's Changed
- 🔥[DynamoLLM] DynamoLLM: Designing LLM Inference Clusters for Performa… by @DefTruth in #28
- 🔥[Zero-Delay QKV Compression] Zero-Delay QKV Compression for Mitigati… by @DefTruth in #29
- 🔥[Automatic Inference Engine Tuning] Towards SLO-Optimized LLM Servin… by @DefTruth in #30
- 🔥🔥[500xCompressor] 500xCompressor: Generalized Prompt Compression for… by @DefTruth in #31
- Bump up to v1.9 by @DefTruth in #32
Full Changelog: v1.8...v1.9
v1.8
What's Changed
- 🔥[flashinfer] FlashInfer: Kernel Library for LLM Serving (@flashinfer-ai) by @DefTruth in #24
- 🔥[Palu] Palu: Compressing KV-Cache with Low-Rank Projection (@nycu.edu… by @DefTruth in #25
- 🔥[SentenceVAE] SentenceVAE: Faster, Longer and More Accurate Inferenc… by @DefTruth in #26
- Bump up to v1.8 by @DefTruth in #27
Full Changelog: v1.7...v1.8
v1.7
What's Changed
- Add paper "Internal Consistency and Self-Feedback in Large Language Models: A Survey" by @fan2goa1 in #21
- Update README.md by @clevercool in #22
New Contributors
- @fan2goa1 made their first contribution in #21
- @clevercool made their first contribution in #22
Full Changelog: v1.6...v1.7
v1.6
Full Changelog: v1.5...v1.6
v1.5
v1.3
What's Changed
- [MoA] MoA: Mixture of Sparse Attention for Automatic LLM Compression by @liyucheng09 in #19
Full Changelog: v1.2...v1.3
v1.2
v1.1
Full Changelog: v1.0...v1.1
v1.0
Full Changelog: v0.9...v1.0