Releases: DefTruth/Awesome-LLM-Inference

v2.0

19 Aug 01:22
8c0b51d

What's Changed

  • 🔥🔥[LUT TENSOR CORE] Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration by @DefTruth in #33
  • 🔥🔥[Eigen Attention] Attention in Low-Rank Space for KV Cache Compression by @DefTruth in #34
  • KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning by @DefTruth in #35
  • Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference by @DefTruth in #36
  • 🔥[ABQ-LLM] Arbitrary-Bit Quantized Inference Acceleration for Large Language Models by @DefTruth in #37
  • [Token Recycling] Turning Trash into Treasure: Accelerating Inference… by @DefTruth in #38
  • Bump up to v2.0 by @DefTruth in #39

Full Changelog: v1.9...v2.0

v1.9

12 Aug 01:27
e6b8cf4

What's Changed

  • 🔥[DynamoLLM] DynamoLLM: Designing LLM Inference Clusters for Performa… by @DefTruth in #28
  • 🔥[Zero-Delay QKV Compression] Zero-Delay QKV Compression for Mitigati… by @DefTruth in #29
  • 🔥[Automatic Inference Engine Tuning] Towards SLO-Optimized LLM Servin… by @DefTruth in #30
  • 🔥🔥[500xCompressor] 500xCompressor: Generalized Prompt Compression for… by @DefTruth in #31
  • Bump up to v1.9 by @DefTruth in #32

Full Changelog: v1.8...v1.9

v1.8

05 Aug 02:33
6bb8818

What's Changed

  • 🔥[flashinfer] FlashInfer: Kernel Library for LLM Serving(@flashinfer-ai) by @DefTruth in #24
  • 🔥[Palu] Palu: Compressing KV-Cache with Low-Rank Projection(@nycu.edu… by @DefTruth in #25
  • 🔥[SentenceVAE] SentenceVAE: Faster, Longer and More Accurate Inferenc… by @DefTruth in #26
  • Bump up to v1.8 by @DefTruth in #27

Full Changelog: v1.7...v1.8

v1.7

29 Jul 00:46
a6c1528

What's Changed

  • Add paper "Internal Consistency and Self-Feedback in Large Language Models: A Survey" by @fan2goa1 in #21
  • Update README.md by @clevercool in #22

Full Changelog: v1.6...v1.7

v1.6

23 Jul 01:09
a186334

Full Changelog: v1.5...v1.6

v1.5

15 Jul 01:23
7e9c309

Full Changelog: v1.3...v1.5

v1.3

08 Jul 01:38
353867a

What's Changed

  • [MoA] MoA: Mixture of Sparse Attention for Automatic LLM Compression by @liyucheng09 in #19

Full Changelog: v1.2...v1.3

v1.2

20 Jun 01:12
33833fb

Full Changelog: v1.1...v1.2

v1.1

12 Jun 01:11
edfe64e

Full Changelog: v1.0...v1.1

v1.0

01 Jun 09:24
2962605

Full Changelog: v0.9...v1.0