Release v2.1 · DefTruth/Awesome-LLM-Inference

What's Changed

Update README.md by @DefTruth in #40
🔥[Speculative Decoding] Parallel Speculative Decoding with Adaptive Draft Length by @DefTruth in #41
🔥[FocusLLM] FocusLLM: Scaling LLM’s Context by Parallel Decoding by @DefTruth in #42
🔥[NanoFlow] NanoFlow: Towards Optimal Large Language Model Serving Throughput by @DefTruth in #43
🔥[MagicDec] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding by @DefTruth in #44
Add ABQ-LLM code link by @DefTruth in #46
🔥🔥[MARLIN] MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models by @DefTruth in #47
🔥[1-bit LLMs] Matmul or No Matmul in the Era of 1-bit LLMs by @DefTruth in #48
🔥🔥[FLA] FLA: A Triton-Based Library for Hardware-Efficient Implementa… by @DefTruth in #49
Bump up to v2.1 by @DefTruth in #50

Full Changelog: v2.0...v2.1