v2.1

@DefTruth DefTruth released this 28 Aug 01:53
· 47 commits to main since this release
74f887c

What's Changed

  • Update README.md by @DefTruth in #40
  • 🔥[Speculative Decoding] Parallel Speculative Decoding with Adaptive Draft Length by @DefTruth in #41
  • 🔥[FocusLLM] FocusLLM: Scaling LLM’s Context by Parallel Decoding by @DefTruth in #42
  • 🔥[NanoFlow] NanoFlow: Towards Optimal Large Language Model Serving Throughput by @DefTruth in #43
  • 🔥[MagicDec] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding by @DefTruth in #44
  • Add ABQ-LLM code link by @DefTruth in #46
  • 🔥🔥[MARLIN] MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models by @DefTruth in #47
  • 🔥[1-bit LLMs] Matmul or No Matmal in the Era of 1-bit LLMs by @DefTruth in #48
  • 🔥🔥[FLA] FLA: A Triton-Based Library for Hardware-Efficient Implementa… by @DefTruth in #49
  • Bump up to v2.1 by @DefTruth in #50

Full Changelog: v2.0...v2.1