- The paper discusses the popularity of extending the context window of large language models (LLMs).
- It explores the comparison between retrieval-augmentation and long context window extension for LLMs.
- The study uses two pretrained LLMs, the proprietary 43B GPT and LLaMA2-70B, for experimentation.
- Surprisingly, the research finds that a 4K context window LLM with retrieval-augmentation can achieve comparable performance to a finetuned LLM with a 16K context window, while being computationally more efficient.
- Retrieval is shown to significantly enhance the performance of LLMs, regardless of their extended context window sizes.
- The best-performing model in the study is a retrieval-augmented LLaMA2-70B with a 32K context window, outperforming other models in various long context tasks, such as question answering and query-based summarization.
- This retrieval-augmented model also surpasses its non-retrieval baseline while being faster at generation.
- The study offers valuable insights for practitioners on choosing between retrieval-augmentation and long context extension for LLMs.