- [01.30] CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
- [02.19] FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation
- [06.20] CodeRAG-Bench: Can Retrieval Augment Code Generation?
- [10.30] Long2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall