Benchmarking RAG embedding models for the Malaysian context. HuggingFace Space at https://huggingface.co/spaces/mesolitica/Malaysian-Embedding-Leaderboard
📈 We evaluate models on 2 datasets:
- Research papers matching the keyword `melayu`, collected using Crossref, https://huggingface.co/datasets/mesolitica/malaysian-ultrachat/resolve/main/ultrachat-crossref-melayu-malay.jsonl
- lom.agc.gov.my PDF files, https://huggingface.co/datasets/mesolitica/malaysian-ultrachat/resolve/main/ultrachat-lom-agc.jsonl
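The scoring metric is not stated here. The sketch below shows one common way a retrieval benchmark can score an embedding model, top-k accuracy under cosine similarity. It assumes each query has exactly one gold passage and that query and passage embeddings come from the model under test. The function names are hypothetical, and the JSONL field layout of the datasets above is not assumed.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_accuracy(
    query_embs: List[Sequence[float]],
    passage_embs: List[Sequence[float]],
    gold: List[int],
    k: int = 1,
) -> float:
    """Hypothetical metric: fraction of queries whose gold passage index
    ranks within the top-k passages by cosine similarity."""
    if not query_embs:
        return 0.0
    hits = 0
    for q, g in zip(query_embs, gold):
        scores = [cosine(q, p) for p in passage_embs]
        # Rank passage indices from most to least similar to the query.
        ranked = sorted(range(len(passage_embs)), key=lambda i: scores[i], reverse=True)
        if g in ranked[:k]:
            hits += 1
    return hits / len(query_embs)


if __name__ == "__main__":
    # Toy 2-d vectors standing in for real model embeddings.
    queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    passages = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
    gold = [0, 1, 2]
    print(top_k_accuracy(queries, passages, gold, k=1))
    print(top_k_accuracy(queries, passages, gold, k=3))
```

In the toy data, the third query's gold passage points the opposite way, so it only counts as a hit once `k` covers every passage. A real leaderboard may instead report recall@k, MRR, or NDCG over larger corpora.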