VTI-Data

NLU and NLG datasets developed within the Latvian Language Technology Initiative

Alpaca Latvian dataset

ALPACA-LV is a machine-translated Latvian version of the Alpaca instruction dataset.

COPA

COPA is a machine-translated Latvian version of the COPA (Choice of Plausible Alternatives) benchmark dataset.

MMLU

MMLU is a machine-translated Latvian version of the MMLU (Massive Multitask Language Understanding) benchmark dataset. The sociology_postedited.json file contains a post-edited version of the first 100 tasks in the sociology subject.
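
To get a first look at the post-edited file, a minimal Python sketch like the following may help. It only reports the top-level JSON structure, because the file's schema is not described here. The path `MMLU/sociology_postedited.json` and the helper names `load_tasks` and `describe` are assumptions made for illustration, not part of the repository.

```python
import json
from pathlib import Path


def load_tasks(path):
    """Load a UTF-8 encoded JSON file and return its parsed content."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def describe(data):
    """Return a short description of the top-level JSON structure."""
    if isinstance(data, list):
        return f"list with {len(data)} items"
    if isinstance(data, dict):
        return f"dict with keys: {sorted(data)}"
    return type(data).__name__


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever the file sits in your checkout.
    path = Path("MMLU/sociology_postedited.json")
    if path.exists():
        print(describe(load_tasks(path)))
    else:
        print(f"{path} not found; adjust the path")
```

Inspecting the structure first avoids guessing at field names before the actual layout of the tasks is known.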

LV-exams

Multiple-choice questions (MCQs) from the Latvian centralized high school exams.
