VTI-Data

NLU and NLG datasets developed within the Latvian Language Technology Initiative

  1. Alpaca Latvian dataset

    ALPACA-LV is a machine-translated Latvian version of the Alpaca instruction dataset.

  2. COPA

    COPA is a machine-translated Latvian version of the COPA benchmark dataset.

  3. MMLU

    MMLU is a machine-translated Latvian version of the MMLU benchmark dataset. The sociology_postedited.json file contains post-edited versions of the first 100 tasks in the sociology subject.

  4. LV-exams

    Multiple-choice questions (MCQ) from Latvian Centralized High School Exams.
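
This README does not describe the record layout of the JSON files, so a quick way to get oriented is to inspect a file before writing any loading code. The sketch below is only an illustration: `summarize_json` is a hypothetical helper, not part of this repository, and it assumes nothing beyond the file being valid UTF-8 JSON. The path is taken from the command line because the directory layout is not stated here.

```python
import json
import sys
from pathlib import Path


def summarize_json(path):
    """Return (top-level type name, number of top-level items) for a JSON file.

    Hypothetical helper for illustration; it makes no assumption about
    the fields inside each record.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Lists and dicts report their length; any other JSON value counts as 1.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to a dataset file, e.g. your local copy of
    # sociology_postedited.json.
    kind, size = summarize_json(sys.argv[1])
    print(f"top-level {kind} with {size} item(s)")
```

If the post-edited sociology file is a flat list of tasks, the reported size would be expected to match the 100 tasks mentioned above, but that layout is an assumption to verify rather than a documented fact.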