Centralized repository for Finnish instruction data. License: TBD.
instruction: mandatory. The task to perform; may by itself include all the required context.
input: optional. Additional context for the instruction.
output: the generated response.
Examples:
instruction: "Laadi kysymys annetusta tekstistä."
input: "Teksti: Sandra siirtyi toimistoon. Sandra meni kylpyhuoneeseen. Mary meni makuuhuoneeseen. Daniel siirtyi eteiseen."
output: "Missä on Daniel?"
or
instruction: "Kumpi on parempaa suomenkieltä, vaihtoehto a) {a} vai b) {b}"
input: ""
output: "a"
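
The record layout above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual tooling: the helper names `make_record` and `to_jsonl`, the JSON Lines storage format, and the two candidate sentences used to fill the `{a}`/`{b}` placeholders are all assumptions made for the example.

```python
import json


def make_record(instruction: str, output: str, input_text: str = "") -> dict:
    """Build one record with the three fields described above (hypothetical helper)."""
    # Only `instruction` is documented as mandatory, so only it is checked.
    if not instruction.strip():
        raise ValueError("instruction is mandatory and must not be empty")
    return {"instruction": instruction, "input": input_text, "output": output}


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines (the storage format is an assumption)."""
    # ensure_ascii=False keeps Finnish characters such as ä and ö readable.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Template from the second example; the candidate sentences are made up.
template = "Kumpi on parempaa suomenkieltä, vaihtoehto a) {a} vai b) {b}"

records = [
    make_record(
        "Laadi kysymys annetusta tekstistä.",
        "Missä on Daniel?",
        "Teksti: Daniel siirtyi eteiseen.",
    ),
    # No separate input: the instruction carries all the context.
    make_record(template.format(a="Minä olen kotona.", b="Minä olla kotona."), "a"),
]

print(to_jsonl(records))
```

Keeping `input` as an empty string rather than omitting the key mirrors the second example above, where `input` is `""`.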
TODOs
Ready / already contains usable material
- Machine-translated from the original English using DeepL:
  - dolly-fi: Finnish version of the databricks-dolly-15k instruction dataset
  - oasst-fi: Finnish version of the OpenAssistant dataset v1
  - natural-instructions-fi: ongoing machine translation of manually selected tasks
  - Lima-fi
- Synthetic datasets
TODOs
- OCR-correction
- Masked word prediction
- Recognize text from space-separated characters
- Recognize text from characters without spaces
Ready
- Native Finnish datasets:
Ready
- Paraphrase
- Question Answering
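
Two of the synthetic tasks listed under TODOs (masked word prediction and recognizing text from space-separated characters) could be generated from any plain Finnish text roughly as sketched below. Everything here is an assumption for illustration: the function names, the Finnish instruction wordings, the `[MASK]` token, and the choice to drop word boundaries in the spaced-character task are not taken from the repository.

```python
import random

# Hypothetical instruction wordings; the real prompts may differ.
SPACED_INSTRUCTION = "Kirjoita teksti ilman ylimääräisiä välilyöntejä."
MASK_INSTRUCTION = "Mikä sana puuttuu kohdasta [MASK]?"


def spaced_characters_example(text: str) -> dict:
    """Build a 'recognize text from space-separated characters' record."""
    # Whitespace is dropped first, so the word boundaries are lost and the
    # model has to restore them (a design choice made for this sketch).
    chars = [c for c in text if not c.isspace()]
    return {
        "instruction": SPACED_INSTRUCTION,
        "input": " ".join(chars),
        "output": text,
    }


def masked_word_example(text: str, rng: random.Random) -> dict:
    """Build a 'masked word prediction' record by hiding one random word."""
    words = text.split()
    i = rng.randrange(len(words))
    masked = words.copy()
    masked[i] = "[MASK]"
    # Punctuation attached to the hidden word stays part of the answer.
    return {
        "instruction": MASK_INSTRUCTION,
        "input": " ".join(masked),
        "output": words[i],
    }


rng = random.Random(0)  # fixed seed so the generated data is reproducible
print(spaced_characters_example("Mary meni makuuhuoneeseen."))
print(masked_word_example("Daniel siirtyi eteiseen.", rng))
```

Both functions return records in the same instruction/input/output layout as the rest of the data, so synthetic and translated examples can be mixed freely.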