- We use data obtained from ManyThings.org, which collected it from the tatoeba.org corpus.
- The data is licensed under the Creative Commons Attribution 2.0 France license (see the terms of use page on tatoeba.org).
- Though the source data is available under the CC BY license, the downloaded data files in this repository have been edited by ManyThings.org and are copyrighted.
You can find the tokenized files produced from the original data in this folder (*.tsv). We use these files in our experiments.
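
For readers who want to inspect the tokenized files, here is a minimal Python sketch that reads every *.tsv file in a folder. The function name `load_tsv_rows` and the folder path are hypothetical, and the column layout is not documented here, so the sketch makes no assumption about what each column holds and returns each row as a plain list of fields.

```python
import csv
from pathlib import Path


def load_tsv_rows(folder):
    """Return {file name: rows}, where each row is a list of tab-separated fields.

    Hypothetical helper: the meaning of each column is not assumed.
    """
    rows_by_file = {}
    for path in sorted(Path(folder).glob("*.tsv")):
        with path.open(encoding="utf-8", newline="") as handle:
            # QUOTE_NONE keeps quotation marks inside sentences as literal text.
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            # Skip blank lines, which csv.reader yields as empty lists.
            rows_by_file[path.name] = [row for row in reader if row]
    return rows_by_file
```

Calling `load_tsv_rows(".")` from this folder would return one entry per *.tsv file. Quoting is disabled because sentence data often contains unbalanced quotation marks that the default CSV dialect would misread.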