A neural sequence-to-sequence model for lemmatization, built with the OpenNMT and PyTorch libraries.
The Universal Lemmatizer is part of the Turku-neural-parser-pipeline (https://turkunlp.github.io/Turku-neural-parser-pipeline/), with pre-trained models available for more than 50 languages. It achieved state-of-the-art lemmatization results in the CoNLL-18 Shared Task on Parsing Universal Dependencies; see the TurkuNLP entry at http://universaldependencies.org/conll18/results-lemmas.html.
See the Turku-neural-parser-pipeline documentation at https://turkunlp.github.io/Turku-neural-parser-pipeline/.
See the training documentation at https://turkunlp.org/Turku-neural-parser-pipeline/training.
@article{kanerva2020lemmatizer,
  title={Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks},
  author={Kanerva, Jenna and Ginter, Filip and Salakoski, Tapio},
  year={2020},
  journal={Natural Language Engineering},
  publisher={Cambridge University Press},
  doi={10.1017/S1351324920000224},
  pages={1--30},
  url={http://dx.doi.org/10.1017/S1351324920000224}
}