jmnybl/universal-lemmatizer

universal-lemmatizer

A neural sequence-to-sequence model for lemmatization, built with the OpenNMT and PyTorch libraries.
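To illustrate what "sequence-to-sequence lemmatization" means in practice, here is a minimal sketch that turns a word into a source token sequence and its lemma into a target token sequence, in the space-separated one-example-per-line text form that OpenNMT-py reads. It assumes a character-level setup in which the word form is split into characters and followed by its part-of-speech and morphological feature tags as extra tokens, with the lemma's characters as the target. The exact token conventions (the `<space>` symbol, the `UPOS=` prefix) and the function names are hypothetical and are not taken from this repository's code.

```python
# Hypothetical input/output encoding for a character-level seq2seq lemmatizer.
# Assumption: source = characters of the word form + UPOS + morphological features;
# target = characters of the lemma. Token names here are illustrative only.

SPACE = "<space>"  # hypothetical placeholder so spaces survive whitespace tokenization


def encode_source(form: str, upos: str, feats: str) -> str:
    """Split the form into characters and append tag tokens ("_" means no features)."""
    chars = [SPACE if c == " " else c for c in form]
    tags = [f"UPOS={upos}"]
    if feats != "_":
        tags += feats.split("|")  # e.g. "Number=Plur|Case=Nom" -> two tokens
    return " ".join(chars + tags)


def encode_target(lemma: str) -> str:
    """The target side is just the lemma, one character per token."""
    return " ".join(SPACE if c == " " else c for c in lemma)


def decode_target(tokens: str) -> str:
    """Join predicted character tokens back into a lemma string."""
    return "".join(" " if t == SPACE else t for t in tokens.split(" "))


if __name__ == "__main__":
    print(encode_source("dogs", "NOUN", "Number=Plur"))  # d o g s UPOS=NOUN Number=Plur
    print(encode_target("dog"))                          # d o g
    print(decode_target("d o g"))                        # dog
```

Appending the tags lets the model condition the lemma on the word's analysis (for example, distinguishing a noun from a verb with the same surface form) without any change to the seq2seq architecture itself.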

The Universal Lemmatizer is part of the Turku-neural-parser-pipeline (https://turkunlp.github.io/Turku-neural-parser-pipeline/), which provides pre-trained models for more than 50 languages. The lemmatizer achieved state-of-the-art lemmatization results in the CoNLL 2018 Shared Task (Multilingual Parsing from Raw Text to Universal Dependencies); see the TurkuNLP entry at http://universaldependencies.org/conll18/results-lemmas.html.

Running the lemmatizer with pre-trained models

See the Turku-neural-parser-pipeline documentation at https://turkunlp.github.io/Turku-neural-parser-pipeline/.

Training new models

See the training documentation at https://turkunlp.org/Turku-neural-parser-pipeline/training.
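Training data for lemmatizing Universal Dependencies treebanks comes in the CoNLL-U format, where each word line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The official training procedure is described at the link above; as an unofficial sketch only, the reader below extracts the (form, lemma, UPOS, features) fields a lemmatizer would learn from. The names `Word` and `read_conllu_words` are hypothetical helpers, not part of this repository.

```python
from typing import Iterable, Iterator, NamedTuple


class Word(NamedTuple):
    form: str
    lemma: str
    upos: str
    feats: str


def read_conllu_words(lines: Iterable[str]) -> Iterator[Word]:
    """Yield syntactic words from CoNLL-U lines (hypothetical helper).

    Skips comment lines, blank sentence separators, multiword-token
    ranges (IDs such as "1-2") and empty nodes (IDs such as "5.1").
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        word_id = cols[0]
        if "-" in word_id or "." in word_id:
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        yield Word(form=cols[1], lemma=cols[2], upos=cols[3], feats=cols[5])


if __name__ == "__main__":
    sample = (
        "# text = Dogs ran.\n"
        "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
        "2\tran\trun\tVERB\t_\tTense=Past\t0\troot\t_\t_\n"
        "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
        "\n"
    )
    for w in read_conllu_words(sample.splitlines(keepends=True)):
        print(w.form, w.lemma, w.upos, w.feats)
```

Multiword tokens are skipped because their component words, which follow on their own lines, carry the lemmas.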

Reference:

@article{kanerva2020lemmatizer,
title={Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks},
author={Kanerva, Jenna and Ginter, Filip and Salakoski, Tapio},
year={2020},
journal={Natural Language Engineering},
publisher={Cambridge University Press},
DOI={10.1017/S1351324920000224},
pages={1--30},
url={http://dx.doi.org/10.1017/S1351324920000224}
}

