Bugs on certain words #2
First conjugation in Latvian is always hard, and to make it harder, "aizaust" has two homonyms with different inflection patterns. You can get the correct form tables using http://api.tezaurs.lv/v1/inflections/aizaust?paradigm=15&stem1=aizaus&stem2=aizaust&stem3=aizaus and http://api.tezaurs.lv/v1/inflections/aizaust?paradigm=15&stem1=aizaus&stem2=aizau%C5%BE&stem3=aizaud. However, I do agree that the current result of http://api.tezaurs.lv/v1/inflections/aizaust is misleading and should be refined :) In general, the inflection service tries to guess what kind of word (part of speech, etc.) you have provided, and sometimes it guesses wrong. To avoid such situations, use the paradigm given in column 4 and the stems given in columns 5-7 of the wordlist files. I will update the readme with an example to make this clearer.
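A minimal Python sketch of building such a query, so the service receives the paradigm and stems explicitly instead of guessing. The helper names are hypothetical, and the wordlist-row parser assumes the file is tab-separated (the thread only says which columns hold the paradigm and stems, not the delimiter), so treat it as illustration rather than the project's tooling.

```python
from urllib.parse import quote, urlencode

# Inflection endpoint quoted in the comment above.
BASE = "http://api.tezaurs.lv/v1/inflections/"


def build_inflection_url(lemma, paradigm, stems):
    """Build an inflection query that pins the paradigm and stems explicitly."""
    params = {"paradigm": paradigm}
    # Stems become stem1, stem2, stem3 in the order given.
    for i, stem in enumerate(stems, start=1):
        params[f"stem{i}"] = stem
    # urlencode percent-encodes non-ASCII stems as UTF-8, e.g. "ž" -> %C5%BE.
    return BASE + quote(lemma) + "?" + urlencode(params)


def url_from_wordlist_row(lemma, row):
    """HYPOTHETICAL: assumes a tab-separated wordlist row in which column 4
    holds the paradigm and columns 5-7 hold the stems (1-based)."""
    cols = row.rstrip("\n").split("\t")
    return build_inflection_url(lemma, cols[3], cols[4:7])
```

For example, `build_inflection_url("aizaust", 15, ["aizaus", "aizauž", "aizaud"])` reproduces the second URL above, including the `aizau%C5%BE` encoding of the second stem.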
In my opinion, it would be quite easy to tweak the algorithm to return all possible conjugation variants: it looks the word up in the dictionary, finds the two possible declensions, takes their stems and gives the correct answer. Calling the system misleading is putting it mildly, considering that it gives such results for words that are already in the system: for example, steidzošs is recognized as a noun. o_O
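The proposal above could be sketched roughly as follows. This is not how the service works; it assumes a hypothetical in-memory lexicon mapping each lemma to all of its (paradigm, stems) entries, seeded here with the two "aizaust" entries quoted earlier in the thread.

```python
# HYPOTHETICAL lexicon: lemma -> list of (paradigm, (stem1, stem2, stem3)).
# The two "aizaust" entries are the ones quoted earlier in this thread.
LEXICON = {
    "aizaust": [
        (15, ("aizaus", "aizaust", "aizaus")),
        (15, ("aizaus", "aizauž", "aizaud")),
    ],
}


def all_variants(lemma, lexicon=LEXICON):
    """Return one parameter set per dictionary entry for the lemma, so that
    homonyms with different stems each get their own inflection table."""
    variants = []
    for paradigm, stems in lexicon.get(lemma, []):
        params = {"paradigm": paradigm}
        for i, stem in enumerate(stems, start=1):
            params[f"stem{i}"] = stem
        variants.append(params)
    return variants
```

An empty result means the lemma is not in the lexicon, which is where the service would still have to fall back to guessing.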
The inflection service is not built specifically for Tēzaurs.lv, and it currently uses a smaller lexicon than the whole Tēzaurs. This is because (1) Tēzaurs contains lots of rare words, and including them all in morphological tagging of text would decrease overall accuracy because of homoformy with more common words; and (2) we were short of time and have not yet built a Tēzaurs-specific variant of the inflection service.
The http://api.tezaurs.lv/v1/inflections/steidzo%C5%A1s API assumes that "the caller knows best" and has provided a valid lemma. It is not a recognition API; it is intended to inflect whatever the caller provides. For recognition, it would be more appropriate to use something like http://api.tezaurs.lv:8182/analyze/json/aizskaro%C5%A1s or the corresponding function calls from the https://github.com/peterisp/morphology Java module.
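To make the distinction concrete, here is a small Python sketch that builds both kinds of URL from the examples above. The base URLs come from this comment; the helper names are hypothetical, and `fetch_json` assumes the endpoints return JSON (suggested by the `/json/` path for analysis, but an assumption for the inflection endpoint).

```python
import json
import urllib.request
from urllib.parse import quote

INFLECT_BASE = "http://api.tezaurs.lv/v1/inflections/"
ANALYZE_BASE = "http://api.tezaurs.lv:8182/analyze/json/"


def inflection_url(lemma):
    # Inflection: the caller asserts that `lemma` is already a valid lemma.
    return INFLECT_BASE + quote(lemma)


def analysis_url(word_form):
    # Analysis/recognition: the service determines what the word form is.
    return ANALYZE_BASE + quote(word_form)


def fetch_json(url, timeout=10):
    # ASSUMPTION: the response body is UTF-8 encoded JSON.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

So `inflection_url("steidzošs")` asks the service to inflect "steidzošs" as given, while `analysis_url("aizskarošs")` asks it what "aizskarošs" is.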