Bugs on certain words #2

Open
soshial opened this issue Mar 18, 2017 · 5 comments

Comments

@soshial

soshial commented Mar 18, 2017

  1. aizaust has no present tense forms, and its other forms contain mistakes.
  2. steidzošs is recognized as a noun.
@lauma
Member

lauma commented Mar 18, 2017

The first conjugation in Latvian is always hard, and to make it harder, "aizaust" has two homonyms with different sets of forms. You can get the correct form tables using http://api.tezaurs.lv/v1/inflections/aizaust?paradigm=15&stem1=aizaus&stem2=aizaust&stem3=aizaus and http://api.tezaurs.lv/v1/inflections/aizaust?paradigm=15&stem1=aizaus&stem2=aizau%C5%BE&stem3=aizaud
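As a minimal sketch of how such URLs can be assembled, the snippet below builds both links from a lemma, a paradigm number, and three stems. The base URL and parameter names are copied from the links above; the function name `inflection_url` is hypothetical, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the links in this thread.
INFLECTIONS_BASE = "http://api.tezaurs.lv/v1/inflections/"


def inflection_url(lemma, paradigm, stem1, stem2, stem3):
    """Build an inflection URL with an explicit paradigm and three stems."""
    query = urlencode(
        {"paradigm": paradigm, "stem1": stem1, "stem2": stem2, "stem3": stem3}
    )
    # quote() percent-encodes non-ASCII letters in the path as UTF-8.
    return f"{INFLECTIONS_BASE}{quote(lemma)}?{query}"


# The two homonyms of "aizaust", with the stems from the links above.
for stems in (("aizaus", "aizaust", "aizaus"), ("aizaus", "aizauž", "aizaud")):
    print(inflection_url("aizaust", 15, *stems))
```

Passing the stems as plain Unicode strings and letting `urlencode` handle the escaping avoids hand-writing sequences such as `%C5%BE`.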

However, I do agree that the current result of http://api.tezaurs.lv/v1/inflections/aizaust is misleading and should be refined :)

In general, the inflection service tries to guess what kind of word (part of speech, etc.) you have provided, and it sometimes guesses wrong. To avoid such situations, use the paradigm given in column 4 and the stems given in columns 5-7 of the wordlist files. I will update the readme with an example to make this clearer.
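A rough sketch of that workflow, under stated assumptions: the snippet reads one wordlist row and turns columns 4 to 7 into query parameters. The thread does not show the file format, so the tab delimiter, the placeholder values in columns 1 to 3, and the helper name `params_from_row` are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import urlencode


def params_from_row(row):
    """Map wordlist columns 4-7 (1-indexed) to inflection query parameters."""
    params = {"paradigm": row[3]}
    # Columns 5-7 hold stem1..stem3; skipping empty cells is an assumption.
    for number, stem in enumerate(row[4:7], start=1):
        if stem:
            params[f"stem{number}"] = stem
    return params


# Hypothetical row: the tab delimiter and the contents of columns 1-3 are
# placeholders, not taken from the real wordlist files.
sample = "col1\tcol2\tcol3\t15\taizaus\taizauž\taizaud\n"
row = next(csv.reader(io.StringIO(sample), delimiter="\t"))
query = urlencode(params_from_row(row))
print(query)
```

The resulting query string can be appended after `?` to an inflection URL for the lemma in question.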

@soshial
Author

soshial commented Mar 19, 2017

In my opinion, it would be quite easy to tweak the algorithm so that it returns all possible conjugation variants: it looks the word up in the dictionary, sees two possible entries, takes their stems, and gives the correct answer for each.
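The proposal can be sketched roughly as follows. This is not how the service is implemented: `LEXICON` is a hypothetical in-memory stand-in for the dictionary, filled with the two "aizaust" entries from the links earlier in the thread, and `all_inflection_urls` is an illustrative name.

```python
from urllib.parse import quote, urlencode

# Hypothetical in-memory lexicon: lemma -> list of (paradigm, stems) entries.
# The "aizaust" entries reuse the values from the links earlier in the thread.
LEXICON = {
    "aizaust": [
        (15, ("aizaus", "aizaust", "aizaus")),
        (15, ("aizaus", "aizauž", "aizaud")),
    ],
}


def all_inflection_urls(lemma, lexicon=LEXICON):
    """Return one inflection URL per lexicon entry (homonym) of the lemma."""
    urls = []
    for paradigm, stems in lexicon.get(lemma, []):
        params = {"paradigm": paradigm}
        params.update({f"stem{i}": s for i, s in enumerate(stems, start=1)})
        urls.append(
            f"http://api.tezaurs.lv/v1/inflections/{quote(lemma)}?{urlencode(params)}"
        )
    return urls


for url in all_inflection_urls("aizaust"):
    print(url)
```

A lemma missing from the lexicon yields an empty list, which leaves the caller free to fall back to guessing.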

Calling the result misleading is putting it mildly, considering that it gives such results for words that are already in the system: for example, steidzošs is recognized as a noun. o_O

@lauma
Member

lauma commented Mar 19, 2017

The inflection service is not built for Tēzaurs.lv specifically, and it currently uses a smaller lexicon than the whole of Tēzaurs. This is because (1) Tēzaurs contains lots of rare words, and guessing them all during morphological tagging of text would decrease overall accuracy because of homoformy with more common words; and (2) we were short of time and haven't yet managed to make a Tēzaurs-specific variant of the inflection service.

@PeterisP
Member

The http://api.tezaurs.lv/v1/inflections/steidzo%C5%A1s API assumes that "the caller knows best" and provides a valid lemma. It is not a recognition API; it is intended to inflect whatever the caller provides.

For recognition, it would be more appropriate to use something like http://api.tezaurs.lv:8182/analyze/json/aizskaro%C5%A1s or the appropriate function calls from the https://github.com/peterisp/morphology Java module.
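To make the distinction concrete, here is a small sketch that builds a URL for each of the two endpoints. The base URLs are taken from the links in this thread; the function names are hypothetical, and because the thread does not show the response format, the code only constructs URLs and does not parse anything.

```python
from urllib.parse import quote

# Base URLs as they appear in the links in this thread.
INFLECT_BASE = "http://api.tezaurs.lv/v1/inflections/"
ANALYZE_BASE = "http://api.tezaurs.lv:8182/analyze/json/"


def inflect_url(lemma):
    """URL for inflecting a lemma the caller already knows to be valid."""
    return INFLECT_BASE + quote(lemma)


def analyze_url(word_form):
    """URL for recognizing (analyzing) an arbitrary word form."""
    return ANALYZE_BASE + quote(word_form)


print(inflect_url("steidzošs"))
print(analyze_url("aizskarošs"))
```

One plausible way to combine them is to call the analyze endpoint first to identify the word, and only then request inflections with an explicit paradigm and stems.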

@soshial
Author

soshial commented Mar 25, 2017

Thank you so much, @PeterisP, for providing this analyze endpoint.

@lauma, is it possible to get a result from the inflections API if I have a lexeme number?
