Loading Lexeme by ID doesn't work #37

soshial · 2018-07-04T21:17:13Z

I tried to use function calls, as you mentioned here:

    Analyzer analyzer = new Analyzer(true);
    Lexeme lexeme = analyzer.lexemeByID(43716);
    List<Wordform> wordforms = analyzer.generateInflections(lexeme);

The returned lexeme is null. What did I do wrong? Thank you, @PeterisP !


PeterisP · 2018-07-04T21:43:31Z

From 2.x we're switching to a lexicon automatically derived from tēzaurs.lv data export - the current default lexicon setup (Lexicon_v2.xml) does not have a lexeme with ID 43716.

Explicitly loading the old lexicon (new Analyzer("Lexicon.xml")) might work, as might using lexeme IDs from tezaurs_lexemes.json.

soshial · 2018-07-04T22:43:14Z

Thanks for the explanation. I took the lexeme ID from here (this is probably lexicon v2) and tried with both Lexicon.xml and Lexicon_v2.xml. Is it the right workflow to get Lexeme by its ID?

soshial · 2018-07-07T12:49:14Z

When will the switch to v2 happen? Does the web service use the same version and database as this morphology library does at the moment?

For example, your /analyze/ API and Java analyzer.analyze(...) provide different results. The web API doesn't have some simple words like "kino", "atelje", "spēlētājs".

Also, will this library tell whether a verb is perfective or imperfective (pabeigta/nepabeigta veida)?

lauma · 2018-07-07T14:18:23Z

Pabeigtība (perfectivity) in Latvian is not a verb feature that can be easily derived from morphological elements like endings and infixes, so even if it were there (I doubt it), you should not trust it or use it.

soshial · 2018-07-07T21:36:54Z

I clearly understand that. But there is other information that cannot be derived from the word form alone:

Proper noun vs. common noun (īpašvārds vs. sugas vārds)
Transitivity (transitivitāte)
Adjective type (īpašības vārda tips)
Adverb type (apstākļa vārda tips)

Yet this information is in the lexicon. That is why I am asking: does this information exist anywhere at all (in some dictionaries)?

lauma · 2018-07-08T11:48:47Z

Some of the Tēzaurs sources contain transitivity, see "trans." or "intrans." near the head of verb entries. There is also a more or less straightforward syntactic check for transitivity: a verb is transitive if it can be used together with a direct object:

dzert alu (to drink beer), lasīt grāmatu (to read a book) - OK,
spēlēties (to play) ... - not OK,
gulēt ... miegu? (to sleep ... a sleep?) ... - only rarely.

Meanwhile, pabeigtība in Latvian is much fuzzier and vaguer than transitivity, so I don't think it will end up in Tēzaurs. Because some languages have a morphological distinction for it, linguists around the world do speak of such a feature, but for Latvian it is purely semantic.

Types of adverbs and adjectives, I think, mostly come from grammar books, where usually one type can be enumerated and everything else goes into the other type. As the tagset used in korpuss.lv requires this feature, Tēzaurs will probably be augmented with it eventually.

When it comes to proper/common nouns: traditional dictionaries like LLVV or MLVV simply do not include proper nouns. Tēzaurs sometimes contains the markings "vietv." or "persv.", e.g., http://tezaurs.lv/#/sv/Liepa, but, as with everything in Tēzaurs, coverage is partial. As with adjective/adverb type, some augmentation will eventually be done, but I don't yet know when or to what extent. For now, the use of capital letters in the entry word is quite telling: if it contains at least one capital letter anywhere, it is either an abbreviation or a proper noun, not your average common noun.
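
The capital-letter heuristic above could be sketched in a few lines of Java. This is only an illustration and not part of the morphology library: the class and method names are hypothetical, and the entry word is assumed to be available as a plain string.

```java
final class HeadwordHeuristic {
    // Hypothetical helper: returns true if the entry word contains at least
    // one uppercase letter anywhere, which suggests an abbreviation or a
    // proper noun rather than an ordinary common noun.
    static boolean hasCapitalLetter(String entryWord) {
        return entryWord.codePoints().anyMatch(Character::isUpperCase);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Liepa", "liepa", "kino"}) {
            System.out.println(word + " -> " + hasCapitalLetter(word));
        }
    }
}
```

A positive result only says "abbreviation or proper noun"; it does not tell the two apart, and a lowercase entry word says nothing either way because coverage in Tēzaurs is partial.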

soshial · 2018-07-09T12:51:29Z

Thank you very much. I didn't know that the information I put in the list wasn't fully added to Tēzaurs. Is it possible to add pabeigtība as a parameter, so that I can try to add it to some verbs? I think I might have an idea of how to do that without manual work.

lauma · 2018-07-10T11:13:20Z

Umm, how do you plan to obtain such info? Usually even linguists struggle to assign pabeigtība unambiguously.

soshial · 2018-07-10T11:25:18Z

If we had a Latvian-Russian dictionary in electronic form, we might be able to parse the verb entries for the presence of paired verbs, for example:

Compare perf.-imp. "встречаться-встретиться" vs imp. "нравиться". We might mine this data and add it to Tēzaurs. Without doubt, this is raw and preliminary data that needs linguists' approval, but it might be a good start, mightn't it?
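
As a rough sketch of what such mining could look like, the Java snippet below checks whether a Russian gloss contains two infinitives joined by a hyphen. Everything here is an assumption made for illustration: the class and method names are hypothetical, the gloss is assumed to be a plain string, and the pattern only recognises the typographic shape of a pair, not which member is perfective.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VerbPairFinder {
    // One Russian infinitive, approximately: Cyrillic letters ending in
    // -ть, -ти or -чь, optionally followed by the reflexive -ся / -сь.
    private static final String INFINITIVE =
            "\\p{IsCyrillic}+?(?:ть|ти|чь)(?:ся|сь)?";

    // Two such infinitives joined by a hyphen, not embedded in a longer word.
    private static final Pattern PAIR = Pattern.compile(
            "(?<!\\p{IsCyrillic})(" + INFINITIVE + ")-(" + INFINITIVE + ")(?!\\p{IsCyrillic})");

    // Returns the two members of the first hyphenated pair found, if any.
    static Optional<String[]> findVerbPair(String gloss) {
        Matcher m = PAIR.matcher(gloss.toLowerCase(Locale.ROOT));
        if (m.find()) {
            return Optional.of(new String[] {m.group(1), m.group(2)});
        }
        return Optional.empty();
    }
}
```

With this sketch, findVerbPair("встречаться-встретиться") yields a pair, while findVerbPair("нравиться") yields nothing. The pattern is purely surface-level, so it also matches non-verbs with the same endings (for example the numerals "пять-шесть"), which is one more reason the output would need manual review.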

lauma · 2018-07-10T12:04:29Z

I'm not convinced:

translation never happens one word <-> one word; it is always about matching some subset of all the possible meanings each word can have in each language, and if some word in Russian has a morphologically marked perfective, it does not mean that all the meanings of the "corresponding" Latvian verb will semantically feature a finished action,
various prefixes and verb/participle forms can affect perfective/imperfective,
generally, perfective/imperfective is simply a feature Latvian does not have, in the same way that Latvian has exactly two grammatical genders while Russian has three, or that Latvian has some verb tenses Russian doesn't, etc.; thus, the applicability of such annotation is very limited.

soshial · 2019-10-06T13:13:12Z

Thank you for your explanation. I will try to accept the fact that this parameter is much more ephemeral in Latvian than in Slavic languages. No need to force it into a Procrustean bed =D

But in some cases, like "izdarīt", we can always say that it is used only as perfective, can't we?

soshial · 2019-10-06T13:15:10Z

Returning to the original question: I eventually got it to work (although it is odd that this method requires the lemma explicitly, when the lexeme must already contain it):

    List<Wordform> wordforms = analyzer.generateInflections(lexeme, lexeme.getValue("Pamatforma"));

PeterisP closed this as completed Nov 3, 2022
