Validate MiniLcm types #1344

rmunn · 2025-01-06T21:22:57Z

Fixes #1275.
Fixes #1276.
Fixes #1277.
Fixes #1278.
Fixes #1279.
Fixes #1280.

Work completed:

Entry validator now validates all fields
Sense validator now validates all fields
Example sentence validator now validates all fields
Writing system validator created
Semantic domain validator created
Part of speech validator created
Tests for entry validator
Tests for sense validator
Tests for example sentence validator
Tests for writing system validator
Tests for semantic domain validator
Tests for part of speech validator
Adjusted some existing unit tests to now create valid data

rmunn · 2025-01-06T21:29:19Z

My thoughts on SemDom.xml and GOLDEtic.xml: we load these XML files as resources into the app (TODO: determine which DLL the resources should live in) and create a singleton service (or just a static class) that parses them at system startup and provides an IDictionary interface for looking up POS / semdom data (or a hash set, if all we need is to validate GUIDs).

Then validation code can ask "Is this a predefined / canonical item?" and, if it's supposed to be canonical, ensure the GUID is correct. Optionally, it can also verify that the name and description of the canonical items haven't been modified.
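
A rough sketch of the lookup idea, written in Python purely for brevity (the PR itself is C#). The XML layout, element and attribute names, and GUIDs below are invented for illustration; they are not the real SemDom.xml or GOLDEtic.xml schema.

```python
import xml.etree.ElementTree as ET
from uuid import UUID

# Hypothetical, simplified XML; the real SemDom.xml / GOLDEtic.xml layouts differ.
SAMPLE_XML = """
<items>
  <item guid="11111111-1111-1111-1111-111111111111" name="Noun"/>
  <item guid="22222222-2222-2222-2222-222222222222" name="Verb"/>
</items>
"""

def load_canonical(xml_text: str) -> dict[UUID, str]:
    """Parse the resource once (e.g. at startup) into a GUID -> name table."""
    root = ET.fromstring(xml_text)
    return {UUID(item.get("guid")): item.get("name") for item in root.iter("item")}

# Module-level table standing in for the singleton / static class.
CANONICAL = load_canonical(SAMPLE_XML)

def is_canonical(guid: UUID) -> bool:
    """Answers: is this a predefined / canonical item?"""
    return guid in CANONICAL

def name_unmodified(guid: UUID, name: str) -> bool:
    """Optional stricter check: a canonical item keeps its canonical name."""
    return CANONICAL.get(guid) == name
```

If only GUID membership matters, the dictionary could be reduced to a set of GUIDs with no change to `is_canonical`.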

jasonleenaylor · 2025-01-06T23:38:51Z

GUIDs are all that needs to be used to identify if a POS or SemDom is predefined. Unless you want to support versioning, which you probably don't.

rmunn · 2025-01-07T20:12:18Z

Commit 3bcd237, which actually turns on validation, makes lots of unit tests fail, because their test data is now considered invalid. I'll check through and see if my validation rules are too strict or if test data needs to be updated. I expect it to be a little of both.

As long as senses only have a PartOfSpeechId in them, it's hard to check that property statelessly because we need to look up the PartOfSpeech in order to determine whether it's predefined (and thus whether its GUID needs to match one of the canonical GUIDs). For now, we'll skip checking part of speech GUIDs until senses have an actual PartOfSpeech reference.
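
To illustrate why the ID alone is not enough, here is a minimal Python sketch (not the PR's C# code). The `Sense` and `PartOfSpeech` shapes, the `part_of_speech` reference field, and the placeholder GUID are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID

# Placeholder GUID, standing in for the real canonical list.
CANONICAL_POS = frozenset({UUID("11111111-1111-1111-1111-111111111111")})

@dataclass
class PartOfSpeech:
    id: UUID
    predefined: bool

@dataclass
class Sense:
    part_of_speech_id: Optional[UUID] = None
    part_of_speech: Optional[PartOfSpeech] = None  # hypothetical full reference

def validate_sense_pos(sense: Sense) -> list[str]:
    pos = sense.part_of_speech
    if pos is None:
        # Only an ID is available: without a lookup we cannot know whether the
        # part of speech is meant to be predefined, so the check is skipped.
        return []
    if pos.predefined and pos.id not in CANONICAL_POS:
        return ["predefined part of speech must use a canonical GUID"]
    return []
```

With the full reference present, the check needs no external state; with only the ID, it has nothing to compare against.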

The Sena3SyncTests are failing because some parts of speech are being created with non-canonical GUIDs. Let's comment this out for now to make the tests pass, then uncomment it once we've investigated where the non-canonical PoS GUIDs are coming from.

rmunn · 2025-01-07T20:59:59Z

Many tests are now failing with the error "Fieldname 'HumanNoOpinionNumber' does not exist." I don't get those failures when running tests locally, and I have no idea where that "HumanNoOpinionNumber" name is coming from. @hahn-kev - Any ideas on this one?

hahn-kev

left some feedback, it looks good so far.

It looks like that issue with HumanNoOpinionNumber is not consistent. Let me know if you see it again, I've sent a slack message to the flex team about it.

rmunn · 2025-01-08T04:32:19Z

@hahn-kev wrote:

left some feedback, it looks good so far.

Addressed most review comments in commit c3f0908. The GUID constructor that doesn't parse strings is one I'll tackle tomorrow morning. Everything else is either done or waiting until later (such as renaming PartOfSpeechIdValidator, which I'm waiting on because I think I'll end up removing it) except for #1344 (review). There, I need a bit of help because I think I might be misunderstanding what the properties of ComplexFormComponent mean. (Either that, or I'm misunderstanding what you mean in that comment and I need you to expand on it a little).

It looks like that issue with HumanNoOpinionNumber is not consistent. Let me know if you see it again, I've sent a slack message to the flex team about it.

Yeah, that might have been a one-off caused by something entirely different. Commit 922f0a0, which only changed the validation rule for example sentences, also made that error go away. So I'll chalk it up to weirdness and ignore it unless it comes back.

hahn-kev

looks good to me, there's one nitpick and then the discussion about complex forms pending.

The Components property on a complex form can contain empty GUIDs for the complex form entry ID, because we can infer that (it's the entry we're looking at right now). But it cannot contain an empty GUID for the component entry ID, because that's meaningless: which component is being referenced is the whole point of the Components property.
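
The rule can be sketched as follows, in Python for brevity rather than the PR's C#. The field names are illustrative stand-ins and are not copied from the MiniLcm source.

```python
from dataclasses import dataclass
from uuid import UUID

EMPTY_GUID = UUID(int=0)

@dataclass
class ComplexFormComponent:
    # Illustrative field names, assumed for this example.
    complex_form_entry_id: UUID
    component_entry_id: UUID

def validate_component(c: ComplexFormComponent) -> list[str]:
    errors = []
    # complex_form_entry_id is deliberately not checked: an empty value can be
    # inferred from the entry that owns the Components list.
    if c.component_entry_id == EMPTY_GUID:
        errors.append("component entry ID must not be empty")
    return errors
```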

rmunn · 2025-01-09T15:55:51Z

@hahn-kev - Okay, I think complex forms are sorted out now; I added a couple of comments to the validation to hopefully avoid this confusion in the future. This PR should be ready now.

rmunn added 5 commits January 6, 2025 13:44

Add validation for entries and most entry fields

01a619e

Contains validators for:
- Entry
- Sense
- Example Sentence
- Part of Speech
- Semantic Domain

Update entry validation tests to pass again

8499ff5

Entry validation now includes "lexeme must not be empty", so we add a non-empty lexeme to the existing entry validation tests.

Add entry validator tests for lexeme, citation form

dcf2aef

These tests are quite similar to each other; a test helper method is probably needed here.

Add entry validation tests for literal meaning, note

4dd4032

Refactored citation form tests to be more generic so they can be reused for other similar fields.
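
As a rough illustration of that kind of generic test helper (a Python sketch, not the PR's actual C# tests), one helper can exercise the same rule across several fields. The toy validator, field names, and error wording are all assumptions.

```python
def validate_entry(entry: dict) -> list[str]:
    """Toy validator: no writing system in a multi-string field may be empty."""
    errors = []
    for field, multistring in entry.items():
        for ws, text in multistring.items():
            if not text:
                errors.append(f"{field}[{ws}] must not be empty")
    return errors

def failures_for_empty_text(field: str) -> list[str]:
    """Shared helper: build an otherwise valid entry, then blank out one field."""
    entry = {"lexeme_form": {"en": "lexeme"}, field: {"en": ""}}
    return validate_entry(entry)

# The same check reused for each similar field instead of one copy per field.
FIELDS = ["citation_form", "literal_meaning", "note"]
results = {field: failures_for_empty_text(field) for field in FIELDS}
```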

Add validation tests for senses, example sentences

04b8b26

rmunn self-assigned this Jan 6, 2025

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm.Tests/Validators/EntryValidatorTests.cs (outdated, resolved)

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm.Tests/Validators/EntryValidatorTests.cs (outdated, resolved)

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm/Validators/EntryValidator.cs (outdated, resolved)

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm/Validators/EntryValidator.cs (resolved)

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm/Validators/PartOfSpeechIdValidator.cs (outdated, resolved)

rmunn added 7 commits January 7, 2025 11:31

Address review comments so far

87b2f8b

Add list of canonical GUIDs for parts of speech

7577e83

Add list of canonical GUIDs for semantic domains

98fd92b

Mark FwLiteProjectSync tests as integration tests

68fe485

These tests run a Send/Receive as part of the test and are too slow to be considered unit tests.

Add more entry validation tests

b8e6b30

Test that circular references are detected

Also validate complex form types on updates

379d4d4

Actually use validators in MiniLCM API

3bcd237

rmunn force-pushed the feat/validate-minilcm-types branch from 9b3d43e to 3bcd237 January 7, 2025 20:11

rmunn added 6 commits January 7, 2025 15:19

Adjust some test data to make it valid

1cedbec

Fix test failures around semantic domain IDs

e5b02b7

Instead of semantic domains always being considered predefined when they come from fwdata, we can now look up their GUIDs in the canonical list and set Predefined correctly. This also makes two failing tests pass.
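
A minimal Python sketch of that change (the PR's code is C#). The placeholder GUID, the function name, and the dictionary shape are assumptions; the real canonical list is the one added earlier in this PR.

```python
from uuid import UUID

# Placeholder GUID standing in for the real canonical semantic domain list.
CANONICAL_SEMANTIC_DOMAINS = frozenset({
    UUID("aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"),
})

def to_semantic_domain(guid: UUID, name: str) -> dict:
    """Convert a domain read from fwdata, deriving `predefined` by lookup
    instead of hard-coding it to True."""
    return {
        "id": guid,
        "name": name,
        "predefined": guid in CANONICAL_SEMANTIC_DOMAINS,
    }
```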

Push two missing files

d046bdc

Make EntryReadyForCreation create valid data

8d5a2fe

rmunn marked this pull request as ready for review January 7, 2025 21:01

Example sentences may have empty Sentence fields

922f0a0

A Sentence property that has no content at all should be allowed. (Still should not have any empty writing systems in a MultiString, of course).
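
The difference between an optional field such as Sentence and a required one such as the lexeme could be sketched like this, in Python rather than the PR's C#. Modeling a MultiString as a dict from writing system to text, and the error wording, are assumptions for the example.

```python
def validate_multistring(name: str, value: dict[str, str], required: bool) -> list[str]:
    """Validate a multi-string modeled as {writing_system: text}."""
    errors = []
    if required and not value:
        errors.append(f"{name} must not be empty")
    # Whether or not the field is required, a writing system that is present
    # must carry text; an entirely empty multi-string has nothing to flag here.
    for ws, text in value.items():
        if not text:
            errors.append(f"{name}[{ws}] must not be empty")
    return errors

def validate_sentence(sentence: dict[str, str]) -> list[str]:
    # A Sentence with no content at all is allowed.
    return validate_multistring("Sentence", sentence, required=False)

def validate_lexeme(lexeme: dict[str, str]) -> list[str]:
    # The lexeme must not be empty.
    return validate_multistring("LexemeForm", lexeme, required=True)
```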