Validate MiniLcm types #1344

Merged
merged 25 commits into from
Jan 10, 2025

@rmunn (Contributor) commented Jan 6, 2025

Fixes #1275.
Fixes #1276.
Fixes #1277.
Fixes #1278.
Fixes #1279.
Fixes #1280.

Work completed:

  • Entry validator now validates all fields
  • Sense validator now validates all fields
  • Example sentence validator now validates all fields
  • Writing system validator created
  • Semantic domain validator created
  • Part of speech validator created
  • Tests for entry validator
  • Tests for sense validator
  • Tests for example sentence validator
  • Tests for writing system validator
  • Tests for semantic domain validator
  • Tests for part of speech validator
  • Adjusted some existing unit tests to now create valid data

rmunn added 5 commits January 6, 2025 13:44

Contains validators for:

- Entry
- Sense
- Example Sentence
- Part of Speech
- Semantic Domain

Entry validation now includes "lexeme must not be empty", so we add a
non-empty lexeme to the existing entry validation tests.

These tests are quite similar to each other; a test helper method is
probably needed here.

Refactored citation form tests to be more generic so they can be reused
for other similar fields.
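A rough Python sketch of the two entry rules described in these commits: the lexeme must not be empty, and one generic MultiString check is reused across fields, which is what lets a single test helper cover citation form and similar fields. The real validators are C# in MiniLcm; the `Entry` shape, the snake_case field names, and the helper `rejects_empty_writing_system` are hypothetical, and modeling a MultiString as a plain dict from writing-system id to text is a simplification.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Entry:
    # Hypothetical subset of an entry's MultiString fields (writing system id -> text).
    lexeme_form: dict[str, str] = field(default_factory=dict)
    citation_form: dict[str, str] = field(default_factory=dict)


MULTISTRING_FIELDS = ("lexeme_form", "citation_form")


def validate_entry(entry: Entry) -> list[str]:
    errors = []
    # Rule 1: the lexeme needs at least one non-empty value.
    if not any(text.strip() for text in entry.lexeme_form.values()):
        errors.append("lexeme must not be empty")
    # Rule 2 (generic, reused for every MultiString field): no writing system maps to empty text.
    for name in MULTISTRING_FIELDS:
        for ws, text in getattr(entry, name).items():
            if not text.strip():
                errors.append(f"{name}[{ws}] must not be empty")
    return errors


def rejects_empty_writing_system(field_name: str) -> bool:
    """Hypothetical generic test helper: start from a valid entry, blank out one
    field's writing system, and check that the validator reports that field."""
    valid = Entry(lexeme_form={"en": "dog"})
    broken = replace(valid, **{field_name: {"en": ""}})
    return any(e.startswith(f"{field_name}[en]") for e in validate_entry(broken))
```

For example, `all(rejects_empty_writing_system(f) for f in MULTISTRING_FIELDS)` evaluates to `True`. Because each check starts from a valid entry, a failure points at exactly one field.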
rmunn self-assigned this Jan 6, 2025
@rmunn (Contributor Author) commented Jan 6, 2025

My thoughts on SemDom.xml and GOLDEtic.xml: we load these XML files into the app as resources (TODO: determine which DLL the resources should live in) and create a singleton service (or just a static class) that parses them at system startup and provides an IDictionary interface for looking up POS / semdom data. (Or a hash set, if all we need is to validate GUIDs.)

Then validation code can ask "Is this a predefined / canonical item?" and, if it's supposed to be canonical, ensure that its GUID is correct. Optionally, it can also verify that the name and description of canonical items haven't been modified.
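One way to sketch the proposed lookup in Python, assuming a much-simplified XML shape: parse the file once (the `lru_cache` plays the role of the singleton service or static class), keep only the GUIDs in a frozen set, and have validation reject a predefined item whose GUID isn't in that set. The sample XML, its `guid` attribute, the GUID values, and the function names are made up for illustration; the real SemDom.xml and GOLDEtic.xml schemas aren't shown in this discussion, and the actual code is C#.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from functools import lru_cache

# Made-up stand-in for SemDom.xml; the real file's schema is different and much larger.
SAMPLE_XML = """<domains>
  <domain guid="11111111-1111-1111-1111-111111111111"/>
  <domain guid="22222222-2222-2222-2222-222222222222"/>
</domains>"""


@lru_cache(maxsize=None)  # parsed on first use, then shared: a stand-in for a singleton service
def canonical_guids(xml_text: str = SAMPLE_XML) -> frozenset:
    root = ET.fromstring(xml_text)
    # Collect every element that carries a guid attribute into a hash set for O(1) lookups.
    return frozenset(uuid.UUID(el.attrib["guid"]) for el in root.iter() if "guid" in el.attrib)


@dataclass
class SemanticDomain:
    id: uuid.UUID
    predefined: bool


def validate_semantic_domain(sd: SemanticDomain) -> list[str]:
    # Only the GUID identifies a predefined item; names and descriptions are not compared here.
    if sd.predefined and sd.id not in canonical_guids():
        return ["predefined semantic domain must have a canonical GUID"]
    return []
```

Swapping the frozen set for a dict keyed by GUID would also support the optional name/description check mentioned above.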

@jasonleenaylor commented

The GUID alone is enough to identify whether a POS or SemDom is predefined, unless you want to support versioning, which you probably don't.

rmunn force-pushed the feat/validate-minilcm-types branch from 9b3d43e to 3bcd237 January 7, 2025 20:11
@rmunn (Contributor Author) commented Jan 7, 2025

Commit 3bcd237, which actually turns on validation, makes many unit tests fail because their test data is now considered invalid. I'll go through them and see whether my validation rules are too strict or the test data needs to be updated. I expect it to be a bit of both.

rmunn added 6 commits January 7, 2025 15:19

As long as senses only have a PartOfSpeechId in them, it's hard to check
that property statelessly because we need to look up the PartOfSpeech in
order to determine whether it's predefined (and thus whether its GUID
needs to match one of the canonical GUIDs). For now, we'll skip checking
part of speech GUIDs until senses have an actual PartOfSpeech reference.

Instead of semantic domains always being considered predefined when
they come from fwdata, we now look up their GUIDs in the canonical
list and set Predefined correctly. This also makes two failing tests pass.
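The import-side change can be pictured as deriving the flag instead of hard-coding it. In this Python sketch, `CANONICAL_SEMDOM_GUIDS`, `from_fwdata`, and the GUID value are hypothetical; in the real C# code the set would come from the parsed canonical list.

```python
import uuid
from dataclasses import dataclass

# Hypothetical canonical set; the real one would be built from SemDom.xml.
CANONICAL_SEMDOM_GUIDS = frozenset({uuid.UUID("11111111-1111-1111-1111-111111111111")})


@dataclass
class SemanticDomain:
    id: uuid.UUID
    name: str
    predefined: bool


def from_fwdata(guid: str, name: str) -> SemanticDomain:
    """Map a domain read from fwdata, deriving Predefined from the canonical list
    rather than assuming every fwdata domain is predefined."""
    domain_id = uuid.UUID(guid)
    return SemanticDomain(id=domain_id, name=name, predefined=domain_id in CANONICAL_SEMDOM_GUIDS)
```

A custom domain created in FieldWorks then arrives with `predefined=False`, so it is no longer held to the canonical-GUID rule.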

The Sena3SyncTests are failing because some parts of speech are being
created with non-canonical GUIDs. Let's comment this out for now to make
the tests pass, then uncomment it once we've investigated where the
non-canonical PoS GUIDs are coming from.
@rmunn (Contributor Author) commented Jan 7, 2025

Many tests are now failing with the error "Fieldname 'HumanNoOpinionNumber' does not exist." I don't get those failures when running tests locally, and I have no idea where that "HumanNoOpinionNumber" name is coming from. @hahn-kev - Any ideas on this one?

rmunn marked this pull request as ready for review January 7, 2025 21:01
A Sentence property that has no content at all should be allowed. (It
still should not have any empty writing systems in a MultiString, of course.)
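The distinction in this commit (an optional MultiString may be entirely empty, but may not contain an empty writing-system entry) could be sketched in Python as below. `multistring_errors`, its `required` flag, and `validate_example_sentence` are hypothetical names, and a MultiString is again simplified to a dict from writing-system id to text.

```python
def multistring_errors(name: str, value: dict[str, str], required: bool = False) -> list[str]:
    errors = []
    # Required fields (like the lexeme) must have at least one writing system.
    if required and not value:
        errors.append(f"{name} must not be empty")
    # Every field: a writing system that is present must carry non-empty text.
    errors += [f"{name}[{ws}] must not be empty" for ws, text in value.items() if not text.strip()]
    return errors


def validate_example_sentence(sentence: dict[str, str]) -> list[str]:
    # Sentence is optional: an empty MultiString is fine, an empty entry inside it is not.
    return multistring_errors("Sentence", sentence, required=False)
```

With this split, `{}` passes, `{"en": ""}` fails, and the same helper with `required=True` covers fields that must have content.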

@hahn-kev (Collaborator) left a comment

left some feedback, it looks good so far.

It looks like that issue with HumanNoOpinionNumber is not consistent. Let me know if you see it again, I've sent a slack message to the flex team about it.

@rmunn (Contributor Author) commented Jan 8, 2025

@hahn-kev wrote:

left some feedback, it looks good so far.

Addressed most review comments in commit c3f0908. The GUID constructor that doesn't parse strings is one I'll tackle tomorrow morning. Everything else is either done or deferred (such as renaming PartOfSpeechIdValidator, which I'm holding off on because I think I'll end up removing it), except for #1344 (review). There, I need a bit of help because I think I might be misunderstanding what the properties of ComplexFormComponent mean. (Either that, or I'm misunderstanding what you mean in that comment and I need you to expand on it a little.)

It looks like that issue with HumanNoOpinionNumber is not consistent. Let me know if you see it again, I've sent a slack message to the flex team about it.

Yeah, that might have been a one-off caused by something entirely different. Commit 922f0a0, which only changed the validation rule for example sentences, also made that error go away. So I'll chalk it up to weirdness and ignore it unless it comes back.

@hahn-kev (Collaborator) left a comment

Looks good to me; there's one nitpick, and then the discussion about complex forms is still pending.

rmunn added 2 commits January 9, 2025 10:13
The Components property on a complex form can contain empty GUIDs for
the complex form entry ID, because we can infer that (it's the entry
we're looking at right now). But it cannot contain an empty GUID for the
component entry ID, because that's meaningless: which component is being
referenced is the whole point of the Components property.
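A Python sketch of the Components rule in this commit: an empty complex-form entry ID is accepted because it can be inferred from the entry being validated, while an empty component entry ID is always an error. The snake_case names and `validate_components` are hypothetical stand-ins for the C# types, and the extra check that a non-empty complex-form ID must match the current entry is an assumption, not something stated in the PR.

```python
import uuid
from dataclasses import dataclass

EMPTY = uuid.UUID(int=0)  # all-zero GUID, like Guid.Empty in .NET


@dataclass
class ComplexFormComponent:
    complex_form_entry_id: uuid.UUID
    component_entry_id: uuid.UUID


def validate_components(entry_id: uuid.UUID, components: list[ComplexFormComponent]) -> list[str]:
    errors = []
    for i, c in enumerate(components):
        # Empty is allowed: the complex form is the entry we're looking at right now.
        # Assumption (not from the PR): a non-empty ID should then be that same entry.
        if c.complex_form_entry_id not in (EMPTY, entry_id):
            errors.append(f"Components[{i}]: complex form entry id must be empty or match the entry")
        # Never allowed: which component is referenced is the whole point of Components.
        if c.component_entry_id == EMPTY:
            errors.append(f"Components[{i}]: component entry id must not be empty")
    return errors
```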
rmunn requested a review from hahn-kev January 9, 2025 15:55
@rmunn (Contributor Author) commented Jan 9, 2025

@hahn-kev - Okay, I think complex forms are sorted out now; I added a couple of comments to the validation to hopefully avoid this confusion in the future. This PR should be ready now.

3 participants