Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-17014 No code fallbacks for language paths #4254

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

btangmu
Copy link
Member

@btangmu btangmu commented Dec 21, 2024

-Use the extra-paths mechanism instead of code-fallback for language paths

CLDR-17014

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@btangmu btangmu self-assigned this Dec 21, 2024
Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because these are all constant paths, they can be statically initiated once, then just added whenever needed.

I'd suggest a separate class, ExtraPaths, with one method .append to(Collection)

@btangmu
Copy link
Member Author

btangmu commented Dec 21, 2024

a separate class, ExtraPaths

Yes, that sounds good, in fact I was thinking to follow this up with a refactoring PR, to move the extra-path code (not just for languages, but for all extra paths) out of CLDRFile into its own top-level class. It's still not clear to me what part, if any, of extra-path code is locale-specific; the current methods aren't static. It does seem wasteful to re-run code for each locale (each CLDRFile) if it's not locale-specific. Stepping back, though, does the general solution seem right, to use extra paths instead of code fallback for language paths?

There are some test failures, like this, that I still need to study:

  TestCoverageLevel {
    ...
    TestCoverageCompleteness
Error:  (TestCoverageLevel.java:728)  Error: Comprehensive & no exception for path =>	//ldml/localeDisplayNames/languages/language[@type="fa_AF"]

@macchiati
Copy link
Member

Everything in code-fallback is locale independent and immutable. So that's why you can move it all to the new class, and fetch it with no local parameter. We might be able to just call that when building the root cldrfile.

The other stuff in get extra paths is locale dependent. That could also be moved to the new class, but needs a locale parameter when fetched.

@macchiati
Copy link
Member

Stepping back, though, does the general solution seem right, to use extra paths instead of code fallback for language paths?

Yes, with the changes I mention.

@btangmu
Copy link
Member Author

btangmu commented Dec 22, 2024

The second commit fixes some but not all test failures.

The remaining test failures are TestCoverageCompleteness and testLSR, both in TestCoverageLevel. These need more investigation.

After the tests are all passing, I'll address refactoring/optimizing with a new class ExtraPaths.

private void getLanguageExtraPaths(Set<String> toAddTo) {
Set<String> codes =
StandardCodes.make().getSurveyToolDisplayCodes(NameType.LANGUAGE.getNameName());
codes.remove(XMLSource.ROOT_ID);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong, my mistake -- it actually modifies the set in StandardCodes! Need to make a copy of the set. We might want to prevent this kind of bug by making getSurveyToolDisplayCodes return an unmodifiable set, maybe using Set.copyOf.

We currently have this, which doesn't work: CLDRLanguageCodes = CldrUtility.protectCollection(CLDRLanguageCodes);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by my last commit -- we still should look into Set.copyOf as an alternative to protectCollection

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: In general, use ImmutableSet.copyOf instead of Set.copyOf. The latter changes the order in the set, which may be important (TreeSet or LinkedHashSets get messed up, and it doesn't hurt for HashSet).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need to fix CldrUtility.protectCollection(CLDRLanguageCodes); — if it does't work then a lot of other things could go wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1073,7 +1075,7 @@ public void testLSR() {

// Get root LSR codes

for (String path : root) {
for (String path : root.fullIterable()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without fullIterable, the loop skipped the extra paths

this may be a concern in general: how often do we loop through a CLDRFile the simple way, which skips extra paths, and what are the consequences and rationale for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

I think in most cases we want to iterate through the extra paths. So we might want make the iteration choice explicit, so that we can check to make sure. Maybe refactor (in separate ticket/PR) as follows.

for (String path : cldrFile) { // make this fail

for (String path : cldrFile.withoutExtras()) { // make this do what the line above did

for (String path : cldrFile.withExtras()) { // new name for fullIterable

Then we can search for the withoutExtras() calls and make sure that they are right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TestVettingDataDriven failure may be related to this. Part of that test involves writing the CLDRFile to a temporary XML file and then reading it back to verify the newly voted-on value, and that's where it's failing. It looks like the language paths, which are now extra paths, don't get written to the file. To test that, I'm starting a new branch from main that will just have a few lines added to TestSTFactory.xml...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #4256

If the same failure happens there that I see locally, it seems to indicate a long-existing bug where CldrXmlWriter doesn't write extra paths

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macchiati following up, I made a new ticket about fullIterable: https://unicode-org.atlassian.net/browse/CLDR-18217

@btangmu
Copy link
Member Author

btangmu commented Dec 24, 2024

The test failures all seem fixed now except for TestVettingDataDriven

@macchiati
Copy link
Member

macchiati commented Dec 25, 2024 via email

@btangmu
Copy link
Member Author

btangmu commented Dec 25, 2024

it mustn't write the extra paths, because the have null values, which cannot be serialized

I wouldn't expect extra paths with null values to be written to disk, but I did expect extra paths with non-null values to be written to disk. The test failures here and in #4256 show that an extra path, originally having a null value (or no value), can get a value through voting, and the CLDRFile is updated successfully in memory, but it fails to get written to disk. The failure is happening in #4256 even though there is no change in that PR except for the test itself -- it's for a metazone path and doesn't involve any change to language paths, code fallback, etc.

The path //ldml/dates/timeZoneNames/metazone[@type="Alaska"]/long/generic does have a value in common/main/fr.xml for example. Maybe it's no longer called an "extra path" at that point; maybe part of my confusion is about terminology. Anyway I'm wondering how it got into fr.xml if not by voting followed by CldrXmlWriter.write.

@macchiati
Copy link
Member

macchiati commented Dec 26, 2024 via email

-Use the extra-paths mechanism instead of code-fallback for language paths
-Change es-419 to es (419) in localeDisplayName.txt so TestLocaleDisplay passes

-Add language path in testExtraPaths so it passes

-Remove XMLSource.CODE_FALLBACK_ID, GERMAN, from testGetPaths so it passes
-Create the set of language extra paths only once; related refactoring

-Fix another part of testGetPaths so it passes, similar to previous commit

-Comments
-Do not modify SupplementalDataInfo.CLDRLanguageCodes, returned by sc.getGoodAvailableCodes!
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@btangmu btangmu marked this pull request as ready for review January 13, 2025 18:49
@btangmu btangmu requested a review from macchiati January 13, 2025 18:49
@btangmu
Copy link
Member Author

btangmu commented Jan 13, 2025

All tests pass here after merging the other PR #4256 and rebasing.

Unless unexpected problems turn up, this PR should complete the ticket, except that we want to refactor by moving more "extra paths" code from CLDRFile.java to ExtraPaths.java.

Also, we should keep eyes open for more lurking bugs related to extra paths, since now there will be more extra paths, and ideally we should soon implement https://unicode-org.atlassian.net/browse/CLDR-18217

@btangmu btangmu requested review from srl295 and pedberg-icu January 13, 2025 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants