CLDR-17014 No code fallbacks for language paths #4254
Because these are all constant paths, they can be statically initiated once, then just added whenever needed.
I'd suggest a separate class, ExtraPaths, with one method, appendTo(Collection).
Yes, that sounds good, in fact I was thinking to follow this up with a refactoring PR, to move the extra-path code (not just for languages, but for all extra paths) out of CLDRFile into its own top-level class. It's still not clear to me what part, if any, of extra-path code is locale-specific; the current methods aren't static. It does seem wasteful to re-run code for each locale (each CLDRFile) if it's not locale-specific. Stepping back, though, does the general solution seem right, to use extra paths instead of code fallback for language paths? There are some test failures, like this, that I still need to study:
Everything in code-fallback is locale-independent and immutable. That's why you can move it all to the new class and fetch it with no locale parameter. We might be able to just call that when building the root CLDRFile. The other stuff in get extra paths is locale-dependent. That could also be moved to the new class, but it needs a locale parameter when fetched.
Yes, with the changes I mention. |
The second commit fixes some but not all test failures. The remaining test failures are TestCoverageCompleteness and testLSR, both in TestCoverageLevel. These need more investigation. After the tests are all passing, I'll address refactoring/optimizing with a new class ExtraPaths. |
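A minimal sketch of the ExtraPaths idea discussed above, assuming a static set that is built once and appended on demand. The class body, the placeholder language codes, and the path strings are illustrative assumptions, not the actual CLDR implementation; in the real code the codes would presumably come from StandardCodes.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed class; not the real CLDR code. */
final class ExtraPaths {
    // Placeholder codes for illustration only.
    private static final List<String> LANGUAGE_CODES = List.of("de", "es", "fr");

    // Locale-independent paths: built once at class initialization, never modified.
    private static final Set<String> CONSTANT_PATHS = buildConstantPaths();

    private ExtraPaths() {}

    private static Set<String> buildConstantPaths() {
        Set<String> paths = new LinkedHashSet<>();
        for (String code : LANGUAGE_CODES) {
            paths.add("//ldml/localeDisplayNames/languages/language[@type=\"" + code + "\"]");
        }
        return Collections.unmodifiableSet(paths);
    }

    /** Adds the constant extra paths to the caller's collection. */
    static void appendTo(Collection<String> toAddTo) {
        toAddTo.addAll(CONSTANT_PATHS);
    }
}
```

Because the set is static and read-only, every caller shares one instance and no work is repeated per locale. A locale-dependent variant would need an extra locale parameter, as noted above.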
private void getLanguageExtraPaths(Set<String> toAddTo) {
    Set<String> codes =
            StandardCodes.make().getSurveyToolDisplayCodes(NameType.LANGUAGE.getNameName());
    codes.remove(XMLSource.ROOT_ID);
this is wrong, my mistake -- it actually modifies the set in StandardCodes! Need to make a copy of the set. We might want to prevent this kind of bug by making getSurveyToolDisplayCodes return an unmodifiable set, maybe using Set.copyOf.
We currently have this, which doesn't work: CLDRLanguageCodes = CldrUtility.protectCollection(CLDRLanguageCodes);
Fixed by my last commit -- we still should look into Set.copyOf as an alternative to protectCollection
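The bug described above can be sketched with plain JDK collections. CodeRegistry and its methods are hypothetical stand-ins (they are not StandardCodes or protectCollection); the sketch only shows why handing out an internal set is risky and two ways to guard against it.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a class that caches a set of codes. */
final class CodeRegistry {
    private final Set<String> codes = new TreeSet<>(Set.of("de", "fr", "root"));

    /** Risky: returns the internal set, so a caller's remove() changes shared state. */
    Set<String> getCodesUnsafe() {
        return codes;
    }

    /** Safer: a read-only view; remove() throws UnsupportedOperationException. */
    Set<String> getCodes() {
        return Collections.unmodifiableSet(codes);
    }
}

final class CodeRegistryDemo {
    private CodeRegistryDemo() {}

    /** Caller-side fix: copy first, then modify only the copy. */
    static Set<String> codesWithoutRoot(CodeRegistry registry) {
        Set<String> copy = new TreeSet<>(registry.getCodes());
        copy.remove("root");
        return copy;
    }
}
```

With the read-only getter, an accidental codes.remove(...) fails immediately instead of silently changing the shared set.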
Important: In general, use ImmutableSet.copyOf instead of Set.copyOf. The latter changes the order in the set, which may be important (TreeSet or LinkedHashSets get messed up, and it doesn't hurt for HashSet).
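To illustrate the ordering concern, here is a JDK-only sketch of an order-preserving read-only copy. It is a stand-in for Guava's ImmutableSet.copyOf recommended above, not a replacement for it; OrderedCopies and orderedImmutableCopy are hypothetical names. Set.copyOf, by contrast, leaves the iteration order unspecified.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper; uses only the JDK. */
final class OrderedCopies {
    private OrderedCopies() {}

    /** Returns a read-only copy that iterates in the same order as the source. */
    static <T> Set<T> orderedImmutableCopy(Collection<? extends T> source) {
        // LinkedHashSet remembers insertion order, which here is the source's iteration order.
        Set<T> copy = new LinkedHashSet<>(source);
        return Collections.unmodifiableSet(copy);
    }
}
```

Copying a TreeSet this way keeps its sorted order, and copying a LinkedHashSet keeps its insertion order.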
Also, we need to fix CldrUtility.protectCollection(CLDRLanguageCodes); if it doesn't work, then a lot of other things could go wrong.
@@ -1073,7 +1075,7 @@ public void testLSR() {
  // Get root LSR codes
- for (String path : root) {
+ for (String path : root.fullIterable()) {
Without fullIterable, the loop skipped the extra paths.
This may be a concern in general: how often do we loop through a CLDRFile the simple way, which skips extra paths, and what are the consequences and rationale for that?
Good point.
I think in most cases we want to iterate through the extra paths. So we might want to make the iteration choice explicit, so that we can check to make sure. Maybe refactor (in a separate ticket/PR) as follows.
for (String path : cldrFile) { // make this fail
for (String path : cldrFile.withoutExtras()) { // make this do what the line above did
for (String path : cldrFile.withExtras()) { // new name for fullIterable
Then we can search for the withoutExtras() calls and make sure that they are right.
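The proposal above could look roughly like this. PathFile is a hypothetical stand-in for CLDRFile, and only the method names withExtras() and withoutExtras() come from the comment. The class deliberately does not implement Iterable, so a bare for (String path : file) no longer compiles.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for CLDRFile; intentionally does NOT implement Iterable. */
final class PathFile {
    private final Set<String> storedPaths;
    private final Set<String> extraPaths;

    PathFile(Set<String> storedPaths, Set<String> extraPaths) {
        this.storedPaths = new LinkedHashSet<>(storedPaths);
        this.extraPaths = new LinkedHashSet<>(extraPaths);
    }

    /** Only the paths actually stored in the file. */
    Iterable<String> withoutExtras() {
        return Collections.unmodifiableSet(storedPaths);
    }

    /** Stored paths followed by the extra paths. */
    Iterable<String> withExtras() {
        Set<String> all = new LinkedHashSet<>(storedPaths);
        all.addAll(extraPaths);
        return Collections.unmodifiableSet(all);
    }
}
```

Each loop then states its choice, for example for (String path : file.withExtras()), and the withoutExtras() call sites can be found by a text search and reviewed.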
The TestVettingDataDriven failure may be related to this. Part of that test involves writing the CLDRFile to a temporary XML file and then reading it back to verify the newly voted-on value, and that's where it's failing. It looks like the language paths, which are now extra paths, don't get written to the file. To test that, I'm starting a new branch from main that will just have a few lines added to TestSTFactory.xml...
See #4256
If the same failure happens there that I see locally, it seems to indicate a long-existing bug where CldrXmlWriter doesn't write extra paths
@macchiati following up, I made a new ticket about fullIterable: https://unicode-org.atlassian.net/browse/CLDR-18217
The test failures all seem fixed now except for TestVettingDataDriven |
Well, it mustn't write the extra paths, because they have null values, which cannot be serialized.
With the changes to code-fallback to not have values, there is a difference in behavior. No surprise that that will require changes.
On Tue, Dec 24, 2024, 19:13 Tom Bishop wrote:
In tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestCoverageLevel.java:
> @@ -1073,7 +1075,7 @@ public void testLSR() {
> // Get root LSR codes
> - for (String path : root) {
> + for (String path : root.fullIterable()) {
See #4256
If the same failure happens there that I see locally, it seems to indicate a long-existing bug where CldrXmlWriter doesn't write extra paths
I wouldn't expect extra paths with null values to be written to disk, but I did expect extra paths with non-null values to be written to disk. The test failures here and in #4256 show that an extra path, originally having a null value (or no value), can get a value through voting, and the CLDRFile is updated successfully in memory, but it fails to get written to disk. The failure is happening in #4256 even though there is no change in that PR except for the test itself -- it's for a metazone path and doesn't involve any change to language paths, code fallback, etc. The path |
Hmmm. Yes, the extra paths are no longer 'extra' once they have values. Not sure why they are failing to get written to disk.
There are plenty of other cases in the existing code that worked fine in the past.
Guess the only recourse is to walk through what's happening in a debugger.
On Wed, Dec 25, 2024, 19:41 Tom Bishop wrote:
> it mustn't write the extra paths, because they have null values, which cannot be serialized
I wouldn't expect extra paths with null values to be written to disk, but I did expect extra paths with non-null values to be written to disk. The test failures here and in #4256 show that an empty path, originally having a null value (or no value), can get a value through voting, and the CLDRFile is updated successfully in memory, but it fails to get written to disk. The failure is happening in #4256 even though there is no change in that PR except for the test itself -- it's for a metazone path and doesn't involve any change to language paths, code fallback, etc.
The path ***@***.***="Alaska"]/long/generic does have a value in common/main/fr.xml for example. Maybe it's no longer called an "extra path" at that point; maybe part of my confusion is about terminology. Anyway I'm wondering how it got into fr.xml if not by voting followed by CldrXmlWriter.write.
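The behavior expected in this exchange can be sketched as follows. This is only an illustration of the expectation (skip paths whose value is null, write a former extra path once voting gives it a value); VotedFileSketch and all of its methods are hypothetical, and nothing here reflects how CldrXmlWriter is actually implemented or where the bug lies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a file with stored values plus value-less extra paths. */
final class VotedFileSketch {
    private final Map<String, String> values = new LinkedHashMap<>();
    private final Set<String> extraPaths = new LinkedHashSet<>();

    void addExtraPath(String path) {
        extraPaths.add(path);
    }

    /** Models a vote giving a path a value. */
    void setValue(String path, String value) {
        values.put(path, value);
    }

    /** Lines a writer would emit: every known path that currently has a non-null value. */
    List<String> linesToWrite() {
        Set<String> all = new LinkedHashSet<>(values.keySet());
        all.addAll(extraPaths);
        List<String> lines = new ArrayList<>();
        for (String path : all) {
            String value = values.get(path);
            if (value != null) { // a null value cannot be serialized, so skip it
                lines.add(path + " = " + value);
            }
        }
        return lines;
    }
}
```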
-Use the extra-paths mechanism instead of code-fallback for language paths
-Change es-419 to es (419) in localeDisplayName.txt so TestLocaleDisplay passes
-Add language path in testExtraPaths so it passes
-Remove XMLSource.CODE_FALLBACK_ID, GERMAN, from testGetPaths so it passes
-Create the set of language extra paths only once; related refactoring
-Fix another part of testGetPaths so it passes, similar to previous commit
-Comments
-Do not modify SupplementalDataInfo.CLDRLanguageCodes, returned by sc.getGoodAvailableCodes!
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
All tests pass here after merging the other PR #4256 and rebasing. Unless unexpected problems turn up, this PR should complete the ticket, except that we want to refactor by moving more "extra paths" code from CLDRFile.java to ExtraPaths.java. Also, we should keep eyes open for more lurking bugs related to extra paths, since now there will be more extra paths, and ideally we should soon implement https://unicode-org.atlassian.net/browse/CLDR-18217 |
-Use the extra-paths mechanism instead of code-fallback for language paths
CLDR-17014
ALLOW_MANY_COMMITS=true