From 1b5cfb132e50b1efcf6c1fd09584a34a4621facb Mon Sep 17 00:00:00 2001
From: Peter Edberg <42151464+pedberg-icu@users.noreply.github.com>
Date: Mon, 29 May 2023 11:41:18 -0700
Subject: [PATCH] CLDR-16629 Account for macroregions in likely subtags (#2913) (#2975)

* CLDR-16629 Account for macroregions in likely subtags

* CLDR-16629 Fix wording

(cherry picked from commit 5f36fda0dea159aa5856a7d27d019062693df312)

Co-authored-by: Mark Davis
---
 docs/ldml/tr35.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index 28bb48a0e05..3d11bf04ebe 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -2165,8 +2165,7 @@ To look up data in the table, see if a locale matches one of the `from` attribut
 So looking up "zh_TW" returns "zh_Hant_TW", while looking up "zh" returns "zh_Hans_CN".
 
 In more detail, the data is designed to be used in the following operations.
-
-Note that as of CLDR v24, any field present in the 'from' field is also present in the 'to' field, so an input field will not change in "Add Likely Subtags" operation. The data and operations can also be used with language tags using [[BCP47](#BCP47)] syntax, with the appropriate changes. In addition, certain common 'denormalized' language subtags such as 'iw' (for 'he') may occur in both the 'from' and 'to' fields. This allows for implementations that use those denormalized subtags to use the data with only minor changes to the operations.
+Like other CLDR operations, these operations can also be used with language tags having [[BCP47](#BCP47)] syntax, with the appropriate changes to the data.
 
 An implementation may choose to exclude language tags with the language subtag "und" from the following operation. In such a case, only the canonicalization is done. An implementation can declare that it is doing the exclusion, or can take a parameter that controls whether or not to do it.
 
@@ -2193,7 +2192,7 @@ This operation is performed in the following way.
       1. an error value, or
       2. the match for "und" (in APIs where a valid language tag is required).
    2. Otherwise there is a match = _language<sub>m</sub>\_script<sub>m</sub>\_region<sub>m</sub>_
-   3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is not empty, and x<sub>m</sub> otherwise.
+   3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is not empty or x<sub>s</sub> is a macroregion, and x<sub>m</sub> otherwise.
    4. Return the language tag composed of _language<sub>r</sub>\_script<sub>r</sub>\_region<sub>r</sub>_ + variants + extensions.
 
 The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.
@@ -2207,13 +2206,18 @@ _Example1:_
 
 To find the most likely language for a country, or language for a script, use "und" as the language subtag. For example, looking up "und_TW" returns zh_Hant_TW.
 
-A goal of the algorithm is that if X ⇒ Y, and X' results from replacing an empty subtag in X by the corresponding subtag in Y, then X' ⇒ Y. For example, if und_AF ⇒ fa_Arab_AF, then:
+A general goal of the algorithm is that non-empty field present in the 'from' field is also present in the 'to' field, so a non-empty input field will not change in "Add Likely Subtags" operation.
+That is, when X ⇒ Y, and X' results from replacing an empty subtag in X by the corresponding subtag in Y, then X' ⇒ Y.
+For example, if und_AF ⇒ fa_Arab_AF, then:
 
 * fa_Arab_AF ⇒ fa_Arab_AF
 * und_Arab_AF ⇒ fa_Arab_AF
 * fa_AF ⇒ fa_Arab_AF
 
-There are a small number of exceptions to this goal in the current data, where X ∈ {und_Bopo, und_Brai, und_Cakm, und_Limb, und_Shaw}.
+There are a few exceptions to this goal:
+* A 'denormalized' subtag changes to the normalized form, except for certain denormalized language subtags such as 'iw' (for 'he' = Hebrew) which may occur in both the 'from' and 'to' fields of the data.
+This allows for implementations that use those denormalized subtags to use the data with only minor changes to the operations.
+* A macroregion (such as West Africa = 011) may change to a specific country (Nigeria = NG).
 
 **_Remove_** _**Likely Subtags:** Given a locale, remove any fields that Add Likely Subtags would add._
 
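Reviewer note: the lookup-and-merge behavior described in the changed text can be sketched roughly as below. This is a minimal illustration, not the CLDR implementation. It assumes that the revised step 3 is meant to agree with the macroregion exception in the last hunk, so a macroregion in the source is treated like an empty region and replaced by the region of the match. The names `LIKELY`, `MACROREGIONS`, `split_tag`, `candidates` and `add_likely_subtags` are hypothetical. The table is a toy excerpt: four mappings are taken from the examples in the text, and the `fa` entry is an assumption added so that the fallback lookup has something to match. Canonicalization, variants and extensions are left out.

```python
# Toy excerpt of the likely-subtags table (not the real CLDR data).
LIKELY = {
    "zh": "zh_Hans_CN",      # from the text
    "zh_TW": "zh_Hant_TW",   # from the text
    "und_TW": "zh_Hant_TW",  # from the text
    "und_AF": "fa_Arab_AF",  # from the text
    "fa": "fa_Arab_IR",      # assumed entry, for illustration only
}

# Assumed, incomplete set of macroregion codes (001 = World, 011 = West Africa).
MACROREGIONS = {"001", "011"}


def split_tag(tag):
    """Split 'lang[_Script][_REGION]' into (language, script, region).

    'und' is returned as an empty language so that it can be filled in.
    """
    parts = tag.split("_")
    language, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part
        else:
            region = part
    if language == "und":
        language = ""
    return language, script, region


def candidates(language, script, region):
    """Lookup keys, from most specific to least specific."""
    lang = language or "und"
    keys = []
    if script and region:
        keys.append(f"{lang}_{script}_{region}")
    if region:
        keys.append(f"{lang}_{region}")
    if script:
        keys.append(f"{lang}_{script}")
    keys.append(lang)
    if script:
        keys.append(f"und_{script}")
    return keys


def add_likely_subtags(tag):
    """Return the maximized tag, or None when nothing matches."""
    language, script, region = split_tag(tag)
    match = next(
        (LIKELY[k] for k in candidates(language, script, region) if k in LIKELY),
        None,
    )
    if match is None:
        return None  # stand-in for the "error value" branch
    m_language, m_script, m_region = split_tag(match)
    # Assumption: a macroregion does not survive; the match's region wins.
    if region in MACROREGIONS:
        region = ""
    return "_".join([language or m_language, script or m_script, region or m_region])


for source in ("und_AF", "und_Arab_AF", "fa_AF", "fa_Arab_AF", "zh_001"):
    print(source, "=>", add_likely_subtags(source))
```

With this toy table, the three derived inputs from the `und_AF` example all resolve to `fa_Arab_AF`, which is the goal stated in the patch, while `zh_001` shows the macroregion exception: the region 001 gives way to the region of the `zh` match.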