CLDR-16629 Account for macroregions in likely subtags (#2913) (#2975)
* CLDR-16629 Account for macroregions in likely subtags

* CLDR-16629 Fix wording

(cherry picked from commit 5f36fda)

Co-authored-by: Mark Davis <[email protected]>
pedberg-icu and macchiati authored May 29, 2023
1 parent 465ace6 commit 1b5cfb1
Showing 1 changed file with 9 additions and 5 deletions.
14 changes: 9 additions & 5 deletions docs/ldml/tr35.md
@@ -2165,8 +2165,7 @@ To look up data in the table, see if a locale matches one of the `from` attribut
So looking up "zh_TW" returns "zh_Hant_TW", while looking up "zh" returns "zh_Hans_CN".

In more detail, the data is designed to be used in the following operations.
-
- Note that as of CLDR v24, any field present in the 'from' field is also present in the 'to' field, so an input field will not change in "Add Likely Subtags" operation. The data and operations can also be used with language tags using [[BCP47](#BCP47)] syntax, with the appropriate changes. In addition, certain common 'denormalized' language subtags such as 'iw' (for 'he') may occur in both the 'from' and 'to' fields. This allows for implementations that use those denormalized subtags to use the data with only minor changes to the operations.
+ Like other CLDR operations, these operations can also be used with language tags having [[BCP47](#BCP47)] syntax, with the appropriate changes to the data.

An implementation may choose to exclude language tags with the language subtag "und" from the following operation. In such a case, only the canonicalization is done. An implementation can declare that it is doing the exclusion, or can take a parameter that controls whether or not to do it.

@@ -2193,7 +2192,7 @@ This operation is performed in the following way.
1. an error value, or
2. the match for "und" (in APIs where a valid language tag is required).
2. Otherwise there is a match = _language<sub>m</sub>\_script<sub>m</sub>\_region<sub>m</sub>_
- 3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is not empty, and x<sub>m</sub> otherwise.
+ 3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor a macroregion, and x<sub>m</sub> otherwise.
4. Return the language tag composed of _language<sub>r</sub>\_script<sub>r</sub>\_region<sub>r</sub>_ + variants + extensions.

The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.
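
As a rough illustration of Steps 3 and 4, the following Python sketch combines a source tag with a match that has already been found. It reads the macroregion clause the way the commit title suggests: a source region that is a macroregion is treated like an empty field and replaced by the matched region. The `Tag` type, the `combine` helper, and the `MACROREGIONS` set are hypothetical names introduced here, the set is deliberately incomplete, and the en_Latn_NG match is made-up sample data rather than an entry taken from the CLDR data. The lookup that produces the match, along with variants and extensions, is not modeled.

```python
from typing import NamedTuple

# Hypothetical, deliberately incomplete set of macroregion codes (assumption).
MACROREGIONS = frozenset({"001", "011", "419"})


class Tag(NamedTuple):
    """Language tag fields; an empty string stands for an empty subtag."""
    language: str = ""
    script: str = ""
    region: str = ""


def combine(source: Tag, match: Tag) -> Tag:
    """Steps 3-4: keep each source field unless it is empty (or, for the
    region, a macroregion); otherwise take the field from the match."""
    region_usable = source.region != "" and source.region not in MACROREGIONS
    return Tag(
        language=source.language or match.language,
        script=source.script or match.script,
        region=source.region if region_usable else match.region,
    )


# zh_TW with match zh_Hant_TW keeps its own language and region.
print(combine(Tag("zh", "", "TW"), Tag("zh", "Hant", "TW")))
# und_011 with a made-up match en_Latn_NG: the macroregion 011 gives way to NG.
print(combine(Tag("", "", "011"), Tag("en", "Latn", "NG")))
```

Treating the macroregion check as a region-only rule keeps the language and script handling identical to the pre-change wording of Step 3.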
@@ -2207,13 +2206,18 @@ _Example1:_

To find the most likely language for a country, or language for a script, use "und" as the language subtag. For example, looking up "und_TW" returns zh_Hant_TW.

- A goal of the algorithm is that if X ⇒ Y, and X' results from replacing an empty subtag in X by the corresponding subtag in Y, then X' ⇒ Y. For example, if und_AF ⇒ fa_Arab_AF, then:
+ A general goal of the algorithm is that any non-empty field present in the 'from' field is also present in the 'to' field, so a non-empty input field will not change in the "Add Likely Subtags" operation.
+ That is, when X ⇒ Y, and X' results from replacing an empty subtag in X by the corresponding subtag in Y, then X' ⇒ Y.
+ For example, if und_AF ⇒ fa_Arab_AF, then:

* fa_Arab_AF ⇒ fa_Arab_AF
* und_Arab_AF ⇒ fa_Arab_AF
* fa_AF ⇒ fa_Arab_AF

- There are a small number of exceptions to this goal in the current data, where X ∈ {und_Bopo, und_Brai, und_Cakm, und_Limb, und_Shaw}.
+ There are a few exceptions to this goal:
+ * A 'denormalized' subtag changes to the normalized form, except for certain denormalized language subtags such as 'iw' (for 'he' = Hebrew) which may occur in both the 'from' and 'to' fields of the data.
+ This allows for implementations that use those denormalized subtags to use the data with only minor changes to the operations.
+ * A macroregion (such as West Africa = 011) may change to a specific country (Nigeria = NG).

**_Remove_** _**Likely Subtags:** Given a locale, remove any fields that Add Likely Subtags would add._
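
The goal stated for Add Likely Subtags (if X ⇒ Y, then every X' obtained by filling empty subtags of X from Y also maps to Y) can be checked mechanically. The Python sketch below does this for the und_AF example. Everything in it is an assumption made for illustration: the two-entry `LIKELY` table is a made-up excerpt, the lookup order in `add_likely` is assumed because the commit hunks do not show it, and the helper names are hypothetical. The macroregion and denormalized-subtag exceptions are not modeled.

```python
from itertools import product

# Hypothetical two-entry excerpt of likely-subtags data ('from' -> 'to').
LIKELY = {"und_AF": "fa_Arab_AF", "fa": "fa_Arab_IR"}


def parse(tag):
    """Split 'lang[_Script][_REGION]' into (language, script, region).
    The language 'und' is represented as an empty string."""
    language, script, region = "", "", ""
    for i, part in enumerate(tag.split("_")):
        if i == 0:
            language = "" if part == "und" else part
        elif len(part) == 4 and part.isalpha():
            script = part
        else:
            region = part
    return language, script, region


def fmt(language, script, region):
    """Join the non-empty fields, writing an empty language as 'und'."""
    return "_".join(p for p in (language or "und", script, region) if p)


def add_likely(tag):
    """Simplified Add Likely Subtags; returns None when nothing matches."""
    language, script, region = parse(tag)
    # Assumed lookup order (not shown in the commit hunks).
    candidates = [(language, script, region), (language, "", region),
                  (language, script, ""), (language, "", "")]
    for candidate in candidates:
        to = LIKELY.get(fmt(*candidate))
        if to:
            match = parse(to)
            # Keep each non-empty source field, else use the matched field.
            return fmt(*(s or m for s, m in zip((language, script, region), match)))
    return None


def filled_variants(x, y):
    """Yield X itself and every X' made by filling empty subtags of X from Y."""
    choices = [(s,) if s else ("", m) for s, m in zip(parse(x), parse(y))]
    for combo in product(*choices):
        yield fmt(*combo)


for variant in filled_variants("und_AF", "fa_Arab_AF"):
    print(variant, "=>", add_likely(variant))
```

Running the same loop over every entry of a real data table would surface entries that fall outside the goal, such as the exceptions listed in the text.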
