CLDR-8038 Fix links in several files in docs/site (#4131)
btangmu authored Oct 16, 2024
1 parent 62f93db commit 64cac16
Showing 6 changed files with 42 additions and 44 deletions.
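Most of the edits in this commit upgrade `http://` links on Unicode and ICU hosts to `https://`. As a rough illustration only, and not the tooling the author actually used, that part of the change could be sketched as below. The host list and the function name `upgrade_links` are assumptions taken from the links visible in this diff; path changes (such as the move to `/requesting_changes`) are not covered and would still need manual edits.

```python
import re

# Assumed host list, collected from the links that appear in this diff.
HOSTS = ("www.unicode.org", "unicode.org", "demo.icu-project.org")

# Match "http://<host>" only when the host name ends there, so that a
# look-alike such as "unicode.org.example.com" is left untouched.
_PATTERN = re.compile(
    r"http://(" + "|".join(re.escape(h) for h in HOSTS) + r")(?=[/)\s]|$)"
)


def upgrade_links(text: str) -> str:
    """Rewrite http:// links on the listed hosts to https://."""
    return _PATTERN.sub(r"https://\1", text)


if __name__ == "__main__":
    line = "[UTS #10](http://www.unicode.org/reports/tr10/#Introduction)"
    print(upgrade_links(line))
```

Links that already use `https://`, or that point at other hosts, pass through unchanged.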
18 changes: 9 additions & 9 deletions docs/site/index/cldr-spec/collation-guidelines.md
@@ -6,15 +6,15 @@ title: Collation Guidelines

Collation sequences can be quite tricky to specify.

The locale\-based collation rules in Unicode CLDR specify customizations of the standard data for [UTS \#10: Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/#Introduction) (UCA). Requests to change the collation order for a given locale, or to supply additional variants, need to follow the guidelines in this document.
The locale\-based collation rules in Unicode CLDR specify customizations of the standard data for [UTS \#10: Unicode Collation Algorithm](https://www.unicode.org/reports/tr10/#Introduction) (UCA). Requests to change the collation order for a given locale, or to supply additional variants, need to follow the guidelines in this document.

## Filing a Request

Requests to change the collation order for a given locale, or to supply additional variants should be filed as CLDR bug tickets. See [CLDR Change Requests](/index/bug-reports)
Requests to change the collation order for a given locale, or to supply additional variants should be reported by [requesting changes](/requesting_changes).

### Rules

The request should present the precise change expressed as rules. The rules must be supplied in the syntax as specified in [http://www.unicode.org/reports/tr35/tr35\-collation.html\#Rules](http://www.unicode.org/reports/tr35/tr35-collation.html#Rules). (This used to be called the "basic syntax".) The rules must also be [Minimal Rules](/index/cldr-spec/collation-guidelines) as described below: *only* differences from [http://unicode.org/charts/uca/](http://unicode.org/charts/uca/) should be specified.
The request should present the precise change expressed as rules. The rules must be supplied in the syntax as specified in [https://www.unicode.org/reports/tr35/tr35\-collation.html\#Rules](https://www.unicode.org/reports/tr35/tr35-collation.html#Rules). (This used to be called the "basic syntax".) The rules must also be [Minimal Rules](#minimal-rules) as described below: *only* differences from [https://www.unicode.org/charts/collation](https://www.unicode.org/charts/collation/) should be specified.

*\& c \< cs*

@@ -52,15 +52,15 @@ Provide justification for your change. Citations should be to authoritative page

Please test out any suggested rules before filing a bug.

1. Go to the [ICU Collation Demo](http://demo.icu-project.org/icu-bin/collation.html).
1. Go to the [ICU Collation Demo](https://demo.icu-project.org/icu-bin/collation.html).
2. Pick the language for which you want to change the rules, or keep it on "und" (root) if you want to start from the Unicode/CLDR default sort order.
3. Put your rules into the "Append rules" box.
4. Put an interesting list of strings into the Input box.
5. Click "sort" and verify the sort order and levels of differences.

Or

1. Go to the [ICU Locale Explorer](http://demo.icu-project.org/icu-bin/locexp).
1. Go to the [ICU Locale Explorer](https://demo.icu-project.org/icu-bin/locexp).
2. Pick the appropriate locale.
3. Follow the instructions at the bottom to use your suggested rules on your suggested test data.
4. Verify that the proper order results.
@@ -71,7 +71,7 @@ The exact collation sequence for a given language may be difficult to determine.

Most standards that specify collation, such as DIN or CS, are not targeted at algorithmic sorting, and are not complete algorithmic specifications. For example, CSN 97 6030 requires transliteration of foreign scripts, but there are many choices as to how to transliterate, and the exact mechanism is not specified. It also specifies that geometric shapes are sorted by the number of vertices and edges, which is, at a minimum, difficult to determine; and are subject to variation in glyphs.

The CLDR goals are to match the sorting of exemplar letters and common punctuation and leave everything else to the standard UCA ordering. For more information, see [UTS \#10: Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/#Introduction) (UCA).
The CLDR goals are to match the sorting of exemplar letters and common punctuation and leave everything else to the standard UCA ordering. For more information, see [UTS \#10: Unicode Collation Algorithm](https://www.unicode.org/reports/tr10/#Introduction) (UCA).

### Determining Level Differences

@@ -192,7 +192,7 @@ It would be possible instead to have rules that list every letter used by Slovak
1. Every time a character is tailored, the data for that character takes up more room in typical implementations. That means that the data for collation is larger, downloads of collation libraries with that data are slower, sort keys are longer, and performance is slower; sometimes very much so.
2. Related characters in the same script are in a peculiar order. For example, if the Slovak tailoring omits ƀ, then it would show up as after z.

You can see what the UCA currently does with a given script by looking at the charts at [Unicode Collation Charts](http://www.unicode.org/charts/collation/), or at the [UCA in ICU\-style rules](http://unicode.org/cldr/data/diff/collation/UCA.txt). For example, suppose that U\+0D89 SINHALA LETTER IYANNA and U\+0D8A SINHALA LETTER IIYANNA needed to come after U\+0D96 SINHALA LETTER AUYANNA, in primary order, and that otherwise DUCET was ok. Then you would give the following rules:
You can see what the UCA currently does with a given script by looking at the charts at [Unicode Collation Charts](https://www.unicode.org/charts/collation/). For example, suppose that U\+0D89 SINHALA LETTER IYANNA and U\+0D8A SINHALA LETTER IIYANNA needed to come after U\+0D96 SINHALA LETTER AUYANNA, in primary order, and that otherwise DUCET was ok. Then you would give the following rules:

\&\# U\+0D96 SINHALA LETTER AUYANNA

@@ -242,6 +242,6 @@ There are a number of pitfalls with collation, so be careful. In some cases, suc
6. The correct rules should be the minimal ones.
7. \& \[before 1] c \< ċ \<\<\< Ċ
8. This finds the highest primary (that's what the 1 is for) character less than c, and uses that as the reset point. For Maltese, the same technique needs to be used for ġ and ż.
2. **Blocking Contractions.** Contractions can be blocked with CGJ, as described in the Unicode Standard and in the [Characters and Combining Marks FAQ](http://www.unicode.org/faq/char_combmark.html).
3. **Case Combinations.** The lowercase, titlecase, and uppercase variants of contractions need to be supplied, with tertiary differences in that order (regardless of the caseFirst setting). That is, if *ch* is a contraction, then you would have the rules `... ch <<< Ch <<< CH`. Other case variants such as *cH* are excluded because they are unlikely to represent the contraction, for example in *McHugh*. (Therefore, *mchugh* and *McHugh* will be primary different if *ch* adds a primary difference.) \[[\#8248](http://unicode.org/cldr/trac/ticket/8248)]
2. **Blocking Contractions.** Contractions can be blocked with CGJ, as described in the Unicode Standard and in the [Characters and Combining Marks FAQ](https://www.unicode.org/faq/char_combmark.html).
3. **Case Combinations.** The lowercase, titlecase, and uppercase variants of contractions need to be supplied, with tertiary differences in that order (regardless of the caseFirst setting). That is, if *ch* is a contraction, then you would have the rules `... ch <<< Ch <<< CH`. Other case variants such as *cH* are excluded because they are unlikely to represent the contraction, for example in *McHugh*. (Therefore, *mchugh* and *McHugh* will be primary different if *ch* adds a primary difference.) \[[\#8248](https://unicode.org/cldr/trac/ticket/8248)]
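
The case-combination rule in item 3 can be illustrated with a small sketch. The helper name `case_variants_rule` is hypothetical and is not part of CLDR or ICU; it only builds the rule text for the lowercase, titlecase, and uppercase forms of a contraction, joined by tertiary differences in that order. Mixed forms such as *cH* are deliberately not generated.

```python
def case_variants_rule(contraction: str) -> str:
    """Return the tertiary chain 'lower <<< Title <<< UPPER' for a contraction."""
    lower = contraction.lower()
    # Titlecase only the first character; the rest stays lowercase.
    title = lower[:1].title() + lower[1:]
    upper = lower.upper()
    return f"{lower} <<< {title} <<< {upper}"


if __name__ == "__main__":
    print(case_variants_rule("ch"))
```

The returned fragment still has to be placed after a reset point and relation in the actual tailoring, and tested as described in the guidelines.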

6 changes: 3 additions & 3 deletions docs/site/index/cldr-spec/plural-rules.md
@@ -19,7 +19,7 @@ These categories are used to provide localized units, with a more natural ways o

## Reporting Defects

When you find errors or omissions in this data, please report the information with a [bug report](/index/bug-reports#TOC-Filing-a-Ticket). Please give examples of how the forms may differ. You don't have to give the exact rules, but it is extremely helpful! Here's an example:  
When you find errors or omissions in this data, please report the information by [filing a ticket](/requesting_changes#how-to-file-a-ticket). Please give examples of how the forms may differ. You don't have to give the exact rules, but it is extremely helpful! Here's an example:  

**Sample Bug Report**

@@ -172,7 +172,7 @@ In some sense, the names for the categories are somewhat arbitrary. Yet for cons
- If there needs to be a category for items only have fractional values, use '**many**'
8. If there are more categories needed for the language, describe what those categories need to cover in the bug report.

See [*Language Plural Rules*](http://www.unicode.org/cldr/data/charts/supplemental/language_plural_rules.html) for examples of rules, such as for [Czech](https://www.unicode.org/cldr/charts/45/supplemental/language_plural_rules.html#cs), and for [comparisons of values](https://www.unicode.org/cldr/charts/45/supplemental/language_plural_rules.html#cs-comp). Note that in the integer comparison chart, most languages have 'x' (other—gray) for most integers. There are some exceptions (Russian and Arabic, for example), where the categories of 'many' and 'other' should have been swapped when they were defined, but are too late now to change.
See [*Language Plural Rules*](https://www.unicode.org/cldr/data/charts/supplemental/language_plural_rules.html) for examples of rules, such as for [Czech](https://www.unicode.org/cldr/charts/45/supplemental/language_plural_rules.html#cs), and for [comparisons of values](https://www.unicode.org/cldr/charts/45/supplemental/language_plural_rules.html#cs-comp). Note that in the integer comparison chart, most languages have 'x' (other—gray) for most integers. There are some exceptions (Russian and Arabic, for example), where the categories of 'many' and 'other' should have been swapped when they were defined, but are too late now to change.

## Important Notes

@@ -188,7 +188,7 @@ If you were to substitute a different number for "1" in a sentence or phrase, wo

## Plural Rule Syntax

See [LDML Language Plural Rules](http://unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules).
See [LDML Language Plural Rules](https://unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules).

## Plural Message Migration

2 changes: 1 addition & 1 deletion docs/site/index/draft-schedule.md
@@ -8,4 +8,4 @@ Two releases are regularly scheduled each year, currently one in mid-October, an
The March release is intended to be less data-intensive, and sometimes involves little or no vetting.

The current release is found at [Current CLDR Dev Version](https://docs.google.com/spreadsheets/d/1N6inI5R84UoYlRwuCNPBOAP7ri4q2CmJmh8DC5g-S6c/edit?gid=1680747936#gid=1680747936).
For more information about the various phases of the release, see [Survey Tool stages](../translation/getting-started/survey-tool-phases)
For more information about the various phases of the release, see [Survey Tool stages](/translation/getting-started/survey-tool-phases)
2 changes: 1 addition & 1 deletion docs/site/index/locale-coverage.md
@@ -6,7 +6,7 @@ title: Locale Coverage Special Data

## Missing Features

The following may be listed as Missing features in a Locale Coverage chart, such as [v43 Locale Coverage](https://unicode-org.github.io/cldr-staging/charts/43/supplemental/locale_coverage.html).
The following may be listed as Missing features in a Locale Coverage chart, such as [v45 Locale Coverage](https://www.unicode.org/cldr/charts/45/supplemental/locale_coverage.html).

### Core
