U+034F COMBINING GRAPHEME JOINER not working with Windows fonts #1188
Comments
So this statement is evidently wrong when those other processes are glyph processes rather than character processes. Inserting CGJ in a mark sequence does cause problems for glyph processing, and it isn’t clear to me whether this is something that needs to be addressed at the font level or at the shaping engine level.

Unicode effectively places no limitations on where CGJ can be inserted in a character string, or even how many instances of CGJ could be inserted in a character string. There are places where one might anticipate it being used to affect sorting or to prevent canonical composition or normalisation reordering, but it could occur anywhere and potentially disrupt both GSUB and GPOS operations.

To solve this at the font level requires every font to make accommodation for filtering CGJ in every lookup. So, for example, if you want a ligature substitution to occur, you need to use a mark filter set that excludes CGJ. If you want marks to be positioned relative to a base or to other marks, you need to use a mark filter set that excludes CGJ. This isn’t an issue of something ‘not working with Windows fonts’: I am not aware of any fonts that accommodate CGJ in this way.

I think it makes more sense to look at solving this at the shaping engine level, where it could be done in a way that would enable correct behaviour for most existing fonts while probably breaking very few of them. Since CGJ, as described in Unicode, is not supposed to have a visual impact on glyph strings after normalisation operations at the character level, it seems to me that shaping engines should probably suppress CGJ from glyph strings. It will have performed its standard text functions before glyph processing operations begin, so should be excluded from those operations.
Initially, I was going to propose that it be excluded from GPOS operations to avoid the kind of mark positioning disruption that Denis illustrates, but I can’t think of a good reason why it shouldn’t also be excluded from GSUB.

There probably are some fonts in the wild that use CGJ as a hack in GSUB. This is a character whose original intent was rapidly abandoned by Unicode, and then repurposed in the standard, and there is no implementation recommendation for it in OpenType documentation. So I am pretty sure someone somewhere will have looked at it and thought, ‘Oh, I can use this to join graphemes’ or simply to force some non-standard behaviour in a particular font. Such hacks will become non-operational if shaping engines are changed to fix the outcomes for most fonts.

One of the few places I am aware of where CGJ is actively used for the purposes specified in Unicode is Biblical Hebrew, where CGJ is used to prevent reordering of marks occurring due to broken-but-unfixable canonical combining class assignments. So fonts for Biblical Hebrew do make accommodation for filtering CGJ. These should not break if shaping engines suppress the CGJ glyph.
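For readers unfamiliar with the Biblical Hebrew case, the following is a minimal sketch using Python's standard `unicodedata` module. The choice of lamed with patah and hiriq is an illustrative assumption, not something taken from this thread. Patah has canonical combining class 17 and hiriq has 14, so normalisation swaps them unless a CGJ (class 0) sits between them.

```python
import unicodedata

LAMED, PATAH, HIRIQ, CGJ = "\u05dc", "\u05b7", "\u05b4", "\u034f"

# Intended order: patah first, then hiriq.
without_cgj = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

# Canonical combining classes drive the reordering.
print(unicodedata.combining(PATAH), unicodedata.combining(HIRIQ), unicodedata.combining(CGJ))

# Without CGJ, normalisation reorders the marks by combining class.
reordered = unicodedata.normalize("NFC", without_cgj)
print(reordered == LAMED + HIRIQ + PATAH)

# With CGJ, the sequence survives normalisation unchanged.
preserved = unicodedata.normalize("NFC", with_cgj)
print(preserved == with_cgj)
```

The CGJ remains in the normalised string, which is why it then reaches the shaping engine and has to be dealt with somewhere, either by the font or by the shaper.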
By "with Windows fonts", I meant Windows fonts that have glyphs for CGJ and combining marks, as fallback fonts are used when CGJ is not present. This should be a font shaper issue, for the reasons @tiroj mentioned and because the other font shapers tested don’t hit this problem.
I was pondering that idea too. If CGJ is suppressed at the glyph level, that should happen immediately after the cmap operation establishes the initial glyph string but before GSUB begins. If CGJ is processed wholly at the character level, then it needs to be taken into account during the cmap operation, where it may prevent the NFC-like normalisation often applied at that stage, but doesn’t actually need to be present in the font as a glyph with a cmap entry for this to work.
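The glyph-level option could be sketched roughly as follows. This is a toy illustration only, not how any actual shaping engine is implemented: the `toy_cmap` dictionary, the glyph IDs, and both function names are hypothetical.

```python
CGJ = 0x034F

def initial_glyph_string(text, cmap):
    """Map each character to a glyph ID via cmap; 0 stands for .notdef."""
    return [cmap.get(ord(ch), 0) for ch in text]

def suppress_cgj(glyphs, cmap):
    """Drop the CGJ glyph after cmap mapping and before any GSUB/GPOS lookup."""
    cgj_gid = cmap.get(CGJ)
    if cgj_gid is None:
        # Font has no CGJ glyph; nothing to suppress in this toy model.
        return list(glyphs)
    return [g for g in glyphs if g != cgj_gid]

# Hypothetical font with glyphs for u, combining diaeresis and CGJ.
toy_cmap = {0x0075: 88, 0x0308: 300, 0x034F: 301}

glyphs = initial_glyph_string("u\u034f\u0308", toy_cmap)
shaped_input = suppress_cgj(glyphs, toy_cmap)
print(glyphs)        # glyph string straight from cmap, CGJ included
print(shaped_input)  # what GSUB and GPOS would then see
```

With the CGJ glyph gone, the base and the mark are adjacent again, so ordinary mark-to-base positioning can apply without the font needing a mark filter set.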
The Microsoft Windows fonts that have glyphs for both U+034F COMBINING GRAPHEME JOINER and some combining mark characters fail to display them properly when used together.
The Unicode Standard 16.0, section 23.2.4, defines its use; in particular:
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-23/#G24326
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-23/#G24492
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-23/#G24500
See also:
Compare ü (U+00FC) and u͏̈ (U+0075, U+034F, U+0308) in Windows fonts that have glyphs for both:
There probably shouldn’t be any visual difference between ü (U+00FC) and u͏̈ (U+0075, U+034F, U+0308) in those fonts, as all the necessary glyphs are present.
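At the character level the two strings are deliberately not equivalent, which is why the shaper ends up seeing CGJ at all. A minimal sketch with Python's standard `unicodedata` module (the variable names are just for illustration):

```python
import unicodedata

precomposed = "\u00fc"       # ü
plain = "u\u0308"            # u + combining diaeresis
with_cgj = "u\u034f\u0308"   # u + CGJ + combining diaeresis

# Without CGJ, NFC composes the pair into the precomposed letter.
composed = unicodedata.normalize("NFC", plain)
print(composed == precomposed)

# With CGJ, composition is blocked and the string is left as it is.
blocked = unicodedata.normalize("NFC", with_cgj)
print(blocked == with_cgj)
```

So a font or shaper cannot rely on normalisation to turn the CGJ sequence into U+00FC; the diaeresis has to be positioned over the base by glyph processing.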