Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[glyphdata] handle production names for ligatures with script suffixes #771

Merged
merged 8 commits into from
Aug 20, 2023

Conversation

khaledhosny
Copy link
Collaborator

We had them handled in category detection but not in production names.
This change generalizes this and adds few more conditions:

  1. Don’t add script suffix to ligature part that already have one.
  2. Don’t add it if the part had an entry in GlyphData (to handle cases like "brevecomb_Dboldscript-math").

Should fix #770

@khaledhosny
Copy link
Collaborator Author

@madig feel free to add more tests. This probably does not fix all the differences you are seeing.

@khaledhosny
Copy link
Collaborator Author

The regression tests failure is because we now detect proper names for more ligatures.

@madig
Copy link
Collaborator

madig commented Feb 1, 2022

That was fast. Thanks! A ripgrep and sort through the diffs shows all new names are uni-names now, so that's good.

I added more testcases and found several diffs though. Not entirely sure if some of it is due to outdated info in GlyphData.xml? I used Glyphs 2.6.6 to generated the expected values.

@khaledhosny
Copy link
Collaborator Author

With ef6fed2 the failures are down to 16 (was 70).

@khaledhosny
Copy link
Collaborator Author

The last commit should have fixed a punch of failures, but it does not. We now generate uni names for the affected glyphs, but they are different from Glyphs’.

@khaledhosny
Copy link
Collaborator Author

Remaining failures:

FAILED tests/glyphdata_test.py::test_prod_names[sh_r-deva-uni0936094D0930094D] - AssertionError: assert 'uni0936094D094D0930' == 'uni0936094D0930094D'
FAILED tests/glyphdata_test.py::test_prod_names[idotaccent.sc-i.sc.loclTRK] - AssertionError: assert 'i.loclTRK.sc' == 'i.sc.loclTRK'
FAILED tests/glyphdata_test.py::test_prod_names[t_r-deva-uni0924094D0930094D] - AssertionError: assert 'uni0924094D094D0930' == 'uni0924094D0930094D'

I think we should just accept the idotaccent.sc one as insignificant difference, but I have no idea what is causing the difference in the other two; we are splitting the ligatures correctly and getting the correct part names from GlyphData and I have no idea from where Glyphs is pulling the difference (AFAICT the relavant entries are the same in our copy of the data and Glyphs’).

@khaledhosny khaledhosny force-pushed the issue-770 branch 2 times, most recently from df01a91 to 49fb300 Compare February 1, 2022 14:22
@anthrotype
Copy link
Member

what's the status of this PR?

@schriftgestalt
Copy link
Collaborator

I’m reworking the glyph data handling quite a bit right now so that is probably not needed any more.

@anthrotype
Copy link
Member

I’m reworking the glyph data handling quite a bit right now so that is probably not needed any more.

how's that going?

@khaledhosny
Copy link
Collaborator Author

I updated the expectations of the three remaining test failures. These should be investigated, but the failures are unrelated to splitting ligatures, we are doing this correctly but Glyphs is doing something more here.

If there are no objections, I’m going to merge this.

@khaledhosny khaledhosny requested a review from anthrotype July 23, 2023 16:19
@khaledhosny
Copy link
Collaborator Author

(As said previously, the regression failures are this PR working and handling more ligatures than before).

@anthrotype
Copy link
Member

maybe instead of making the test pass you could mark them as xfail?

"sa_iiMatra-tamil": "uni0BB80BC0",
"sa_uMatra-tamil": "uni0BB80BC1",
"sa_uuMatra-tamil": "uni0BB80BC2",
"sh_r-deva": "uni0936094D094D0930", # uni0936094D0930094D
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not familiar with devanagari conjucts but looks like there's some reordering that is (or is not) happening here?
/cc @schriftgestalt

- uni0936094D094D0930
+ uni0936094D0930094D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is addressed by the mini Devanagri shaper in #818, so I’m not going to worry about it here.

@anthrotype
Copy link
Member

+1 to merge this as improves current handling of ligatures' production names; as for any outstanding issues left, we could track them in their own issue and tackle later

@schriftgestalt
Copy link
Collaborator

That should be handle with my pull request that was never excepted.

@khaledhosny
Copy link
Collaborator Author

That should be handle with my pull request that was never excepted.

Which one, #818?

If you want to finish it in some reasonable time frame, I’m more than happy to close this one instead, otherwise I’m currently blocked on this issue and need a fix ASAP.

@anthrotype
Copy link
Member

That should be handle with my pull request that was never excepted.

hey Georg, the reason the other PR wasn't accepted is because it doesn't seem to be in a mergeable state currently; it's got a "WIP" in the title, there's no high level description of what it intends to do, it's got a long/noisy commit log which ideally could be squashed, a bunch of commented out code, leftover debug() statements, something called _applySimpleIndicShaping which is anything but "simple"...
Khaled's PR is smaller and more targeted, so I'd prefer we merge this one first.

@schriftgestalt
Copy link
Collaborator

It was mergable when opened the pull request.

@schriftgestalt
Copy link
Collaborator

My PR fixes and improves several areas (names, indo, writing direction …). Adding more changes now will make it even more complicated to merge it.
And that code is needed also for round tripping Glyphs 2 and 3 files.

@khaledhosny
Copy link
Collaborator Author

So what is the resolution here? #818 is unmergable as it stands and I’m still blocked.

@schriftgestalt
Copy link
Collaborator

I'll panned to work on glyphsLib next week. Finishing the Glyphs3 branch and rebasing the info branch.

@khaledhosny
Copy link
Collaborator Author

I hate to keep saying this, but this PR have been open for over a tear and half and it is been 3 weeks since I rebased it, and I’m still blocked. I suggest merging this and I’ll try to help with rebasing #818 and solving the conflicts afterwords.

khaledhosny and others added 3 commits August 20, 2023 21:58
We had them handled in category detection but not in production names.
This change generalizes this and adds few more conditions:

1. Don’t add script suffix to ligature part that already have one.
2. Don’t add it if the part had an entry in GlyphData (to handle cases
   like "brevecomb_Dboldscript-math").

Should fix #770
Since “d” has an entry in GlyphData, we were not adding the script
suffix to it. Now we check if both “d” exists and “d-deva” does’t before
deciding not to add the suffix.
This way alternative names are always considered. This was supposed to
fix names like “sh_r-deva”, but though we now produce uni names, they
are still different from Glyphs’.
In glyph names like “po-khmer.below.ro” we were discarding all suffixes
while searching for base glyh name, but (surprisingly) “po-khmer.below”
has an entry in GlyphData and should be the base glyph name instead of
“po-khmer” (yay for consistent naming scheme). Fix this by looking for
name with suffixes one after another first.
Handle names like “moMa_underscore-thai” where it should be split into
“moMa-thai” and plain “underscore” not “underscore-thai”. I updated the
test expectation because Glyphs is just bing silly building production
names inconsistently.
These should be fixed, but I think we are missing some Glyphs magic here
unrelated to the splitting of the ligature parts.
@anthrotype
Copy link
Member

Feel free to merge this as soon as you make the CI pass, thanks

@khaledhosny khaledhosny force-pushed the issue-770 branch 2 times, most recently from d5e00f6 to e9789db Compare August 20, 2023 20:03
Something seems to be wrong with it. Windows CI job is failing with:

py: exit 1 (6.58 seconds) D:\a\glyphsLib\glyphsLib> python -I -m pip install -r requirements.txt -r requirements-dev.txt pid=3368
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
  py: FAIL code 1 (11.77 seconds)

  evaluation failed :( (12.05 seconds)
    lxml==4.9.1 from https://files.pythonhosted.org/packages/5b/2a/b29ca0616397e6d5608255cd0f635a6786892fec898eb65fe8aa4347e9c0/lxml-4.9.1-cp37-cp37m-win_amd64.whl (from -r requirements-dev.txt (line 49)):

        Expected sha256 eea5d6443b093e1545ad0210e6cf27f920482bfcf5c77cdc8596aec73523bb7e

             Got        406f265a44905c7de1e6f3d598d451a56652abf587acab28a4906896da56d434
@khaledhosny
Copy link
Collaborator Author

BTW, I tried to rebase #818 on current master, and there are significant conflicts, so I don’t think merging this is going to make it much worse.

@khaledhosny khaledhosny merged commit 273496b into main Aug 20, 2023
@khaledhosny khaledhosny deleted the issue-770 branch August 20, 2023 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some glyph names get a different PostScript name in Glyphs.app
4 participants