Longer term idea: add a .csv file for glyph info #163

Open
justvanrossum opened this issue Jan 27, 2021 · 14 comments

@justvanrossum
Contributor

justvanrossum commented Jan 27, 2021

An idea (that isn't mine) has been on my mind for a while, relating to glyph-level info.

We could add a .csv file, for glyph data such as:

  • unicode(s)
  • unicode variation selectors
  • category
  • ps name
  • glif file name even?
  • custom columns (with a column naming convention)

SIL has been using such a format for a while already, see for example: https://github.com/silnrsi/font-harmattan/blob/master/source/glyph_data.csv

Since such data is often family- or even project-wide, it makes sense to at least allow this file to be outside of the ufo structure, so it can be shared by multiple UFOs.

Is it worth brainstorming about this idea a bit more perhaps?

Relates to:

@madig
Contributor

madig commented Jan 27, 2021

I'd be in favor of a central place to store global stuff, doesn't have to be CSV. Ideally this ties into the UFO Font Set idea somehow.

@chrissimpkins
Contributor

@justvanrossum
Contributor Author

justvanrossum commented Jan 28, 2021

I like CSV (despite all known problems) for this purpose because it's a tabular format, and is directly readable by tons of third party applications. Plist would be more in line with UFO, but that is pretty much its only selling point at this stage. I'd like to hear about alternatives, though.

Ideally this ties into the UFO Font Set idea somehow.

Yes. But: I have personally more or less given up on the idea of creating a grand new design for a UFO Font Set, especially since .designspace already covers much of what is needed.

I would like to think of "UFO Font Set" as an abstract idea first, and let a physical manifestation emerge from best practices. What role does .designspace perform exactly and how could that be improved for this context? What do people need in addition to that? Some discussion here: fonttools/fonttools/issues/1507

To me, it is clear that some form of project-global glyph data table is needed, so this CSV (or whatever) glyph data idea could be a low-key step towards a more complete UFS structure.

@jvgaultney

jvgaultney commented Feb 1, 2021

We have found managing family-wide glyph data in csv to be practical and helpful. It's much more human-parsable than alternatives. It also saves us a huge amount of bother trying to keep a family of UFOs (and even multiple families) in sync.

The most important issue here is how to keep the external csv and the internal UFO data (like glif/unicode, public.postscriptNames, public.glyphOrder) in sync. Duplicated data is generally a Bad Idea, however if there is a clear sync priority and there are tools in a common toolchain that will apply that sync then it is very manageable. It's all part of normalizing a family.

Although a fancy UFO Font Set is unlikely in the near future, there will still need to be some way to indicate that there is some glyph data out there that relates to this designspace. Could a reference to a csv be placed inside the designspace, as in:

<data>
        <glyphdata filename="glyph_data.csv" />
</data>

BTW there are other lovely benefits of an external glyph data source. You can have one or more standard data files that can be used to populate a UFO for a new project, maybe for specific clients or scripts. Glyphs does this with data internal to the app (which can be somewhat customizable by users), but that isn't practical to keep with the project. You can also have tools that compare a UFO with a glyph data file and report missing glyphs, mis-encoded ones, etc.
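A minimal sketch of such a comparison tool, using only the Python standard library. The column names "glyph_name" and "USV" and the helper name compare_glyph_data are assumptions made for illustration, and the UFO side is reduced to a plain {glyph name: [code points]} mapping rather than a real UFO reader:

```python
import csv
import io


def compare_glyph_data(csv_text, ufo_unicodes):
    """Compare glyph data CSV text with a {glyph name: [code points]} mapping.

    The column names "glyph_name" and "USV" are assumptions for this sketch.
    """
    expected = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        usv = row["USV"].strip()
        # An empty USV cell means the glyph is unencoded.
        expected[row["glyph_name"]] = [int(usv, 16)] if usv else []

    in_csv, in_ufo = set(expected), set(ufo_unicodes)
    return {
        "missing": sorted(in_csv - in_ufo),  # listed in the CSV, absent from the UFO
        "extra": sorted(in_ufo - in_csv),    # present in the UFO, not listed in the CSV
        "misencoded": sorted(
            name for name in in_csv & in_ufo
            if sorted(expected[name]) != sorted(ufo_unicodes[name])
        ),
    }


CSV_TEXT = """glyph_name,USV
A,0041
B,0042
a.alt,
"""

report = compare_glyph_data(CSV_TEXT, {"A": [0x41], "B": [0x43], "C": [0x44]})
print(report)
# {'missing': ['a.alt'], 'extra': ['C'], 'misencoded': ['B']}
```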

In the longer-term, I think that much of what we need for a UFS can be done with added elements in designspace, and it would make sense to take small steps in that direction.

@madig
Contributor

madig commented Feb 1, 2021

What I'd want out of UFS is that the object model implementation knows to source data from a global place. If you e.g. put all Unicode values into a global CSV, then ufoLib2 and defcon should return them when you access glyph.unicodes. Changes to the property should then be reflected in the CSV upon saving. Otherwise, you have the classic data-in-two-places-how-do-I-sync problem.

@justvanrossum
Contributor Author

If you e.g. put all Unicode values into a global CSV, then ufoLib2 and defcon should return them when you access glyph.unicodes

Yes. But perhaps only as a b/w compatibility measure, as it also implies (to me) that the unicode will (eventually) not be stored in the glif file at all.

Changes to the property should then be reflected in the CSV upon saving

Once those attributes come from outside the UFO, they should probably be read-only from the UFO's perspective. I don't like the idea of a change to one UFO's glyph's unicode value being implicitly propagated to other UFOs. The glyph data file should get its own scripting interface.

@madig
Contributor

madig commented Mar 9, 2022

After thinking about this some, I think it could be stapled onto the existing UFO v3(.1) framework.

  1. An additional CSV file "glyph_data.csv" inside the UFO means that UFO-wide data can be taken from there, i.e. the compiler and authoring tool should treat it as the source of truth (and on save drop contradicting data elsewhere). You can encode public.glyphOrder in the CSV row order; public.openTypeCategories, public.postscriptNames, public.skipExportGlyphs (true/false) and public.unicodeVariationSequences can be made into columns as they are.
  2. If the file is absent but the lib key "public.glyphDataFile" is present, load the file it points to and use it as the source of truth -- the file can be shared across multiple UFOs in interpolation or variable font contexts. No Designspace mod necessary.
  3. If neither is present, use existing lib keys and glyph codepoints for data.
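The lookup order in the three steps above could be sketched roughly like this. The function name find_glyph_data_source is hypothetical, "public.glyphDataFile" is only the key proposed in this comment (it is not part of any UFO spec), and resolving its value relative to the directory containing the UFO is an assumption:

```python
from pathlib import Path


def find_glyph_data_source(ufo_path, font_lib):
    """Return (kind, path) following the proposed three-step lookup order.

    font_lib is the UFO's lib.plist content as a dict.
    "public.glyphDataFile" is a proposed key, not an existing standard one.
    """
    ufo_path = Path(ufo_path)

    # 1. A glyph_data.csv inside the UFO wins.
    internal = ufo_path / "glyph_data.csv"
    if internal.is_file():
        return ("internal", internal)

    # 2. Otherwise follow the lib key to a shared, external file.
    #    Assumption: the value is a path relative to the UFO's parent directory.
    external = font_lib.get("public.glyphDataFile")
    if external:
        return ("external", ufo_path.parent / external)

    # 3. Otherwise fall back to the existing lib keys and glyph code points.
    return ("lib", None)


print(find_glyph_data_source("MyFont-Regular.ufo", {"public.glyphDataFile": "glyph_data.csv"}))
```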

This would indeed need some changes to defcon/ufoLib2 and code to be made aware of this new structure. It may make it easier to drop contents.plist potentially, maybe if you also store filenames in the table.

@justvanrossum
Contributor Author

I'm not sure we should have the .csv file inside the UFO. The main purpose would be to share glyph data for a whole project, so the common use case should be a single .csv for multiple UFOs. If we introduce this file, we should encourage sharing it, and not default to having one for each UFO.

Since .designspace is evolving towards being a general conductor for an ensemble of UFOs, I would love to see a (the) glyph data .csv referenced there: perhaps public.glyphDataFile in the designspace lib?

I can see why a backreference from UFO to .csv can be practical if you need to view/edit UFOs in isolation, but I'm a bit worried it may be fragile: it's another piece of data which then needs to be kept in sync across multiple UFOs.

(I would love to keep the contents.plist problem out of a glyphs info .csv, and still think #164 is a viable idea, even for UFO 3.x. The first step for getting rid of contents.plist could be to say: "if it is missing, get the glyph names from the file names if possible, else parse the name field from the XML".)
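A rough sketch of the "parse the name field from the XML" fallback, using only the standard library. The helper names are hypothetical, the input is simplified to a {file name: GLIF bytes} mapping instead of a real glyphs directory, and the file-name-based shortcut is left out:

```python
import xml.etree.ElementTree as ET


def glyph_name_from_glif(glif_data):
    """Read the glyph name from the root <glyph name="..."> element of a GLIF."""
    root = ET.fromstring(glif_data)
    if root.tag != "glyph":
        raise ValueError("not a GLIF document")
    return root.attrib["name"]


def build_contents(glif_files):
    """Rebuild a contents.plist-like {glyph name: file name} mapping."""
    return {
        glyph_name_from_glif(data): file_name
        for file_name, data in glif_files.items()
    }


GLIF_A = b"""<?xml version="1.0" encoding="UTF-8"?>
<glyph name="A" format="2">
  <advance width="600"/>
  <unicode hex="0041"/>
</glyph>
"""

print(build_contents({"A_.glif": GLIF_A}))
# {'A': 'A_.glif'}
```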

@jvgaultney

I like the idea of having the designspace reference the glyph_data.csv. There is a higher level conceptual difference between data in the UFO and in (or referenced by) the designspace. The UFO is a glyph source. The designspace has evolved to define how those sources are used to create specific fonts. For example, we've recently begun producing Google-style axis-based static font families, plus supplemental RIBBI families with different family names for dumb old apps that can't handle anything but RIBBI - defining it all with multiple designspaces. Just lovely. (And now we're going to experiment with merging the instances into a single designspace.)

I wonder - might it be good to go beyond a simple public.glyphDataFile reference to a .csv and instead use the designspace to define how to interpret it? For example, our common glyph_data.csv for our Lat/Cyr/Grk fonts uses something like this as .csv column headers:

glyph_name,ps_name,sort_final_cdg,sort_final_a,...

The two sort columns are there because one of our font families (Andika) requires a different glyph order than the others. (BTW this is a reason why it may not be good to define public.glyphOrder as the .csv row order.)

Could we say in the designspace lib that

public.postscriptName = column 'ps_name' of filename 'glyph_data.csv'

There are a few different ways this could be expressed, but it would be great if we could give this a defined element in the designspace rather than a lib entry:

<instance>
    <glyphdata>
        <glyphdatakey key="public.postscriptName" filename="glyph_data.csv" column="ps_name" />
        <glyphdatakey key="public.glyphOrder" filename="glyph_data.csv" column="sort_final_a" />
    </glyphdata>
</instance>

This would allow some of the other functional features of designspace elements (overrides, etc).
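To make the idea concrete, here is a small sketch of how a tool might read such elements. The glyphdata and glyphdatakey elements are only the proposal from this comment, not part of the designspace format, and read_glyph_data_keys is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# The proposed (hypothetical) markup from the comment above.
SNIPPET = """
<instance>
    <glyphdata>
        <glyphdatakey key="public.postscriptName" filename="glyph_data.csv" column="ps_name" />
        <glyphdatakey key="public.glyphOrder" filename="glyph_data.csv" column="sort_final_a" />
    </glyphdata>
</instance>
"""


def read_glyph_data_keys(instance_xml):
    """Map each key to the (filename, column) pair it should be read from."""
    root = ET.fromstring(instance_xml)
    return {
        el.get("key"): (el.get("filename"), el.get("column"))
        for el in root.iter("glyphdatakey")
    }


mapping = read_glyph_data_keys(SNIPPET)
print(mapping["public.glyphOrder"])
# ('glyph_data.csv', 'sort_final_a')
```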

(I too would love to get rid of contents.plist - it's one of the most troublesome aspects of UFO. I can't count how many times people on our team have added/removed glyphs and forgot to commit the corresponding lines in contents.plist.)

@justvanrossum
Contributor Author

this is a reason why it may not be good to define public.glyphOrder as the .csv row order

Agreed that glyph order should have a dedicated column, and not be implied by the row order.
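As a sketch of what a dedicated order column means in practice: the glyph order is derived by sorting on that column, so two families can share one file and still differ in order. The column names are taken from the header quoted earlier in the thread; numeric sort values, skipping rows with an empty cell, and the helper name glyph_order are assumptions:

```python
import csv
import io


def glyph_order(csv_text, order_column):
    """Return glyph names sorted by a dedicated order column.

    Assumes the column holds numbers; rows with an empty cell are skipped.
    """
    rows = [
        row for row in csv.DictReader(io.StringIO(csv_text))
        if row[order_column].strip()
    ]
    rows.sort(key=lambda row: float(row[order_column]))
    return [row["glyph_name"] for row in rows]


CSV_TEXT = """glyph_name,sort_final_cdg,sort_final_a
a,1,2
a.alt,2,1
b,3,3
"""

print(glyph_order(CSV_TEXT, "sort_final_cdg"))  # ['a', 'a.alt', 'b']
print(glyph_order(CSV_TEXT, "sort_final_a"))    # ['a.alt', 'a', 'b']
```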

With regard to column names: I think we should just come to a consensus as to how things must be named, and not introduce another indirection there.

I would also love an official way to add custom columns. Keeping with UFO-style, reverse domain names seems a logical choice, but I tend to find them too wordy in most cases. I'd prefer something like: core names are like Python identifiers, custom names must contain at least one ".". Or custom names must be prefixed with custom. or something like that.

Allowing custom columns can also solve your dual-glyph-order problem, without column definitions in the designspace file.
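The first naming rule could be checked with something as small as this sketch. The function name classify_column is hypothetical, and requiring every dot-separated part of a custom name to be an identifier is an extra assumption, not something settled here:

```python
def classify_column(name):
    """Classify a CSV column name as 'core', 'custom', or 'invalid'.

    Core names look like Python identifiers; custom names contain at least
    one "." (assumption: each dot-separated part is itself an identifier).
    """
    if name.isidentifier():
        return "core"
    if "." in name and all(part.isidentifier() for part in name.split(".")):
        return "custom"
    return "invalid"


for column in ["glyphName", "glyphOrder", "custom.orderA", "ps name"]:
    print(column, "->", classify_column(column))
```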

@jvgaultney

That all sounds good. This would be easy and clear:

glyphName,public.glyphOrder,public.postscriptName,public.usv,custom.orderA,...

@justvanrossum
Contributor Author

glyphName,public.glyphOrder,public.postscriptName,public.usv,custom.orderA,...

I'd leave out the "public." prefix, it just adds noise.

@madig
Contributor

madig commented Mar 14, 2022

Using the lib key names without the "public." prefix sounds good, and any custom thing needs at least one "." in it.

I can see why a backreference from UFO to .csv can be practical if you need to view/edit UFOs in isolation, but I'm a bit worried it may be fragile: it's another piece of data which then needs to be kept in sync across multiple UFOs.

Hm, yes. I do wonder however how to best integrate it into existing tools? What's the appetite in the Robo* community for adding a new layer to UFOs and DSes that need to be dealt with explicitly? If UFOs continue to be closed packages, how would e.g. Robofont deal with the actual data that matters being stored outside? You already need an extension to deal with Designspaces, no?

Also, as a side-note, CSVs can be edited in Excel, etc., but I don't trust spreadsheet software enough to recommend this to users... so having first-class support in Robofont and elsewhere would be best.

There are a few different ways this could be expressed, but it would be great if we could give this a defined element in the designspace rather than a lib entry:

Sounds reasonable. Designspace already has a (deprecated?) element to pick data from source UFOs, so a better defined new mechanism wouldn't be far-fetched.

@justvanrossum
Contributor Author

UFO as closed package: good point, and perhaps your suggestion to just back-reference the .csv from the font lib is a good-enough solution to that. And perhaps optionally storing the .csv inside the UFO (as you also suggested) is an acceptable solution for single-UFO projects.

RoboFont is fundamentally a single-source editor, and I don't think it should be a gate keeper for expanding the designspace/ufo formats to better deal with families. Although some compromises (like the above) may be reasonable.

(The poor guess-the-field-type behavior of Google Sheets, Excel, and Numbers has already led me to always use the U+ prefix when dealing with unicode hex values, so whenever we're going to write up a specification of this glyphs-info-in-csv thing, that's what I'm going to suggest :) )
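A minimal sketch of reading and writing such values, with hypothetical helper names. A bare cell like 0041 tends to be read as the number 41, and 1E10 as scientific notation, while U+0041 stays text:

```python
def format_usv(code_point):
    """Format a code point with the U+ prefix and at least four hex digits."""
    return f"U+{code_point:04X}"


def parse_usv(text):
    """Parse a 'U+XXXX' cell back into an integer code point."""
    text = text.strip()
    if not text.upper().startswith("U+"):
        raise ValueError(f"expected a U+ prefixed value, got {text!r}")
    return int(text[2:], 16)


print(format_usv(0x41))      # U+0041
print(format_usv(0x1F600))   # U+1F600
print(parse_usv("U+1E10"))   # 7696
```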
