How to encode mathematical fractions? #24

Open
kba opened this issue Jun 23, 2021 · 2 comments

kba commented Jun 23, 2021

While Unicode does have code points for the most common fractions (¼, ½, ¾, etc.), this does not scale, because not all possible numerator/denominator combinations are available. So it might be best to encode fractions as just "numerator fraction-slash denominator" (with regular numbers or super/subscript numbers?) or even produce LaTeX syntax.

bertsky commented Jun 23, 2021

> with regular numbers or super/subscript numbers?

No, regular numbers are what Unicode suggests for this. Unicode renderers produce the typical small-script appearance merely because of the pattern numeral, fraction slash (U+2044), numeral; that is, both the numerator and the denominator are ordinary (ASCII) numerals. You can try it out with an editor/browser of your choice, e.g. ¾⅔ (precomposed) vs. 3⁄4 2⁄3 (independent, but rendered equally by good fonts/engines; GH obviously is not one of them).
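To make the two encodings concrete, here is a minimal Python sketch that rewrites precomposed fraction characters into the numeral, fraction slash, numeral form. It relies on the standard `unicodedata` module; the function name `decompose_fractions` is made up for illustration and is not part of any existing tool in this project.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH


def decompose_fractions(text: str) -> str:
    """Replace precomposed fractions (e.g. U+00BE) by digit + U+2044 + digit.

    Only characters whose compatibility decomposition is tagged <fraction>
    are touched, so superscripts, ligatures etc. stay as they are.
    Caveat: no separator is inserted, so a digit directly in front of a
    precomposed fraction ends up adjacent to the numerator.
    """
    out = []
    for ch in text:
        if unicodedata.decomposition(ch).startswith("<fraction>"):
            # NFKD of a single fraction character yields the decomposed form
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "¾ ⅔"
    decomposed = decompose_fractions(sample)
    # Show the code points to make the (often invisible) difference explicit
    print([f"U+{ord(c):04X}" for c in decomposed])
```

Restricting the rewrite to `<fraction>` decompositions, rather than running NFKD over the whole string, is a deliberate choice here: full NFKD would also flatten superscript digits and other compatibility characters that a transcription may want to keep.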

> or even produce LaTeX syntax

I'd recommend against that. LSTM-CTC will learn to give you character sequences, but getting a certain syntax consistently is pure luck.

Note: the actual argument for distinguishing the fraction slash from the ordinary slash goes as follows: on the visual side, a fraction will always be discernible from other numeric expressions involving a slash (like dates or identifiers/codes), because it looks super/subscripted, so the OCR can learn that. That holds regardless of whether super/subscript numbers should be represented as such or as ordinary numbers.
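The same distinction also pays off downstream of the OCR. As a rough sketch (the helper name `find_fractions` and the sample text are made up for illustration), a consumer of the transcription can pick out fractions by looking for U+2044 only, without being confused by dates or codes written with the ordinary slash:

```python
import re
from fractions import Fraction

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH, not the ASCII "/"

# ASCII digits on both sides of the fraction slash, as suggested above
FRACTION_RE = re.compile(rf"([0-9]+){FRACTION_SLASH}([0-9]+)")


def find_fractions(text: str) -> list[Fraction]:
    """Return all digit + U+2044 + digit sequences as Fraction objects."""
    found = []
    for numerator, denominator in FRACTION_RE.findall(text):
        if int(denominator) == 0:
            continue  # skip malformed input instead of raising
        found.append(Fraction(int(numerator), int(denominator)))
    return found


if __name__ == "__main__":
    # The date uses ordinary slashes, the quantities use U+2044
    line = "23/06/2021: 3\u20444 l, 2\u20443 l"
    print(find_fractions(line))
```

With the ordinary slash in both places, the date would need extra heuristics to be told apart from the quantities.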

tboenig commented Jun 29, 2021
