restore accented characters #107
Comments
But http://www.gutenberg.org/ebooks/135 seems to have accents. Or am I missing something?
Indeed! Last Updated: January 18, 2016. They're not synced.
Well, they definitely should be ^^ … or rather, gutenberg.org should be a compiled version of GITenberg.
We need to trawl through the PG metadata to pull out updated files. Will add an issue.
The early Gutenberg texts were produced in 7-bit ASCII, so they contain no Unicode and no accented characters.
For example, https://github.com/GITenberg/Les-Mis-rables_135/master/book.asciidoc
We need to figure out a good way to re-accent the texts.
One way might be to produce a list of accented words by analyzing another version, for example https://ebooks.adelaide.edu.au/h/hugo/victor/lesmis/. Short words like "à" could be combined with neighboring words.
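
The word-list idea could be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline: the function names (`strip_accents`, `build_accent_map`, `reaccent`) are made up for this example, and it assumes the ASCII edition simply dropped the diacritics (so "é" became "e"). Ligatures such as "œ" are not covered.

```python
import re
import unicodedata
from collections import Counter, defaultdict

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of Unicode letters


def strip_accents(word):
    """Remove combining marks, e.g. 'évêque' -> 'eveque'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_accent_map(reference_text):
    """Map unaccented lowercase words to their accented form.

    Only unambiguous words are kept: if the reference contains both
    'a' and 'à', the plain form 'a' is left out of the map.
    """
    reference_text = unicodedata.normalize("NFC", reference_text)
    candidates = defaultdict(Counter)
    for word in WORD_RE.findall(reference_text):
        lower = word.lower()
        candidates[strip_accents(lower)][lower] += 1
    accent_map = {}
    for plain, forms in candidates.items():
        if len(forms) == 1:
            (accented,) = forms
            if accented != plain:
                accent_map[plain] = accented
    return accent_map


def match_case(original, replacement):
    """Copy the capitalization pattern of the original word."""
    if len(original) > 1 and original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def reaccent(ascii_text, accent_map):
    """Replace every word that has an unambiguous accented form."""
    def fix(match):
        word = match.group(0)
        accented = accent_map.get(word.lower())
        return match_case(word, accented) if accented else word
    return WORD_RE.sub(fix, ascii_text)


# Hypothetical sample strings, not taken from either edition.
reference = "Les Misérables. L'évêque était à Paris, il a été élu."
accent_map = build_accent_map(reference)
print(reaccent("Les Miserables. L'eveque etait a Paris.", accent_map))
```

Ambiguous words such as "a" / "à" are deliberately left alone here. Resolving them would need context, for instance keying the map on pairs of neighboring words instead of single words.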
@adius might have some ideas.