In their current form, the OCR-D transcription guidelines are often of little use to annotators looking for answers or guidance. They are written as top-down intellectual accounts: they are not formal (i.e. runnable/verifiable), not searchable and, well, quite incomplete. Although many examples are given already, this is not nearly enough for the diverse set of materials and peculiarities which annotators face (especially those without a bibliological / humanities background).
How can we improve that?
I propose attacking this on multiple levels:
- finally starting a software implementation (which can normalize arbitrary text input at each GT level or canonicalize it to the next lower level)
- opening up the repository for comments and amendments by users/practitioners (perhaps in the same way that the workflow guide was mirrored to the wiki and is synchronized back every now and then)
- starting a public glyph repository by aggregating diverse textual GT, enriching it with glyph coordinates via OCR forced alignment (e.g. with the Tesseract 3 segmenter), and extracting pairs of glyph image and text files
- tying the website to the glyph repo with a dedicated search interface: text→image search and image→text search (the latter via image similarity, as in Newspaper Navigator)
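To make the software-implementation idea more concrete, here is a minimal Python sketch of level-wise canonicalization. The rule table `DOWNGRADE_RULES` and the functions `canonicalize`, `normalize` and `violations` are hypothetical; the character mappings are illustrative only and do not reproduce the actual OCR-D level definitions. The point of the sketch is that once the rules are data, the same table can both rewrite text down to a lower level and verify that a transcription conforms to a given level.

```python
import unicodedata

# Hypothetical rule tables, keyed by GT level N: characters that a level-N
# transcription may contain but that must be replaced when canonicalizing
# down to level N-1. Illustrative only, NOT the actual OCR-D rules.
DOWNGRADE_RULES: dict[int, dict[str, str]] = {
    3: {"\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl"},  # ligatures ﬀ ﬁ ﬂ
    2: {"\u017f": "s", "\ua75b": "r"},  # long s ſ, r rotunda ꝛ
}


def canonicalize(text: str, level: int) -> str:
    """Rewrite a level-`level` transcription to level `level - 1`."""
    # NFC first, so that rules only ever need to match precomposed characters.
    text = unicodedata.normalize("NFC", text)
    rules = DOWNGRADE_RULES.get(level, {})
    return "".join(rules.get(ch, ch) for ch in text)


def normalize(text: str, source_level: int, target_level: int) -> str:
    """Canonicalize step by step from `source_level` down to `target_level`."""
    if target_level > source_level:
        raise ValueError("can only canonicalize towards a lower level")
    text = unicodedata.normalize("NFC", text)
    for level in range(source_level, target_level, -1):
        text = canonicalize(text, level)
    return text


def violations(text: str, level: int) -> set[str]:
    """Characters in `text` that the rules above forbid at `level`."""
    # A character is forbidden at `level` if some higher level's rule replaces it.
    forbidden = {
        ch
        for higher, rules in DOWNGRADE_RULES.items()
        if higher > level
        for ch in rules
    }
    return {ch for ch in unicodedata.normalize("NFC", text) if ch in forbidden}


if __name__ == "__main__":
    print(normalize("Wa\u017f\u017fer\ufb02", 3, 1))  # Wasserfl
    print(violations("Wa\u017f\u017fer", 1))  # {'ſ'}
```

Keeping the rules as a table rather than as code would also let the published guidelines and the checker share a single source of truth.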
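The glyph-repository step presupposes that glyph coordinates already exist (e.g. after forced alignment). Once they do, collecting image-text pairs is mostly a matter of reading PAGE-XML. The sketch below assumes PAGE-XML in the 2019-07-15 schema namespace, with `Glyph` elements carrying `Coords/@points` and `TextEquiv/Unicode`; `extract_glyphs`, `polygon_to_bbox` and `GlyphSample` are hypothetical names. It only computes bounding boxes. Cropping the page image at each box (for instance with Pillow's `Image.crop`, which takes a `(left, upper, right, lower)` box) would then produce the image half of each pair.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Assumption: PAGE-XML in the 2019-07-15 schema namespace.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}


class GlyphSample(NamedTuple):
    glyph_id: str
    bbox: tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels
    text: str


def polygon_to_bbox(points: str) -> tuple[int, int, int, int]:
    """Turn a PAGE `points` string ("x1,y1 x2,y2 ...") into a bounding box."""
    xs, ys = [], []
    for pair in points.split():
        x, y = pair.split(",")
        xs.append(int(x))
        ys.append(int(y))
    return min(xs), min(ys), max(xs), max(ys)


def extract_glyphs(page_xml: str) -> list[GlyphSample]:
    """Collect (id, bbox, text) for every Glyph that has both coordinates and text."""
    root = ET.fromstring(page_xml)
    samples = []
    for glyph in root.iterfind(".//pc:Glyph", PAGE_NS):
        coords = glyph.find("pc:Coords", PAGE_NS)
        # Takes the first TextEquiv only; a fuller version would respect @index/@conf.
        unicode_el = glyph.find("pc:TextEquiv/pc:Unicode", PAGE_NS)
        if coords is None or not coords.get("points"):
            continue
        if unicode_el is None or not unicode_el.text:
            continue
        samples.append(
            GlyphSample(
                glyph.get("id", ""),
                polygon_to_bbox(coords.get("points")),
                unicode_el.text,
            )
        )
    return samples
```

Glyphs without a transcription are skipped rather than emitted with empty text, so every extracted sample is a usable image-text pair.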
> opening up the repository for comments and amendments by users/practitioners (perhaps in the same way that the workflow guide was mirrored to the wiki and gets synchronized back every now and then)
It's not as convenient as a wiki or as editing Markdown files on GitHub (both with direct preview), but perhaps users can simply fork and edit the gt-guidelines repo?
> finally starting a software implementation (which can normalize arbitrary text input at each GT level or canonicalize to the next lower level)