Tools for Visualizing (intermediate) OCR-D results
PageViewer
The Page Viewer is a standalone application for viewing the page layout and text content of segmentation ground truth and of results of page recognition/OCR systems. The natively supported file format is PAGE XML; however, ALTO XML, FineReader XML, and hOCR can be opened as well.
The viewer shows the page layout as a transparent overlay on the document image. Text content and object attributes are displayed as tooltips.
The Page Viewer requires a Java Runtime Environment version 6 or later. Both 32-bit and 64-bit installations are supported. Supported platforms are Windows, Linux, and macOS. (https://www.primaresearch.org/tools/PAGEViewer)
Installation:
- download a pre-built release from GitHub
- unzip it somewhere
- copy/symlink the startup script from your platform's subdirectory to a directory in your search PATH, probably adding --resolve-dir $PWD (or similar) to the arguments (so that PageViewer resolves relative image paths with respect to the current working directory instead of the XML file, which is more useful for OCR-D workspaces).
For example, on Linux, add this to your ~/.bash_aliases or ~/.bashrc:
```shell
alias jpageviewer='java -jar ~/path/to/JPageViewer\ 1.4\ \(Linux\,\ 64\ bit\)/JPageViewer.jar --resolve-dir $PWD'
```
Then open a PAGE file from within the workspace:
```shell
# cd into workspace directory
jpageviewer OCR-D-SEG-TESS/PAGE1.xml
```
(Then continue with the Open button to navigate to the next PAGE file, or close the UI and start a new instance from the shell.)
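Whether --resolve-dir is needed depends on how a PAGE file's imageFilename resolves. The following bash function is a minimal sketch for checking this; the name check_image_paths is hypothetical, and it uses plain text matching instead of an XML parser, so it assumes the attribute is written as imageFilename="..." on a single line.

```shell
# check_image_paths PAGE.xml...
# Hypothetical helper: report whether each file's imageFilename resolves
# relative to the current directory (workspace) or relative to the XML file.
check_image_paths() {
    local page img
    for page in "$@"; do
        # naive text match: take the first imageFilename="..." attribute
        img=$(grep -o 'imageFilename="[^"]*"' "$page" | head -n 1 |
              sed -e 's/^imageFilename="//' -e 's/"$//')
        if [ -z "$img" ]; then
            echo "NO imageFilename: $page"
        elif [ -e "$img" ]; then
            echo "OK (relative to workspace): $page -> $img"
        elif [ -e "$(dirname "$page")/$img" ]; then
            echo "OK (relative to XML file): $page -> $img"
        else
            echo "MISSING: $page -> $img"
        fi
    done
}
```

Run it from the workspace directory, e.g. check_image_paths OCR-D-SEG-TESS/*.xml. Files reported as relative to the workspace are the ones that need --resolve-dir $PWD in PageViewer.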
Pros:
- Schema support: all PAGE versions, but also ALTO
- Shows fully recursive regions, including reading order
- Shows all hierarchy levels from Border to Glyph
- Platforms: Win, Linux, Mac
- Recommended usage: viewing
Cons:
- Bugs related to zooming (which break tooltips)
- Does not show AlternativeImage content
- Does not rotate image according to annotated skew
- Fixed colour scheme
- No METS or directory navigation (pages have to be opened individually)
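Because PageViewer has no METS or directory navigation, one workaround is to open the PAGE files of a fileGrp one after another from the shell. A minimal bash sketch, assuming the PAGE files end in .xml and the viewer command blocks until its window is closed; view_filegrp is a hypothetical name. Shell aliases such as jpageviewer are not expanded inside the function's "$@", so pass the full java command instead.

```shell
# view_filegrp DIR CMD [ARGS...]
# Hypothetical helper: run CMD [ARGS...] once per PAGE-XML file in DIR,
# waiting for each viewer instance to exit before opening the next file.
view_filegrp() {
    local dir="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: view_filegrp DIR CMD [ARGS...]" >&2
        return 2
    fi
    local f
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        "$@" "$f"
    done
}
```

For example (jar path shortened): view_filegrp OCR-D-SEG-TESS java -jar /path/to/JPageViewer.jar --resolve-dir "$PWD"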
Aletheia
Aletheia is an advanced system for accurate and yet cost-effective analysis, recognition and annotation of scanned documents. It aids the user with a number of automated and semi-automated tools which were developed and fine-tuned based on feedback from major libraries across Europe and from their digitisation service providers, who are using it in a production environment.
Cutting-edge features include, among others, support for top-down ground truthing with sophisticated split and shrink tools as well as bottom-up ground truthing supporting the aggregation of lower-level elements into more complex structures. The integrated rules and guidelines validator, in combination with powerful correction tools, enables efficient production of highly accurate ground truth as well as standardised electronic renditions of digitised documents.
In addition, special features such as a customisable virtual keyboard and the Aletheia Sans font with extensive coverage of special characters in Unicode have been developed to support working with the complexities of historical documents. (https://www.primaresearch.org/tools/Aletheia)
Aletheia is available either as a free Lite version (only requires registration via Email) or as a Pro version (annual paid subscription, added features and support).
See also the feature comparison for both versions.
Installation:
- unzip somewhere
- run Aletheia.exe
Pros:
- Schema support: all PAGE versions, but also ALTO
- Shows fully recursive regions, including reading order
- Shows all hierarchy levels from Border to Glyph
- Offers lots of check/fixup tools for consistency
- Platforms: Win
- Recommended usage: editing and viewing
- Some directory navigation (pages have to be opened collectively)
Cons:
- Does not show AlternativeImage content
- Does not rotate image according to annotated skew
- Fixed colour scheme
- No METS navigation
- Does not support recent PAGE versions
- Not free
LAREX
Installation:
- native: as described in the README
- Docker: docker pull bertsky/larex and then run it as described here, e.g. docker run --rm -u 0:$GROUPS -v path/to/workspace:/data bertsky/larex
- go to http://localhost:8080/Larex with your browser (preferably Chrome/Chromium)
Pros:
- Very efficient for large numbers of pages (fast, has keyboard shortcuts for everything), especially for text correction
- Offers custom auto-segmentation, including reading order
- Variable colour scheme
- Platforms: Linux or Docker-capable
- Recommended usage: editing and viewing
Cons:
- Does not show Border or hierarchy levels below TextLine
- Does not show recursive regions
- Does not show AlternativeImage content
- Does not rotate image according to annotated skew
- No direct METS navigation (custom, flat bookpath directory structure which needs to be exported from OCR-D fileGrps via ocrd-export-larex)
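The docker run example above needs a real path in place of path/to/workspace, and Docker bind mounts are safest with an absolute host path. A minimal bash sketch of a wrapper that resolves the workspace to an absolute path and passes any extra docker options through; larex_run and the DRY_RUN variable are hypothetical. Whether the container port has to be published (e.g. -p 8080:8080) for http://localhost:8080/Larex to be reachable is not stated here, so that is left to the caller.

```shell
# larex_run WORKSPACE [EXTRA_DOCKER_ARGS...]
# Hypothetical wrapper around the docker run call for bertsky/larex.
# Set DRY_RUN=1 to print the command instead of running it.
larex_run() {
    local abs
    # resolve the workspace directory to an absolute path (in a subshell)
    abs=$(cd "${1:-.}" && pwd) || return 1
    shift || true
    local cmd=(docker run --rm -u "0:$GROUPS" -v "$abs:/data" "$@" bertsky/larex)
    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example: larex_run path/to/workspace -p 8080:8080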
nw-page-editor
nw-page-editor is an application for editing ground truth information for diverse purposes related to the areas of document processing and text recognition. Editing is done interactively and visually on top of images of scanned documents. Additionally, the app supports many keyboard shortcuts to allow more efficient editing, see section Application usage shortcuts.
The app is available in two variants. The first variant is a desktop application based on the NW.js framework, thus making it cross-platform. The second variant is a web application that allows remote editing by multiple users and can be easily set up via a Docker container. (https://github.com/mauvilsa/nw-page-editor)
Pros:
- Schema support: PAGE XML version 2013-07-15 (http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15) and property extensions (https://github.com/omni-us/pageformat)
- Platforms: Win, Linux, Mac
- Recommended usage: editing and viewing
Cons:
- Custom PAGE extensions when editing
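nw-page-editor lists only PAGE version 2013-07-15, while OCR-D tools typically write newer PAGE versions. To check which version a file declares before opening it, you can read the date from its PAGE namespace URI. A rough sketch using plain text matching rather than XML parsing; page_version is a hypothetical name.

```shell
# page_version PAGE.xml
# Hypothetical helper: print the date part of the first PAGE namespace URI
# found in the file, e.g. 2013-07-15 for .../PAGE/gts/pagecontent/2013-07-15
page_version() {
    grep -o 'http://schema\.primaresearch\.org/PAGE/gts/pagecontent/[0-9-]*' "$1" |
        head -n 1 | sed 's|.*/||'
}
```

For example: page_version OCR-D-SEG-TESS/PAGE1.xml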
browse-ocrd
An extensible viewer for OCR-D mets.xml files (https://github.com/hnesk/browse-ocrd)
```shell
sudo apt install libcairo2-dev libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev
pip install browse-ocrd
# cd into workspace directory
browse-ocrd mets.xml
```
Pros:
- Schema support: OCR-D METS conventions (https://ocr-d.de/en/spec/mets)
- Shows pages on all fileGrps, including AlternativeImages
- Platforms: Linux
- Recommended usage: viewing
Cons:
- Only shows page-level (but not region/line/word) AlternativeImage
- Slow on large documents with many/large pages
- No zooming currently
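browse-ocrd shows pages on all fileGrps of a mets.xml. To get a quick overview of which fileGrps a workspace contains, for instance before opening a large document that browse-ocrd renders slowly, a naive grep-based sketch can list them. list_filegrps is a hypothetical name; it assumes the mets: namespace prefix and double-quoted USE attributes. With OCR-D core installed, ocrd workspace list-group should give the same list more robustly.

```shell
# list_filegrps [METS]
# Hypothetical helper: print the USE attribute of every mets:fileGrp element
# (naive text matching, not XML parsing).
list_filegrps() {
    grep -o '<mets:fileGrp[^>]*USE="[^"]*"' "${1:-mets.xml}" |
        sed 's/.*USE="\([^"]*\)".*/\1/'
}
```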
feh
feh is an X11 image viewer aimed mostly at console users. Unlike most other viewers, it does not have a fancy GUI, but simply displays images. It is controlled via commandline arguments and configurable key/mouse actions. (https://feh.finalrewind.org/)
```shell
sudo apt install feh
# cd into workspace directory
feh OCR-D-IMG-BIN/
```
Pros:
- Exact zoom interpolation
- Extensive keyboard shortcuts
- Allows keeping zoom level across pages
- Very versatile and fast
- Can browse multiple files, including thumbnail mode
Cons:
- No multi-page TIFF display
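Since feh can browse multiple files and keep the zoom level across pages, it is handy for flipping between two variants of the same page, e.g. two binarization results. A minimal sketch that builds such an interleaved file list; interleave_pages is a hypothetical name, and it assumes the same page has the same file name in both directories (as in the OCR-D-IMG-BIN1/PAGE1.png and OCR-D-IMG-BIN2/PAGE1.png example on this page).

```shell
# interleave_pages DIR1 DIR2
# Hypothetical helper: for every file in DIR1 that also exists (same name)
# in DIR2, print both paths in succession.
interleave_pages() {
    local f name
    for f in "$1"/*; do
        name=$(basename "$f")
        [ -f "$f" ] && [ -f "$2/$name" ] || continue
        printf '%s\n%s\n' "$f" "$2/$name"
    done
}
```

For example: interleave_pages OCR-D-IMG-BIN1 OCR-D-IMG-BIN2 > pages.txt, then feh --filelist pages.txt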
evince
```shell
sudo apt install evince
# cd into workspace directory
evince OCR-D-IMG-BIN/PAGE1.png
```
Pros:
- Has multi-page TIFF display
Cons:
- Artefacts and/or decreased sharpness in zoom interpolation
- Cannot browse multiple files
ImageMagick
Use ImageMagick® to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats (over 200) including PNG, JPEG, GIF, HEIC, TIFF, DPX, EXR, WebP, Postscript, PDF, and SVG. ImageMagick can resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.
```shell
sudo apt install imagemagick
# cd into workspace directory
# print detailed image properties (geometry, resolution, colourspace, ...) of all TIFF images
identify -verbose OCR-D-IMG/*.tiff
# write a difference image of two binarization results of the same page
compare OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png
# show both binarization results and the difference image, one after another
display OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png
```
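The compare call above handles one page at a time. A minimal bash sketch that loops it over all PNG pages present under the same file name in two fileGrp directories; compare_filegrps is a hypothetical name. ImageMagick's compare exits with 1 when the images differ and with 2 on errors, which is why only exit codes above 1 are reported.

```shell
# compare_filegrps DIR1 DIR2 OUTDIR
# Hypothetical helper: run ImageMagick's compare on every PNG page that exists
# under the same name in DIR1 and DIR2, writing difference images to OUTDIR.
compare_filegrps() {
    local a="$1" b="$2" out="$3" f name rc
    mkdir -p "$out" || return 1
    for f in "$a"/*.png; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        [ -e "$b/$name" ] || continue   # page missing in DIR2
        rc=0
        compare "$f" "$b/$name" "$out/$name" || rc=$?
        # exit code 1 only means "images differ"; >1 signals an error
        if [ "$rc" -gt 1 ]; then
            echo "compare failed for $name (exit $rc)" >&2
        fi
    done
}
```

For example: compare_filegrps OCR-D-IMG-BIN1 OCR-D-IMG-BIN2 BIN1-BIN2-DIFF, then feh BIN1-BIN2-DIFF/ to browse the results.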