Command-line tool for layout analysis and OCR of historical prints using Kraken.
- Python: 3.9.x - 3.12.x
- CUDA: 12.x (for GPU processing)
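Before installing, you can check that your interpreter falls inside the supported range. This is a minimal sketch that assumes the interpreter on your PATH is called `python3`; the helper name `python_supported` is hypothetical and not part of octopy:

```shell
#!/usr/bin/env bash
# Return success if a version string such as "3.11.4" lies in the
# supported range 3.9.x - 3.12.x.
python_supported() {
  local major minor rest
  major=${1%%.*}       # "3.11.4" -> "3"
  rest=${1#*.}         # "3.11.4" -> "11.4"
  minor=${rest%%.*}    # "11.4"   -> "11"
  [ "$major" -eq 3 ] && [ "$minor" -ge 9 ] && [ "$minor" -le 12 ]
}

# Assumes the interpreter on your PATH is called python3.
if command -v python3 >/dev/null 2>&1; then
  version=$(python3 -c 'import platform; print(platform.python_version())')
  if python_supported "$version"; then
    echo "Python $version is supported"
  else
    echo "Python $version is outside the supported range 3.9-3.12" >&2
  fi
fi
```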
git clone --single-branch --branch octopy https://github.com/jahtz/kraken
pip install kraken/.
git clone https://github.com/jahtz/octopy
pip install octopy/.
export LD_LIBRARY_PATH="/usr/local/<cuda_version>/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
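`<cuda_version>` stands for the directory of your CUDA 12 installation. The sketch below picks the newest `cuda-12.*` directory under `/usr/local` and prepends its `lib64` subdirectory. The `cuda-12.<minor>` naming follows the usual NVIDIA installer layout, but it is an assumption, so check your system; `find_cuda12_lib` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print the lib64 directory of the newest CUDA 12 installation below a root
# directory (default: /usr/local). Assumes the cuda-12.<minor> naming scheme.
find_cuda12_lib() {
  local root=${1:-/usr/local} newest
  # Version-sort so that e.g. cuda-12.10 ranks above cuda-12.2.
  newest=$(printf '%s\n' "$root"/cuda-12.* | sort -V | tail -n 1)
  [ -d "$newest/lib64" ] || return 1
  printf '%s/lib64\n' "$newest"
}

if cuda_lib=$(find_cuda12_lib); then
  # ${VAR:+:$VAR} only adds the separator when LD_LIBRARY_PATH is non-empty,
  # avoiding an empty entry (which the loader treats as the current directory).
  export LD_LIBRARY_PATH="$cuda_lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
else
  echo "No CUDA 12 installation found under /usr/local" >&2
fi
```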
octopy --help
> octopy segtrain --help
Usage: octopy segtrain [OPTIONS]
Train a custom segmentation model using Kraken.
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ * --gt -g DIRECTORY Directory containing ground truth XML and matching image │
│ files. Multiple directories can be specified. │
│ [required] │
│ --gt-glob TEXT Glob pattern for matching ground truth XML files within the │
│ specified directories. │
│ [default: *.xml] │
│ --eval -e DIRECTORY Optional directory containing evaluation data with matching │
│ image files. Multiple directories can be specified. │
│ --eval-glob TEXT Glob pattern for matching XML files in the evaluation │
│ directory. │
│ [default: *.xml] │
│ --partition -p FLOAT Split ground truth files into training and evaluation sets if │
│ no evaluation files are provided. Default partition is 90% │
│ training, 10% evaluation. │
│ [default: 0.9] │
│ --model -m FILE Path to a pre-trained model to fine-tune. If not set, │
│ training starts from scratch. │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ * --output -o DIRECTORY Output directory for saving the model and │
│ checkpoints. │
│ [required] │
│ --name -n TEXT Name of the output model. Used for saving │
│ results and checkpoints. │
│ [default: foo] │
│ --device -d TEXT Specify the device for processing (e.g. cpu, │
│ cuda:0, ...). Refer to PyTorch documentation │
│ for supported devices. │
│ [default: cpu] │
│ --workers -w INTEGER RANGE Number of worker processes for CPU-based │
│ training. │
│ [default: 1; x>=1] │
│ --threads -t INTEGER RANGE Number of threads for CPU-based training. │
│ [default: 1; x>=1] │
│ --resize -r [union|new|fail] Controls how the model's output layer is │
│ resized if the training data contains │
│ different classes. `union` adds new classes │
│ (former `add`), `new` resizes to match the │
│ training data (former `both`), and `fail` │
│ aborts training if there is a mismatch. │
│ [default: new] │
│ --suppress-regions Disable region segmentation training. │
│ --suppress-baselines Disable baseline segmentation training. │
│ --valid-regions -vr TEXT Comma-separated list of valid regions to │
│ include in the training. This option is │
│ applied before region merging. │
│ --valid-baselines -vb TEXT Comma-separated list of valid baselines to │
│ include in the training. This option is │
│ applied before baseline merging. │
│ --merge-regions -mr TEXT Region merge mapping. One or more mappings │
│ of the form `src:target`, where `src` is │
│ merged into `target`. `src` can be │
│ comma-separated. │
│ --merge-baselines -mb TEXT Baseline merge mapping. One or more mappings │
│ of the form `src:target`, where `src` is │
│ merged into `target`. `src` can be │
│ comma-separated. │
│ --verbose -v INTEGER RANGE Set verbosity level for logging. Use -vv for │
│ maximum verbosity (levels 0-2). │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Hyperparameters ────────────────────────────────────────────────────────────────────────────╮
│ --line-width INTEGER Height of baselines in the target │
│ image after scaling. │
│ [default: 8] │
│ --padding <INTEGER INTEGER>... Padding (left/right, top/bottom) │
│ around the page image. │
│ [default: 0, 0] │
│ --freq FLOAT Model saving and report generation │
│ frequency in epochs during │
│ training. If frequency is >1 it │
│ must be an integer, i.e. running │
│ validation every n-th epoch. │
│ [default: 1.0] │
│ --quit [early|fixed] Stop condition for training. Choose │
│ `early` for early stopping or │
│ `fixed` for a fixed number of │
│ epochs. │
│ [default: fixed] │
│ --epochs INTEGER Number of epochs to train for when │
│ using fixed stopping. │
│ [default: 50] │
│ --min-epochs INTEGER Minimum number of epochs to train │
│ for before early stopping is │
│ allowed. │
│ [default: 0] │
│ --lag INTEGER RANGE Early stopping patience (number of │
│ validation steps without │
│ improvement). Measured by │
│ val_mean_iu. │
│ [default: 10; x>=1] │
│ --optimizer [Adam|SGD|RMSprop|Lamb] Optimizer to use during training. │
│ [default: Adam] │
│ --lrate FLOAT Learning rate for the optimizer. │
│ [default: 0.0002] │
│ --momentum FLOAT Momentum parameter for applicable │
│ optimizers. │
│ [default: 0.9] │
│ --weight-decay FLOAT Weight decay parameter for the │
│ optimizer. │
│ [default: 1e-05] │
│ --schedule            [constant|1cycle|exponential|        Set learning rate scheduler. For    │
│                       cosine|step|reduceonplateau]         1cycle, cycle length is determined  │
│ by the `--step-size` option. │
│ [default: constant] │
│ --completed-epochs INTEGER Number of epochs already completed. │
│ Used for resuming training. │
│ [default: 0] │
│ --augment Use data augmentation during │
│ training. │
│ --step-size INTEGER Step size for learning rate │
│ scheduler. │
│ [default: 10] │
│ --gamma FLOAT Gamma for learning rate scheduler. │
│ [default: 0.1] │
│ --rop-factor FLOAT Factor for reducing learning rate │
│ on plateau. │
│ [default: 0.1] │
│ --rop-patience INTEGER Patience for reducing learning rate │
│ on plateau. │
│ [default: 5] │
│ --cos-t-max INTEGER Maximum number of epochs for cosine │
│ annealing. │
│ [default: 50] │
│ --cos-min-lr FLOAT Minimum learning rate for cosine │
│ annealing. │
│ [default: 2e-05] │
│ --warmup INTEGER Number of warmup epochs for cosine │
│ annealing. │
│ [default: 0] │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
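As an illustration of how these options combine, the sketch below trains on GPU with early stopping, fine-tunes when a base model exists, and uses a separate evaluation set only if one is present. All paths, the model name, and the region names (`paragraph`, `heading`, `marginalia`) are hypothetical placeholders; only the flags come from the help output above. Building the command as a bash array keeps optional flags and paths with spaces intact:

```shell
#!/usr/bin/env bash
# Hypothetical paths and names; replace them with your own.
GT_DIR=./data/gt                    # PageXML ground truth + matching images
EVAL_DIR=./data/eval                # optional evaluation set
BASE_MODEL=./models/base.mlmodel    # optional model to fine-tune
OUT_DIR=./models

# A bash array keeps each argument intact, even paths containing spaces.
args=(segtrain
  --gt "$GT_DIR"
  --output "$OUT_DIR" --name prints_seg
  --device cuda:0
  --quit early --min-epochs 10 --lag 5
  --valid-regions paragraph,heading,marginalia
  --merge-regions heading:paragraph
)

# Without --eval, --partition (default 0.9) splits the ground truth instead.
if [ -d "$EVAL_DIR" ]; then
  args+=(--eval "$EVAL_DIR")
fi
# Fine-tune when a base model exists; otherwise training starts from scratch.
if [ -f "$BASE_MODEL" ]; then
  args+=(--model "$BASE_MODEL")
fi

if command -v octopy >/dev/null 2>&1; then
  octopy "${args[@]}"
else
  echo "octopy not found on PATH; install it first" >&2
fi
```

Because `--valid-regions` is applied before region merging, `heading` has to appear in that list; otherwise it would be filtered out before the merge into `paragraph` could take effect.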
> octopy segment --help
Usage: octopy segment [OPTIONS] IMAGES...
Segment images using Kraken.
IMAGES: Specify one or more image files to segment. Supports multiple file paths, wildcards,
or directories (with the -g option).
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ * IMAGES PATH [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --glob -g TEXT Glob pattern for matching images in directories. (used with │
│ directories in IMAGES). │
│ [default: *.ocropus.bin.png] │
│ --model -m FILE Path to custom segmentation model(s). If not provided, the default │
│ Kraken model is used. │
│ --output -o DIRECTORY Output directory for processed files. Defaults to the parent │
│ directory of each input file. │
│ --suffix -s TEXT Suffix for output PageXML files. Should end with '.xml'. │
│ [default: .xml] │
│ --device -d TEXT Specify the processing device (e.g. 'cpu', 'cuda:0',...). Refer to │
│ PyTorch documentation for supported devices. │
│ [default: cpu] │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Fine-Tuning ────────────────────────────────────────────────────────────────────────────────╮
│ --creator TEXT Metadata: Creator of the PageXML files. │
│ [default: octopy] │
│ --direction [hlr|hrl|vlr|vrl] Text direction of input images. [default: hlr] │
│ --suppress-lines Suppress lines in the output PageXML. │
│ --suppress-regions Suppress regions in the output PageXML. Creates a │
│ single dummy region for the whole image. │
│ --fallback INTEGER Use a default bounding box when the polygonizer │
│ fails to create a polygon around a │
│                                      baseline. Requires a box height in pixels.              │
│ --heatmap TEXT Generate a heatmap image alongside the PageXML │
│ output. Specify the file extension for the heatmap │
│ (e.g., `.hm.png`). │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
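A possible invocation that segments every binarized image in a directory with a custom model and writes a heatmap next to each PageXML file. The directory, model path, and the `.seg.xml` suffix are hypothetical; the glob is quoted so that octopy, rather than the shell, matches it inside the directory:

```shell
#!/usr/bin/env bash
# Hypothetical locations: a directory of binarized page images and a model
# trained with `octopy segtrain`.
IMAGE_DIR=./data/pages
MODEL=./models/prints_seg.mlmodel

# The glob is quoted so that octopy, not the shell, matches it inside IMAGE_DIR.
args=(segment
  --glob '*.bin.png'
  --model "$MODEL"
  --device cuda:0
  --suffix .seg.xml
  --heatmap .hm.png
  "$IMAGE_DIR"
)

if command -v octopy >/dev/null 2>&1; then
  octopy "${args[@]}"
else
  echo "octopy not found on PATH; install it first" >&2
fi
```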
> octopy shrink --help
Usage: octopy shrink [OPTIONS] PAGEXML...
Shrink region polygons of PageXML files.
PAGEXML: Specify one or more PageXML files to shrink. Supports multiple file paths, wildcards,
or directories (with the -g option).
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ * PAGEXML PATH [required] │
│ --glob -g TEXT Glob pattern for matching PageXML files in directories. (used │
│ with directories in PAGEXML). │
│ [default: *.xml] │
│ --input-suffix -i TEXT Suffix for image selection. Should match full suffix of input │
│ PageXML files. │
│ [default: .bin.png] │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --output -o DIRECTORY Output directory for processed files. Defaults to the │
│ parent directory of each input file. │
│ --output-suffix -s TEXT Suffix for shrunken PageXML files. Should end with '.xml'. │
│ Could overwrite input files. │
│ [default: .xml] │
│ --padding -p INTEGER Padding around the shrunken regions in pixels. [default: 5] │
│ --horizontal -h INTEGER The higher, the more horizontal smoothing is applied. │
│ [default: 3] │
│ --vertical -v INTEGER The higher, the more vertical smoothing is applied. │
│ [default: 3] │
│ --valid-region -vr TEXT Valid regions for shrinking. If nothing is provided, all │
│ regions are shrunk. Multiple selections are possible. │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
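Since the default output suffix (`.xml`) combined with the default output location may overwrite the input PageXML files, a safer pattern is to write the shrunken copies to a separate directory. A sketch with hypothetical paths and a hypothetical `paragraph` region type; the smoothing and padding values are spelled out at their defaults only to show where they go:

```shell
#!/usr/bin/env bash
# Hypothetical locations: PageXML files from `octopy segment` (next to their
# *.bin.png images) and a separate directory for the shrunken copies.
PAGE_DIR=./data/pages
OUT_DIR=./data/shrunk

# Writing to a separate --output directory keeps the input PageXML files intact.
args=(shrink
  --glob '*.xml'
  --output "$OUT_DIR"
  --padding 5 --horizontal 3 --vertical 3
  --valid-region paragraph
  "$PAGE_DIR"
)

if command -v octopy >/dev/null 2>&1; then
  mkdir -p "$OUT_DIR"   # harmless if octopy also creates it
  octopy "${args[@]}"
else
  echo "octopy not found on PATH; install it first" >&2
fi
```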