Skip to content
/ octopy Public

Command line tool for Kraken text segmentation and recognition.

License

Notifications You must be signed in to change notification settings

jahtz/octopy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

91 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Octopy

Command line tool layout analysis and OCR of historical prints using Kraken.

Installation

Requirements

  • Python: 3.9.x - 3.12.x
  • CUDA: 12.x

1. Install modified Kraken version

Kraken Fork

git clone --single-branch --branch octopy https://github.com/jahtz/kraken
pip install kraken/.

2. Install Octopy

git clone https://github.com/jahtz/octopy
pip install octopy/.

3. Setup CUDA (optionally, for GPU computation)

export LD_LIBRARY_PATH="/usr/local/<cuda_version>/lib64:$LD_LIBRARY_PATH"

Usage

General

octopy --help

Layout Segmentation Training

 > octopy segtrain --help
                                                                                                
 Usage: octopy segtrain [OPTIONS]                                                               
                                                                                                
 Train a custom segmentation model using Kraken.                                                
                                                                                                
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ *  --gt         -g  DIRECTORY  Directory containing ground truth XML and matching image      │
│                                files. Multiple directories can be specified.                 │
│                                [required]                                                    │
│    --gt-glob        TEXT       Glob pattern for matching ground truth XML files within the   │
│                                specified directories.                                        │
│                                [default: *.xml]                                              │
│    --eval       -e  DIRECTORY  Optional directory containing evaluation data with matching   │
│                                image files. Multiple directories can be specified.           │
│    --eval-glob      TEXT       Glob pattern for matching XML files in the evaluation         │
│                                directory.                                                    │
│                                [default: *.xml]                                              │
│    --partition  -p  FLOAT      Split ground truth files into training and evaluation sets if │
│                                no evaluation files are provided. Default partition is 90%    │
│                                training, 10% evaluation.                                     │
│                                [default: 0.9]                                                │
│    --model      -m  FILE       Path to a pre-trained model to fine-tune. If not set,         │
│                                training starts from scratch.                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ *  --output              -o   DIRECTORY         Output directory for saving the model and    │
│                                                 checkpoints.                                 │
│                                                 [required]                                   │
│    --name                -n   TEXT              Name of the output model. Used for saving    │
│                                                 results and checkpoints.                     │
│                                                 [default: foo]                               │
│    --device              -d   TEXT              Specify the device for processing (e.g. cpu, │
│                                                 cuda:0, ...). Refer to PyTorch documentation │
│                                                 for supported devices.                       │
│                                                 [default: cpu]                               │
│    --workers             -w   INTEGER RANGE     Number of worker processes for CPU-based     │
│                                                 training.                                    │
│                                                 [default: 1; x>=1]                           │
│    --threads             -t   INTEGER RANGE     Number of threads for CPU-based training.    │
│                                                 [default: 1; x>=1]                           │
│    --resize              -r   [union|new|fail]  Controls how the model's output layer is     │
│                                                 resized if the training data contains        │
│                                                 different classes. `union` adds new classes  │
│                                                 (former `add`), `new` resizes to match the   │
│                                                 training data (former `both`), and `fail`    │
│                                                 aborts training if there is a mismatch.      │
│                                                 [default: new]                               │
│    --suppress-regions                           Disable region segmentation training.        │
│    --suppress-baselines                         Disable baseline segmentation training.      │
│    --valid-regions       -vr  TEXT              Comma-separated list of valid regions to     │
│                                                 include in the training. This option is      │
│                                                 applied before region merging.               │
│    --valid-baselines     -vb  TEXT              Comma-separated list of valid baselines to   │
│                                                 include in the training. This option is      │
│                                                 applied before baseline merging.             │
│    --merge-regions       -mr  TEXT              Region merge mapping. One or more mappings   │
│                                                 of the form `src:target`, where `src` is     │
│                                                 merged into `target`. `src` can be           │
│                                                 comma-separated.                             │
│    --merge-baselines     -mb  TEXT              Baseline merge mapping. One or more mappings │
│                                                 of the form `src:target`, where `src` is     │
│                                                 merged into `target`. `src` can be           │
│                                                 comma-separated.                             │
│    --verbose             -v   INTEGER RANGE     Set verbosity level for logging. Use -vv for │
│                                                 maximum verbosity (levels 0-2).              │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Hyperparameters ────────────────────────────────────────────────────────────────────────────╮
│ --line-width          INTEGER                            Height of baselines in the target   │
│                                                          image after scaling.                │
│                                                          [default: 8]                        │
│ --line-width          INTEGER                            Height of baselines in the target   │
│                                                          image after scaling.                │
│                                                          [default: 8]                        │
│ --padding             <INTEGER INTEGER>...               Padding (left/right, top/bottom)    │
│                                                          around the page image.              │
│                                                          [default: 0, 0]                     │
│ --freq                FLOAT                              Model saving and report generation  │
│                                                          frequency in epochs during          │
│                                                          training. If frequency is >1 it     │
│                                                          must be an integer, i.e. running    │
│                                                          validation every n-th epoch.        │
│                                                          [default: 1.0]                      │
│ --quit                [early|fixed]                      Stop condition for training. Choose │
│                                                          `early` for early stopping or       │
│                                                          `fixed` for a fixed number of       │
│                                                          epochs.                             │
│                                                          [default: fixed]                    │
│ --epochs              INTEGER                            Number of epochs to train for when  │
│                                                          using fixed stopping.               │
│                                                          [default: 50]                       │
│ --min-epochs          INTEGER                            Minimum number of epochs to train   │
│                                                          for before early stopping is        │
│                                                          allowed.                            │
│                                                          [default: 0]                        │
│ --lag                 INTEGER RANGE                      Early stopping patience (number of  │
│                                                          validation steps without            │
│                                                          improvement). Measured by           │
│                                                          val_mean_iu.                        │
│                                                          [default: 10; x>=1]                 │
│ --optimizer           [Adam|SGD|RMSprop|Lamb]            Optimizer to use during training.   │
│                                                          [default: Adam]                     │
│ --lrate               FLOAT                              Learning rate for the optimizer.    │
│                                                          [default: 0.0002]                   │
│ --momentum            FLOAT                              Momentum parameter for applicable   │
│                                                          optimizers.                         │
│                                                          [default: 0.9]                      │
│ --weight-decay        FLOAT                              Weight decay parameter for the      │
│                                                          optimizer.                          │
│                                                          [default: 1e-05]                    │
│ --schedule            [constant|1cycle|exponential|cosi  Set learning rate scheduler. For    │
│                       ne|step|reduceonplateau]           1cycle, cycle length is determined  │
│                                                          by the `--step-size` option.        │
│                                                          [default: constant]                 │
│ --completed-epochs    INTEGER                            Number of epochs already completed. │
│                                                          Used for resuming training.         │
│                                                          [default: 0]                        │
│ --augment                                                Use data augmentation during        │
│                                                          training.                           │
│ --step-size           INTEGER                            Step size for learning rate         │
│                                                          scheduler.                          │
│                                                          [default: 10]                       │
│ --gamma               FLOAT                              Gamma for learning rate scheduler.  │
│                                                          [default: 0.1]                      │
│ --rop-factor          FLOAT                              Factor for reducing learning rate   │
│                                                          on plateau.                         │
│                                                          [default: 0.1]                      │
│ --rop-patience        INTEGER                            Patience for reducing learning rate │
│                                                          on plateau.                         │
│                                                          [default: 5]                        │
│ --cos-t-max           INTEGER                            Maximum number of epochs for cosine │
│                                                          annealing.                          │
│                                                          [default: 50]                       │
│ --cos-min-lr          FLOAT                              Minimum learning rate for cosine    │
│                                                          annealing.                          │
│                                                          [default: 2e-05]                    │
│ --warmup              INTEGER                            Number of warmup epochs for cosine  │
│                                                          annealing.                          │
│                                                          [default: 0]                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

Layout Segmentation Prediction

 > octopy segment --help
                                                                                                
 Usage: octopy segment [OPTIONS] IMAGES...                                                      
                                                                                                
 Segment images using Kraken.                                                                   
 IMAGES: Specify one or more image files to segment. Supports multiple file paths, wildcards,   
 or directories (with the -g option).                                                           
                                                                                                
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ *  IMAGES    PATH  [required]                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --glob    -g  TEXT       Glob pattern for matching images in directories. (used with         │
│                          directories in IMAGES).                                             │
│                          [default: *.ocropus.bin.png]                                        │
│ --model   -m  FILE       Path to custom segmentation model(s). If not provided, the default  │
│                          Kraken model is used.                                               │
│ --output  -o  DIRECTORY  Output directory for processed files. Defaults to the parent        │
│                          directory of each input file.                                       │
│ --suffix  -s  TEXT       Suffix for output PageXML files. Should end with '.xml'.            │
│                          [default: .xml]                                                     │
│ --device  -d  TEXT       Specify the processing device (e.g. 'cpu', 'cuda:0',...). Refer to  │
│                          PyTorch documentation for supported devices.                        │
│                          [default: cpu]                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Fine-Tuning ────────────────────────────────────────────────────────────────────────────────╮
│ --creator             TEXT               Metadata: Creator of the PageXML files.             │
│                                          [default: octopy]                                   │
│ --direction           [hlr|hrl|vlr|vrl]  Text direction of input images. [default: hlr]      │
│ --suppress-lines                         Suppress lines in the output PageXML.               │
│ --suppress-regions                       Suppress regions in the output PageXML. Creates a   │
│                                          single dummy region for the whole image.            │
│ --fallback            INTEGER            Use a default bounding box when the polygonizer     │
│                                          fails to create a polygon around a                  │
│                                          baseline.Requires a box height in pixels.           │
│ --heatmap             TEXT               Generate a heatmap image alongside the PageXML      │
│                                          output. Specify the file extension for the heatmap  │
│                                          (e.g., `.hm.png`).                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

Region Polygon Shrinking

 > octopy shrink --help
                                                                                                
 Usage: octopy shrink [OPTIONS] PAGEXML...                                                      
                                                                                                
 Shrink region polygons of PageXML files.                                                       
 PAGEXML: Specify one or more PageXML files to shrink. Supports multiple file paths, wildcards, 
 or directories (with the -g option).                                                           
                                                                                                
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ *  PAGEXML             PATH  [required]                                                      │
│    --glob          -g  TEXT  Glob pattern for matching PageXML files in directories. (used   │
│                              with directories in PAGEXML).                                   │
│                              [default: *.xml]                                                │
│    --input-suffix  -i  TEXT  Suffix for image selection. Should match full suffix of input   │
│                              PageXML files.                                                  │
│                              [default: .bin.png]                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --output         -o   DIRECTORY  Output directory for processed files. Defaults to the       │
│                                  parent directory of each input file.                        │
│ --output-suffix  -s   TEXT       Suffix for shrunken PageXML files. Should end with '.xml'.  │
│                                  Could overwrite input files.                                │
│                                  [default: .xml]                                             │
│ --padding        -p   INTEGER    Padding around the shrunken regions in pixels. [default: 5] │
│ --horizontal     -h   INTEGER    The higher, the more horizontal smoothing is applied.       │
│                                  [default: 3]                                                │
│ --vertical       -v   INTEGER    The higher, the more vertical smoothing is applied.         │
│                                  [default: 3]                                                │
│ --valid-region   -vr  TEXT       Valid regions for shrinking. If nothing is provided, all    │
│                                  regions are shrunk. Multiple selections are possible.       │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

About

Command line tool for Kraken text segmentation and recognition.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Languages