Octopy

Command line tool for layout analysis and OCR of historical prints using Kraken.

Installation

Requirements

  • Python: 3.9.x - 3.12.x
  • CUDA: 12.x

1. Install modified Kraken version

Kraken Fork

git clone --single-branch --branch octopy https://github.com/jahtz/kraken
pip install kraken/.

2. Install Octopy

git clone https://github.com/jahtz/octopy
pip install octopy/.

3. Set up CUDA (optional, for GPU computation)

export LD_LIBRARY_PATH="/usr/local/<cuda_version>/lib64:$LD_LIBRARY_PATH"
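
The export above can be made a little more defensive. The following is a minimal sketch, assuming the CUDA toolkit lives under `/usr/local/cuda` (a common symlink, but an assumption here; `CUDA_HOME` is simply the variable name chosen for this example). It only extends `LD_LIBRARY_PATH` if the directory exists and avoids a dangling colon when the variable was empty before.

```shell
# CUDA_HOME is an assumed location; adjust it to your installation,
# e.g. /usr/local/<cuda_version>.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

if [ -d "$CUDA_HOME/lib64" ]; then
  # Prepend the CUDA libraries; add the ':' separator only if
  # LD_LIBRARY_PATH already has content.
  export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  echo "Using CUDA libraries from $CUDA_HOME/lib64"
else
  echo "No CUDA libraries found under $CUDA_HOME/lib64; use --device cpu"
fi
```

Once the path is set, `python -c "import torch; print(torch.cuda.is_available())"` is a quick way to check whether PyTorch can see the GPU.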

Usage

General

octopy --help

Layout Segmentation Training

 > octopy segtrain --help
                                                                                                
 Usage: octopy segtrain [OPTIONS]                                                               
                                                                                                
 Train a custom segmentation model using Kraken.                                                
                                                                                                
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ *  --gt         -g  DIRECTORY  Directory containing ground truth XML and matching image      │
│                                files. Multiple directories can be specified.                 │
│                                [required]                                                    │
│    --gt-glob        TEXT       Glob pattern for matching ground truth XML files within the   │
│                                specified directories.                                        │
│                                [default: *.xml]                                              │
│    --eval       -e  DIRECTORY  Optional directory containing evaluation data with matching   │
│                                image files. Multiple directories can be specified.           │
│    --eval-glob      TEXT       Glob pattern for matching XML files in the evaluation         │
│                                directory.                                                    │
│                                [default: *.xml]                                              │
│    --partition  -p  FLOAT      Split ground truth files into training and evaluation sets if │
│                                no evaluation files are provided. Default partition is 90%    │
│                                training, 10% evaluation.                                     │
│                                [default: 0.9]                                                │
│    --model      -m  FILE       Path to a pre-trained model to fine-tune. If not set,         │
│                                training starts from scratch.                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ *  --output              -o   DIRECTORY         Output directory for saving the model and    │
│                                                 checkpoints.                                 │
│                                                 [required]                                   │
│    --name                -n   TEXT              Name of the output model. Used for saving    │
│                                                 results and checkpoints.                     │
│                                                 [default: foo]                               │
│    --device              -d   TEXT              Specify the device for processing (e.g. cpu, │
│                                                 cuda:0, ...). Refer to PyTorch documentation │
│                                                 for supported devices.                       │
│                                                 [default: cpu]                               │
│    --workers             -w   INTEGER RANGE     Number of worker processes for CPU-based     │
│                                                 training.                                    │
│                                                 [default: 1; x>=1]                           │
│    --threads             -t   INTEGER RANGE     Number of threads for CPU-based training.    │
│                                                 [default: 1; x>=1]                           │
│    --resize              -r   [union|new|fail]  Controls how the model's output layer is     │
│                                                 resized if the training data contains        │
│                                                 different classes. `union` adds new classes  │
│                                                 (former `add`), `new` resizes to match the   │
│                                                 training data (former `both`), and `fail`    │
│                                                 aborts training if there is a mismatch.      │
│                                                 [default: new]                               │
│    --suppress-regions                           Disable region segmentation training.        │
│    --suppress-baselines                         Disable baseline segmentation training.      │
│    --valid-regions       -vr  TEXT              Comma-separated list of valid regions to     │
│                                                 include in the training. This option is      │
│                                                 applied before region merging.               │
│    --valid-baselines     -vb  TEXT              Comma-separated list of valid baselines to   │
│                                                 include in the training. This option is      │
│                                                 applied before baseline merging.             │
│    --merge-regions       -mr  TEXT              Region merge mapping. One or more mappings   │
│                                                 of the form `src:target`, where `src` is     │
│                                                 merged into `target`. `src` can be           │
│                                                 comma-separated.                             │
│    --merge-baselines     -mb  TEXT              Baseline merge mapping. One or more mappings │
│                                                 of the form `src:target`, where `src` is     │
│                                                 merged into `target`. `src` can be           │
│                                                 comma-separated.                             │
│    --verbose             -v   INTEGER RANGE     Set verbosity level for logging. Use -vv for │
│                                                 maximum verbosity (levels 0-2).              │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Hyperparameters ────────────────────────────────────────────────────────────────────────────╮
│ --line-width          INTEGER                            Height of baselines in the target   │
│                                                          image after scaling.                │
│                                                          [default: 8]                        │
│ --padding             <INTEGER INTEGER>...               Padding (left/right, top/bottom)    │
│                                                          around the page image.              │
│                                                          [default: 0, 0]                     │
│ --freq                FLOAT                              Model saving and report generation  │
│                                                          frequency in epochs during          │
│                                                          training. If frequency is >1 it     │
│                                                          must be an integer, i.e. running    │
│                                                          validation every n-th epoch.        │
│                                                          [default: 1.0]                      │
│ --quit                [early|fixed]                      Stop condition for training. Choose │
│                                                          `early` for early stopping or       │
│                                                          `fixed` for a fixed number of       │
│                                                          epochs.                             │
│                                                          [default: fixed]                    │
│ --epochs              INTEGER                            Number of epochs to train for when  │
│                                                          using fixed stopping.               │
│                                                          [default: 50]                       │
│ --min-epochs          INTEGER                            Minimum number of epochs to train   │
│                                                          for before early stopping is        │
│                                                          allowed.                            │
│                                                          [default: 0]                        │
│ --lag                 INTEGER RANGE                      Early stopping patience (number of  │
│                                                          validation steps without            │
│                                                          improvement). Measured by           │
│                                                          val_mean_iu.                        │
│                                                          [default: 10; x>=1]                 │
│ --optimizer           [Adam|SGD|RMSprop|Lamb]            Optimizer to use during training.   │
│                                                          [default: Adam]                     │
│ --lrate               FLOAT                              Learning rate for the optimizer.    │
│                                                          [default: 0.0002]                   │
│ --momentum            FLOAT                              Momentum parameter for applicable   │
│                                                          optimizers.                         │
│                                                          [default: 0.9]                      │
│ --weight-decay        FLOAT                              Weight decay parameter for the      │
│                                                          optimizer.                          │
│                                                          [default: 1e-05]                    │
│ --schedule            [constant|1cycle|exponential|cosi  Set learning rate scheduler. For    │
│                       ne|step|reduceonplateau]           1cycle, cycle length is determined  │
│                                                          by the `--step-size` option.        │
│                                                          [default: constant]                 │
│ --completed-epochs    INTEGER                            Number of epochs already completed. │
│                                                          Used for resuming training.         │
│                                                          [default: 0]                        │
│ --augment                                                Use data augmentation during        │
│                                                          training.                           │
│ --step-size           INTEGER                            Step size for learning rate         │
│                                                          scheduler.                          │
│                                                          [default: 10]                       │
│ --gamma               FLOAT                              Gamma for learning rate scheduler.  │
│                                                          [default: 0.1]                      │
│ --rop-factor          FLOAT                              Factor for reducing learning rate   │
│                                                          on plateau.                         │
│                                                          [default: 0.1]                      │
│ --rop-patience        INTEGER                            Patience for reducing learning rate │
│                                                          on plateau.                         │
│                                                          [default: 5]                        │
│ --cos-t-max           INTEGER                            Maximum number of epochs for cosine │
│                                                          annealing.                          │
│                                                          [default: 50]                       │
│ --cos-min-lr          FLOAT                              Minimum learning rate for cosine    │
│                                                          annealing.                          │
│                                                          [default: 2e-05]                    │
│ --warmup              INTEGER                            Number of warmup epochs for cosine  │
│                                                          annealing.                          │
│                                                          [default: 0]                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
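
As an illustration of how the options above combine, here is a sketch of a training run with early stopping. All paths, the model name and the region names (`heading`, `caption`, `text`) are hypothetical placeholders; only the flags themselves come from the help text. The snippet prints the assembled command and executes it only if `octopy` is installed and the ground truth directory exists.

```shell
# Hypothetical paths and class names; replace them with your own.
GT_DIR="./ground_truth"
OUT_DIR="./models"

# Build the argument list once so it can be printed and executed.
set -- segtrain \
  --gt "$GT_DIR" \
  --partition 0.9 \
  --output "$OUT_DIR" \
  --name my_segmodel \
  --device cuda:0 \
  --merge-regions "heading,caption:text" \
  --quit early --min-epochs 20 --lag 10 \
  --augment

CMD="octopy $*"
echo "$CMD"

# Run only when octopy is installed and the ground truth directory exists.
if command -v octopy >/dev/null 2>&1 && [ -d "$GT_DIR" ]; then
  octopy "$@"
fi
```

With `--partition 0.9` and no `--eval` directory, 90% of the ground truth files are used for training and 10% for evaluation. The `--merge-regions` mapping merges both `heading` and `caption` into `text`, and `--quit early` stops training once `val_mean_iu` has not improved for `--lag` validation steps, but not before `--min-epochs` epochs.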

Layout Segmentation Prediction

 > octopy segment --help
                                                                                                
 Usage: octopy segment [OPTIONS] IMAGES...                                                      
                                                                                                
 Segment images using Kraken.                                                                   
 IMAGES: Specify one or more image files to segment. Supports multiple file paths, wildcards,   
 or directories (with the -g option).                                                           
                                                                                                
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ *  IMAGES    PATH  [required]                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --glob    -g  TEXT       Glob pattern for matching images in directories. (used with         │
│                          directories in IMAGES).                                             │
│                          [default: *.ocropus.bin.png]                                        │
│ --model   -m  FILE       Path to custom segmentation model(s). If not provided, the default  │
│                          Kraken model is used.                                               │
│ --output  -o  DIRECTORY  Output directory for processed files. Defaults to the parent        │
│                          directory of each input file.                                       │
│ --suffix  -s  TEXT       Suffix for output PageXML files. Should end with '.xml'.            │
│                          [default: .xml]                                                     │
│ --device  -d  TEXT       Specify the processing device (e.g. 'cpu', 'cuda:0',...). Refer to  │
│                          PyTorch documentation for supported devices.                        │
│                          [default: cpu]                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Fine-Tuning ────────────────────────────────────────────────────────────────────────────────╮
│ --creator             TEXT               Metadata: Creator of the PageXML files.             │
│                                          [default: octopy]                                   │
│ --direction           [hlr|hrl|vlr|vrl]  Text direction of input images. [default: hlr]      │
│ --suppress-lines                         Suppress lines in the output PageXML.               │
│ --suppress-regions                       Suppress regions in the output PageXML. Creates a   │
│                                          single dummy region for the whole image.            │
│ --fallback            INTEGER            Use a default bounding box when the polygonizer     │
│                                          fails to create a polygon around a                  │
│                                          baseline. Requires a box height in pixels.          │
│ --heatmap             TEXT               Generate a heatmap image alongside the PageXML      │
│                                          output. Specify the file extension for the heatmap  │
│                                          (e.g., `.hm.png`).                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
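
A possible invocation, sketched under assumptions: the directory names, the model path (including the `.mlmodel` extension that Kraken models usually carry) and the fallback height of 30 pixels are hypothetical. The snippet prints the command and runs it only if `octopy` and the model file are available.

```shell
# Hypothetical locations; replace them with your own.
IMG_DIR="./scans"
MODEL="./models/my_segmodel.mlmodel"
XML_DIR="./pagexml"

# Options first, then the IMAGES argument (here a directory, so --glob applies).
set -- segment \
  --glob "*.bin.png" \
  --model "$MODEL" \
  --output "$XML_DIR" \
  --device cuda:0 \
  --fallback 30 \
  --heatmap .hm.png \
  "$IMG_DIR"

CMD="octopy $*"
echo "$CMD"

# Run only when octopy is installed and the model file exists.
if command -v octopy >/dev/null 2>&1 && [ -f "$MODEL" ]; then
  octopy "$@"
fi
```

The glob is quoted so that the shell passes the pattern to `octopy` unchanged instead of expanding it in the current directory.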

Region Polygon Shrinking

 > octopy shrink --help
                                                                                                
 Usage: octopy shrink [OPTIONS] PAGEXML...                                                      
                                                                                                
 Shrink region polygons of PageXML files.                                                       
 PAGEXML: Specify one or more PageXML files to shrink. Supports multiple file paths, wildcards, 
 or directories (with the -g option).                                                           
                                                                                                
╭─ Input ──────────────────────────────────────────────────────────────────────────────────────╮
│ *  PAGEXML             PATH  [required]                                                      │
│    --glob          -g  TEXT  Glob pattern for matching PageXML files in directories. (used   │
│                              with directories in PAGEXML).                                   │
│                              [default: *.xml]                                                │
│    --input-suffix  -i  TEXT  Suffix for image selection. Should match full suffix of input   │
│                              PageXML files.                                                  │
│                              [default: .bin.png]                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --output         -o   DIRECTORY  Output directory for processed files. Defaults to the       │
│                                  parent directory of each input file.                        │
│ --output-suffix  -s   TEXT       Suffix for shrunken PageXML files. Should end with '.xml'.  │
│                                  Could overwrite input files.                                │
│                                  [default: .xml]                                             │
│ --padding        -p   INTEGER    Padding around the shrunken regions in pixels. [default: 5] │
│ --horizontal     -h   INTEGER    The higher, the more horizontal smoothing is applied.       │
│                                  [default: 3]                                                │
│ --vertical       -v   INTEGER    The higher, the more vertical smoothing is applied.         │
│                                  [default: 3]                                                │
│ --valid-region   -vr  TEXT       Valid regions for shrinking. If nothing is provided, all    │
│                                  regions are shrunk. Multiple selections are possible.       │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
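
The following sketch shows one way to shrink the regions of previously generated PageXML files. Directory names and the region name `text` are hypothetical. Since the default `--output-suffix` of `.xml` could overwrite the input files, the example writes to a separate output directory.

```shell
# Hypothetical locations; replace them with your own.
XML_DIR="./pagexml"
SHRUNK_DIR="./pagexml_shrunk"

# Options first, then the PAGEXML argument (here a directory, so --glob applies).
set -- shrink \
  --glob "*.xml" \
  --input-suffix .bin.png \
  --output "$SHRUNK_DIR" \
  --padding 5 \
  --horizontal 3 \
  --vertical 3 \
  --valid-region text \
  "$XML_DIR"

CMD="octopy $*"
echo "$CMD"

# Run only when octopy is installed and the input directory exists.
if command -v octopy >/dev/null 2>&1 && [ -d "$XML_DIR" ]; then
  octopy "$@"
fi
```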