Merge pull request #348 from OCR-D/models-in-docu
make models page in setup guide more prominent and rearrange
lena-hinrichsen authored Apr 6, 2023
2 parents c5004ff + d417f56 commit 82bc2d9
Showing 2 changed files with 61 additions and 40 deletions.
83 changes: 48 additions & 35 deletions site/en/models.md
@@ -60,7 +60,15 @@ name).

The second line of each entry contains a short description of the resource.

## Installing known resources
## Installing resources

To install resources in OCR-D, read the sections [Installing known resources](#installing-known-resources) and [Installing unknown resources](#installing-unknown-resources).

*Known resources* are resources that are provided by processor developers [in the `ocrd-tool.json`](/en/spec/ocrd_tool#file-parameters) and are available by name to `ocrd resmgr download`, whereas *unknown* resources are models, configurations, parameter sets etc. that you provide yourself or find elsewhere on the Internet, and which require passing a URL to `ocrd resmgr download`.

**If you installed OCR-D via Docker,** also read the section [Models and Docker](#models-and-docker).

### Installing known resources

You can install resources with the `ocrd resmgr download` command. It expects
the name of the processor as the first argument and either the name or URL of a
@@ -97,7 +105,7 @@ ocrd resmgr download '*'

(In either case, `*` must be in quotes or escaped to avoid wildcard expansion by the shell.)
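The effect of quoting can be checked without OCR-D installed. The following is a minimal sketch in a scratch directory; `count_args` is a hypothetical helper introduced only for this illustration and simply reports how many arguments it received:

```sh
# Demonstration in a temporary directory; no OCR-D installation is needed.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.txt b.txt

# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

unquoted=$(count_args *)    # the shell expands * to the file names: 2 arguments
quoted=$(count_args '*')    # the literal string * is passed through: 1 argument
escaped=$(count_args \*)    # escaping has the same effect as quoting: 1 argument

echo "unquoted: $unquoted, quoted: $quoted, escaped: $escaped"

# Return to the previous directory and clean up.
cd - >/dev/null && rm -r "$tmpdir"
```

With an unquoted `*`, `ocrd resmgr download` would receive the names of files in the current directory instead of the wildcard itself.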

## Installing unknown resources
### Installing unknown resources

If you need to install a resource which OCR-D doesn't know of, you can do so by passing its URL in combination with the `--any-url/-n` flag to `ocrd resmgr download`:

@@ -114,6 +122,40 @@ This will download and store the resource in the [proper location](#where-is-the
ocrd-tesserocr-recognize -P model mymodel
```

### Models and Docker

If you are using OCR-D with Docker, we recommend keeping all downloaded resources in a persistent host directory,
separate from the OCR-D Docker container(s) and data directory, and mounting that
resource directory into a specific path in the container alongside the data directory.
The host resource directory can be empty initially. Each time you run the Docker container,
your processors will access the host directory to resolve resources, and you can download
additional models into that location using `ocrd resmgr`.
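Although the resource directory may start out empty, it can help to create it (and the data directory) yourself before the first `docker run`. This is a minimal sketch, assuming a typical Linux setup in which the Docker daemon runs as root: a bind-mount source that does not exist yet is then created by the daemon and owned by root, so a container started with `--user $(id -u)` may be unable to write downloaded models into it.

```sh
# Assumption: run this from the directory that should hold ./data and ./models.
mkdir -p data models

# Both directories should now exist and be writable by the current user,
# which is also the user the container runs as when started with --user $(id -u).
for dir in data models; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "$dir: ok"
    else
        echo "$dir: missing or not writable" >&2
    fi
done
```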

The following examples assume (without loss of generality) that your host-side data
path is under `./data`, and the host-side resource path is under `./models`.

To download models to `./models` in the host FS and `/usr/local/share/ocrd-resources` in the container FS:

```sh
docker run --user $(id -u) \
--volume $PWD/models:/usr/local/share/ocrd-resources \
ocrd/all \
ocrd resmgr download ocrd-tesserocr-recognize eng.traineddata\; \
ocrd resmgr download ocrd-calamari-recognize default\; \
...
```

To run processors, proceed as usual:

```sh
docker run --user $(id -u) --workdir /data \
--volume $PWD/data:/data \
--volume $PWD/models:/usr/local/share/ocrd-resources \
ocrd/all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
```

This principle applies to all `ocrd/*` Docker images; for example, you can replace `ocrd/all` above with `ocrd/tesserocr`.
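Because only the image name changes between such calls, the mount options can be factored out. The following is a minimal sketch of a hypothetical helper, `ocrd_docker_cmd`, which is not part of OCR-D: it only prints the `docker run` command line so that it can be inspected before use. `OCRD_DATA` and `OCRD_MODELS` are illustrative variable names chosen for this sketch, not settings that OCR-D reads.

```sh
# Hypothetical helper (not part of OCR-D): print the `docker run` command line
# for a given image and processor call, using the two mounts shown above.
# Usage: ocrd_docker_cmd IMAGE PROCESSOR [ARGS...]
ocrd_docker_cmd() {
    image=$1
    shift
    # Illustrative variables; they default to the ./data and ./models layout.
    data=${OCRD_DATA:-$PWD/data}
    models=${OCRD_MODELS:-$PWD/models}
    printf 'docker run --user %s --workdir /data' "$(id -u)"
    printf ' --volume %s:/data' "$data"
    printf ' --volume %s:/usr/local/share/ocrd-resources' "$models"
    # printf repeats the format for the image and every remaining argument.
    printf ' %s' "$image" "$@"
    printf '\n'
}

# The same processor call against two different images:
ocrd_docker_cmd ocrd/all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
ocrd_docker_cmd ocrd/tesserocr ocrd-tesserocr-recognize -I IN -O OUT -P model eng
```

The sketch prints rather than executes the command, and it does not quote paths containing spaces, so treat its output as something to review and copy rather than to evaluate blindly.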

## List installed resources

The `ocrd resmgr list-installed` command has the same output format as `ocrd resmgr list-available`. But instead
@@ -239,39 +281,6 @@ ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P model 'deu+frk'
ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P Fraktur
```

# Models and Docker

We recommend keeping all downloaded resources in a persistent host directory,
separate from the `ocrd/*` Docker container and data directory, and mounting that
resource directory into a specific path in the container alongside the data directory.
The host resource directory can be empty initially. Each time you run the Docker container,
your processors will access the host directory to resolve resources, and you can download
additional models into that location using `ocrd resmgr`.

The following will assume (without loss of generality) that your host-side data
path is under `./data`, and the host-side resource path is under `./models`:

To download models to `./models` in the host FS and `/usr/local/share/ocrd-resources` in Docker:

```sh
docker run --user $(id -u) \
--volume $PWD/models:/usr/local/share/ocrd-resources \
ocrd/all \
ocrd resmgr download ocrd-tesserocr-recognize eng.traineddata\; \
ocrd resmgr download ocrd-calamari-recognize default\; \
...
```

To run processors, as usual do:

```sh
docker run --user $(id -u) --workdir /data \
--volume $PWD/data:/data \
--volume $PWD/models:/usr/local/share/ocrd-resources \
ocrd/all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
```

This principle applies to all `ocrd/*` Docker images, e.g. you can replace `ocrd/all` above with `ocrd/tesserocr` as well.

# Model training

@@ -293,3 +302,7 @@ Especially if you want to use several OCR engines for your workflows or are not
results, this might be particularly effective for you. Just like `tesstrain` it is not included in `ocrd_all`, meaning
you will still have to install it first. For information on the setup and the training process itself see the
[Readme](https://github.com/OCR-D/okralact) in the GitHub repository.

# Further reading

If you just installed OCR-D and want to know how to process your own data, please see the [user guide](/en/user_guide).
18 changes: 13 additions & 5 deletions site/en/setup.md
@@ -183,9 +183,6 @@ You can spin up a Docker container, mounting the current working directory like
docker run --user $(id -u) --workdir /data --volume $PWD:/data -- ocrd/all:maximum ocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK-DOCKER
```

For instructions on how to proceed further with the processing of your data, please see the [user guide](/en/user_guide). Make sure to also read [the notes on translating native command line calls to docker calls](/en/user_guide#translating-native-commands-to-docker-calls).


### Updating Docker image

To update the Docker image to the latest version, just run the `docker pull` command:<br>
@@ -195,6 +192,10 @@ To update the Docker image to the latest version, just run the `docker pull` com
docker pull ocrd/all:maximum
```

### Further reading

We recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.
Alternatively, for instructions on how to proceed further with the processing of your data, please see the [user guide](/en/user_guide). Make sure to also read [the notes on translating native command line calls to docker calls](/en/user_guide#translating-native-commands-to-docker-calls).

## ocrd_all natively

@@ -291,8 +292,6 @@ first [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) file group
ocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK
```

For instructions on how to process your own data, please see the [user guide](/en/user_guide).

### Updating the software

As `ocrd_all` is in [active
@@ -315,6 +314,10 @@ This will run the installation process for all submodules which have been change
say that the last processor was installed successfully. Running any of the changed processors with `--version`
should print its current version.

### Further reading

We recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.
For instructions on how to process your own data, please see the [user guide](/en/user_guide).

## Individual installation (experts only)

@@ -439,3 +442,8 @@ pip install -e .
This way, you won't have to reinstall after making changes.
Now you can [test your installation](#testing-the-native-installation).
## Installing models
Several processors in OCR-D need pretrained models, which you have to install beforehand.
Please consult our [instructions on models](/en/models) for more information on how to download and install them.
