Merge pull request #348 from OCR-D/models-in-docu

make models page in setup guide more prominent and rearrange
OCR-D · Apr 6, 2023 · 82bc2d9
2 parents c5004ff + d417f56
Showing 2 changed files with 61 additions and 40 deletions.
diff --git a/site/en/models.md b/site/en/models.md
@@ -60,7 +60,15 @@ name).
 
 The second line of each entry contains a short description of the resource.
 
-## Installing known resources
+## Installing resources
+
+To install resources in OCR-D, read the sections [Installing known resources](#installing-known-resources) and [Installing unknown resources](#installing-unknown-resources).
+
+*Known resources* are resources that processor developers provide [in the `ocrd-tool.json`](/en/spec/ocrd_tool#file-parameters) and that are available by name to `ocrd resmgr download`. *Unknown resources* are models, configurations, parameter sets, etc. that you provide yourself or found elsewhere on the Internet; they require passing a URL to `ocrd resmgr download`.
+
+**If you installed OCR-D via Docker,** also read the section [Models and Docker](#models-and-docker).
+
+### Installing known resources
 
 You can install resources with the `ocrd resmgr download` command. It expects
 the name of the processor as the first argument and either the name or URL of a
@@ -97,7 +105,7 @@ ocrd resmgr download '*'
 
 (In either case, `*` must be in quotes or escaped to avoid wildcard expansion by the shell.)
 
-## Installing unknown resources
+### Installing unknown resources
 
 If you need to install a resource which OCR-D doesn't know of, you can do so by passing its URL in combination with the `--any-url/-n` flag to `ocrd resmgr download`:
 
@@ -114,6 +122,40 @@ This will download and store the resource in the [proper location](#where-is-the
 ocrd-tesserocr-recognize -P model mymodel
 ```
 
+### Models and Docker
+
+If you are using OCR-D with Docker, we recommend keeping all downloaded resources in a persistent host directory,
+separate from the OCR-D Docker container(s) and data directory, and mounting that
+resource directory into a specific path in the container alongside the data directory.
+The host resource directory can be empty initially. Each time you run the Docker container,
+your processors will access the host directory to resolve resources, and you can download
+additional models into that location using `ocrd resmgr`.
+
+The following assumes (without loss of generality) that your host-side data
+path is `./data` and your host-side resource path is `./models`.
+
+To download models to `./models` in the host FS and `/usr/local/share/ocrd-resources` in the container FS:
+
+```sh
+docker run --user $(id -u) \
+  --volume $PWD/models:/usr/local/share/ocrd-resources \
+ocrd/all \
+ocrd resmgr download ocrd-tesserocr-recognize eng.traineddata\; \
+ocrd resmgr download ocrd-calamari-recognize default\; \
+...
+```
+
+To run processors, proceed as usual:
+
+```sh
+docker run --user $(id -u) --workdir /data \
+  --volume $PWD/data:/data \
+  --volume $PWD/models:/usr/local/share/ocrd-resources \
+  ocrd/all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
+```
+
+This principle applies to all `ocrd/*` Docker images; for example, you can replace `ocrd/all` above with `ocrd/tesserocr`.
+
 ## List installed resources
 
 The `ocrd resmgr list-installed` command has the same output format as `ocrd resmgr list-available`. But instead
@@ -239,39 +281,6 @@ ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P model 'deu+frk'
 ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P Fraktur
 ```
 
-# Models and Docker
-
-We recommend keeping all downloaded resources in a persistent host directory,
-separate of the `ocrd/*` Docker container and data directory, and mounting that
-resource directory into a specific path in the container alongside the data directory.
-The host resource directory can be empty initially. Each time you run the Docker container,
-your processors will access the host directory to resolve resources, and you can download
-additional models into that location using `ocrd resmgr`.
-
-The following will assume (without loss of generality) that your host-side data
-path is under `./data`, and the host-side resource path is under `./models`:
-
-To download models to `./models` in the host FS and `/usr/local/share/ocrd-resources` in Docker:
-
-```sh
-docker run --user $(id -u) \
-  --volume $PWD/models:/usr/local/share/ocrd-resources \
-ocrd/all \
-ocrd resmgr download ocrd-tesserocr-recognize eng.traineddata\; \
-ocrd resmgr download ocrd-calamari-recognize default\; \
-...
-```
-
-To run processors, as usual do:
-
-```sh
-docker run --user $(id -u) --workdir /data \
-  --volume $PWD/data:/data \
-  --volume $PWD/models:/usr/local/share/ocrd-resources \
-  ocrd/all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
-```
-
-This principle applies to all `ocrd/*` Docker images, e.g. you can replace `ocrd/all` above with `ocrd/tesserocr` as well.
 
 # Model training
 
@@ -293,3 +302,7 @@ Especially if you want to use several OCR engines for your workflows or are not
 results, this might be particularly effective for you. Just like `tesstrain`, it is not included in `ocrd_all`, meaning
 you will still have to install it first. For information on the setup and the training process itself, see the
 [Readme](https://github.com/OCR-D/okralact) in the GitHub repository.
+
+# Further reading
+
+If you just installed OCR-D and want to know how to process your own data, please see the [user guide](/en/user_guide).
diff --git a/site/en/setup.md b/site/en/setup.md
@@ -183,9 +183,6 @@ You can spin up a Docker container, mounting the current working directory like
 docker run --user $(id -u) --workdir /data --volume $PWD:/data -- ocrd/all:maximum ocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK-DOCKER
 ```
 
-For instructions on how to proceed further with the processing of your data, please see the [user guide](/en/user_guide). Make sure to also read [the notes on translating native command line calls to docker calls](/en/user_guide#translating-native-commands-to-docker-calls).
-
-
 ### Updating Docker image
 
 To update the Docker image to the latest version, just run the `docker pull` command:<br> 
@@ -195,6 +192,10 @@ To update the Docker image to the latest version, just run the `docker pull` com
 docker pull ocrd/all:maximum
 ```
 
+### Further reading
+
+We recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.
+Alternatively, for instructions on how to proceed further with the processing of your data, please see the [user guide](/en/user_guide). Make sure to also read [the notes on translating native command line calls to docker calls](/en/user_guide#translating-native-commands-to-docker-calls).
 
 ## ocrd_all natively
 
@@ -291,8 +292,6 @@ first [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) file group
 ocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK
 ```
 
-For instructions on how to process your own data, please see the [user guide](/en/user_guide).
-
 ### Updating the software
 
 As `ocrd_all` is in [active
@@ -315,6 +314,10 @@ This will run the installation process for all submodules which have been change
 say that the last processor was installed successfully. `--version` for the processors which have been changed
 should give you their current version.
 
+### Further reading
+
+We recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.
+For instructions on how to process your own data, please see the [user guide](/en/user_guide).
 
 ## Individual installation (experts only)
 
@@ -439,3 +442,8 @@ pip install -e .
 This way, you won't have to reinstall after making changes.
 
 Now you can [test your installation](#testing-the-native-installation).
+
+## Installing models
+
+Several OCR-D processors need pretrained models, which you have to install beforehand.
+Please consult our [instructions on models](/en/models) for more information on how to download and install them.
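
As a rough sketch of the Docker workflow documented in the `models.md` changes above, the repeated `docker run` boilerplate could be wrapped in a small shell function. The function name `ocrd_docker` and the `MODELS_DIR` and `DRY_RUN` variables are hypothetical and not part of OCR-D; the image name, the mount target and the `ocrd resmgr` subcommands are taken from the diff.

```sh
#!/bin/sh
# Hypothetical helper (not part of OCR-D): run one command in the ocrd/all
# image with the host model directory mounted at the resource path used in
# the documentation above.

# Host-side resource directory; the default mirrors the ./models example.
MODELS_DIR="${MODELS_DIR:-$PWD/models}"

ocrd_docker() {
  # Prepend the docker invocation to the command given as arguments.
  set -- docker run --user "$(id -u)" \
    --volume "$MODELS_DIR:/usr/local/share/ocrd-resources" \
    ocrd/all "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only print the command that would be run.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example usage (each call starts its own container):
#   ocrd_docker ocrd resmgr download ocrd-tesserocr-recognize eng.traineddata
#   ocrd_docker ocrd resmgr list-installed
```

Running one command per container keeps each download independent and avoids chaining several commands with `\;` inside a single `docker run`. Setting `DRY_RUN=1` prints the composed command instead of executing it, which can help when checking the volume mapping.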