forked from tleyden/open-ocr

Run your own OCR-as-a-Service using Tesseract and Docker

chauthai/open-ocr

 
 

OpenOCR makes it simple to host your own OCR REST API.

The heavy lifting OCR work is handled by Tesseract OCR.

Docker is used to containerize the various components of the service.

Features

  • Scalable message passing architecture
  • Platform independence via Docker containers
  • Improved accuracy via Stroke Width Transform image preprocessing
  • Pass arguments to Tesseract such as character whitelist and page segment mode
  • REST API docs

Launching OpenOCR on Ubuntu 14.04

Install Docker

See the Installing Docker on Ubuntu instructions.

Find out your host address

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:43:40:c7
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
          ...

The IP address 10.0.2.15 will be used as the value of the RABBITMQ_HOST environment variable below.
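If you would rather not copy the address by hand, the lookup can be scripted. The following is a minimal sketch, not part of OpenOCR: `first_inet_addr` is a hypothetical helper name, and it assumes that the first non-loopback IPv4 address in the `ifconfig` output is the one you want. On a host with several interfaces (for example one that also has a `docker0` bridge) that assumption may not hold, so check the result before using it.

```shell
# Hypothetical helper: print the first non-loopback IPv4 address found in
# ifconfig-style text read from stdin.
# Accepts both "inet addr:10.0.2.15" (older net-tools) and "inet 10.0.2.15".
first_inet_addr() {
  sed -n 's/^[[:space:]]*inet \(addr:\)\{0,1\}\([0-9][0-9.]*\).*/\2/p' \
    | grep -v '^127\.' \
    | head -n 1
}

# Usage (not run here):
#   export RABBITMQ_HOST="$(ifconfig | first_inet_addr)"
```

The function only reads stdin, so it can be tried on saved `ifconfig` output before it is wired into the launch step.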

Launch docker images

Here's how to launch the Docker images needed for OpenOCR.

$ curl -O https://raw.githubusercontent.com/tleyden/open-ocr/master/launcher/launcher.sh
$ export RABBITMQ_HOST=10.0.2.15 RABBITMQ_PASS=supersecret2 HTTP_PORT=8080
$ chmod +x launcher.sh
$ ./launcher.sh

This will start three docker instances:

You are now ready to decode images → text via your REST API.
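If the containers do not come up, one thing worth checking is that the three variables exported above were set in the same shell that ran the launcher. The sketch below assumes launcher.sh reads RABBITMQ_HOST, RABBITMQ_PASS and HTTP_PORT from the environment, as the export line suggests. `check_launch_env` is a hypothetical helper, not something the launcher provides.

```shell
# Hypothetical pre-flight check: report any of the three variables from the
# export step that are unset or empty. Returns 0 if all are set, 1 otherwise.
check_launch_env() {
  missing=0
  for name in RABBITMQ_HOST RABBITMQ_PASS HTTP_PORT; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not run here):
#   check_launch_env && ./launcher.sh
```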

Test the REST API

Request

$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' http://$DOCKER_HOST:$HTTP_PORT/ocr

Response

It will return the decoded text for the test image:

< HTTP/1.1 200 OK
< Date: Tue, 13 May 2014 16:18:50 GMT
< Content-Length: 283
< Content-Type: text/plain; charset=utf-8
<
You can create local variables for the pipelines within the template by
prefixing the variable name with a “$" sign. Variable names have to be
composed of alphanumeric characters and the underscore. In the example
below I have used a few variations that work for variable names.

To see other parameters that you can pass in the request, see the REST API docs.
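For repeated use, the request shown above can be wrapped in a small shell function. This is a sketch under stated assumptions: `ocr_payload` and `ocr_decode` are hypothetical helper names, the JSON body contains only the two fields from the example request, and the image URL is assumed to contain no double quotes or backslashes, since no JSON escaping is done.

```shell
# Hypothetical helper: build the JSON request body for one image URL.
# Uses only the two fields shown in the example request.
ocr_payload() {
  printf '{"img_url":"%s","engine":"tesseract"}' "$1"
}

# Hypothetical helper: POST the request and print the decoded text.
#   $1  base URL of the OpenOCR HTTP server, e.g. http://10.0.2.15:8080
#   $2  URL of the image to decode
# --fail makes curl exit non-zero on an HTTP error status instead of
# printing the error body as if it were decoded text.
ocr_decode() {
  curl --silent --show-error --fail \
    -X POST \
    -H "Content-Type: application/json" \
    -d "$(ocr_payload "$2")" \
    "$1/ocr"
}

# Usage (not run here; host and port as in the request above):
#   ocr_decode "http://$DOCKER_HOST:$HTTP_PORT" "http://bit.ly/ocrimage"
```

Keeping the payload builder separate from the network call makes it possible to inspect the request body without a running server.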

Launching OpenOCR on a Docker PaaS

You can also run OpenOCR on any PaaS that supports Docker containers. Here are the instructions for a few that have already been tested:

Community

License

OpenOCR is Open Source and available under the Apache 2 License.
