OpenOCR makes it simple to host your own OCR REST API.
The heavy-lifting OCR work is handled by Tesseract OCR.
Docker is used to containerize the various components of the service.
- Scalable message passing architecture
- Platform independence via Docker containers
- Improved accuracy via Stroke Width Transform image preprocessing
- Pass arguments to Tesseract, such as a character whitelist and the page segmentation mode
- REST API docs
See the Installing Docker on Ubuntu instructions, then find the IP address of your Docker host:
$ ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:43:40:c7
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
...
The IP address 10.0.2.15 will be used as the RABBITMQ_HOST environment variable below.
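If ifconfig isn't installed (many newer distributions ship only the iproute2 ip tool), the same address can be read with ip instead. This is a minimal sketch: the helper names first_ipv4 and host_ip are hypothetical, and eth0 is simply the interface from the example output above; yours may be named differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of OpenOCR.

# Print the first IPv4 address found in `ip -4 -o addr show` output read on stdin.
# A typical line looks like: "2: eth0    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0"
first_ipv4() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "inet") {         # the field after "inet" is ADDRESS/PREFIX
        split($(i + 1), a, "/")   # drop the "/24" prefix length
        print a[1]
        exit
      }
  }'
}

# Look up the IPv4 address of an interface (defaults to eth0, as in the example above).
host_ip() {
  ip -4 -o addr show dev "${1:-eth0}" | first_ipv4
}
```

For example: $ export RABBITMQ_HOST=$(host_ip eth0)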
Here's how to launch the docker images needed for OpenOCR.
$ curl -O https://raw.githubusercontent.com/tleyden/open-ocr/master/launcher/launcher.sh
$ export RABBITMQ_HOST=10.0.2.15 RABBITMQ_PASS=supersecret2 HTTP_PORT=8080
$ chmod +x launcher.sh
$ ./launcher.sh
This will start three Docker containers: RabbitMQ, the OpenOCR HTTP API server, and an OpenOCR worker.
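Since launcher.sh is configured through the RABBITMQ_HOST, RABBITMQ_PASS and HTTP_PORT variables, it can help to confirm they are set before launching. The require_vars function below is a hypothetical pre-flight check, not something launcher.sh provides; it relies on bash indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of launcher.sh).
# Returns non-zero, and names each culprit on stderr, if any listed variable is empty or unset.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable whose name is in $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_vars RABBITMQ_HOST RABBITMQ_PASS HTTP_PORT && ./launcher.sh
```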
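You are now ready to decode images → text via your REST API. In the examples below, $DOCKER_HOST stands for the IP address of your Docker host (10.0.2.15 above).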
Request
$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' http://$DOCKER_HOST:$HTTP_PORT/ocr
Response
It will return the decoded text for the test image:
< HTTP/1.1 200 OK
< Date: Tue, 13 May 2014 16:18:50 GMT
< Content-Length: 283
< Content-Type: text/plain; charset=utf-8
<
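For repeated use, the request above can be wrapped in a small shell function. This is a sketch under a few assumptions: the function names are hypothetical, DOCKER_HOST holds a bare IP address (if your Docker CLI already sets DOCKER_HOST in tcp://host:port form, substitute the bare IP), and the image URL contains no double quotes or backslashes, because it is inserted into the JSON without escaping.

```shell
#!/usr/bin/env bash
# Hypothetical client helpers for the /ocr endpoint shown above.

# Build the endpoint URL; fails with a message if DOCKER_HOST or HTTP_PORT is unset.
ocr_endpoint() {
  printf 'http://%s:%s/ocr' "${DOCKER_HOST:?set DOCKER_HOST}" "${HTTP_PORT:?set HTTP_PORT}"
}

# Build the JSON request body (no JSON escaping is performed on the arguments).
ocr_payload() {
  printf '{"img_url":"%s","engine":"%s"}' "$1" "${2:-tesseract}"
}

# POST the request and print the decoded text.
# --fail makes curl exit non-zero on HTTP errors, so an error page
# is not mistaken for OCR output.
ocr_image() {
  curl -sS --fail -X POST \
    -H "Content-Type: application/json" \
    -d "$(ocr_payload "$1" "${2:-tesseract}")" \
    "$(ocr_endpoint)"
}
```

For example: $ ocr_image http://bit.ly/ocrimage > decoded.txt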
For other parameters you can pass in the request, see the REST API docs.
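As an illustration of passing Tesseract arguments such as a character whitelist and page segmentation mode, the sketch below builds a request body with extra engine options. The field names engine_args, config_vars and psm are assumptions about the OpenOCR request format and should be checked against the REST API docs; tessedit_char_whitelist is a standard Tesseract config variable, and page segmentation mode 3 is Tesseract's fully automatic mode. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. ASSUMPTION: "engine_args", "config_vars" and "psm" are
# the OpenOCR request fields for Tesseract options; confirm in the REST API docs.
#
# Usage: ocr_payload_with_args IMG_URL WHITELIST PSM
# Arguments are inserted without JSON escaping, so they must not contain
# double quotes or backslashes.
ocr_payload_with_args() {
  local img_url=$1 whitelist=$2 psm=$3
  printf '{"img_url":"%s","engine":"tesseract","engine_args":{"config_vars":{"tessedit_char_whitelist":"%s"},"psm":"%s"}}' \
    "$img_url" "$whitelist" "$psm"
}

# Example: only recognize digits, using fully automatic page segmentation.
#   curl -X POST -H "Content-Type: application/json" \
#     -d "$(ocr_payload_with_args http://bit.ly/ocrimage 0123456789 3)" \
#     http://$DOCKER_HOST:$HTTP_PORT/ocr
```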
You can also run OpenOCR on any PaaS that supports Docker containers. Here are instructions for a few that have already been tested:
- Google Compute Engine
- Orchard
- Tutum
- More coming soon...
- Follow @OpenOCR on Twitter
- Check out the GitHub issue tracker
OpenOCR is Open Source and available under the Apache 2 License.