Invoice-Receipt-OCR

This project aims to automate the receipt/invoice parsing process.

Installation and Prerequisites

Python Modules

# string distance, used to rate the text extraction results
pip install python-Levenshtein

# images and preprocessing
pip install Wand
pip install opencv-python

# ocr engine
pip install pytesseract

# PDF text extraction (not required for now)
pip install pdfminer.six
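
The "rating" that python-Levenshtein is installed for can be pictured as a similarity score between OCR output and a known reference string. The block below is a minimal sketch under that assumption, not the project's actual scoring code. It uses a pure-Python edit distance as a stand-in so it runs without any install; `edit_distance` and `rate_extraction` are hypothetical names, and the normalization (one minus distance divided by the longer length) is an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def rate_extraction(extracted: str, expected: str) -> float:
    """Hypothetical rating in [0, 1]; 1.0 means an exact match."""
    longest = max(len(extracted), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - edit_distance(extracted, expected) / longest


if __name__ == "__main__":
    # A typical OCR confusion: letter "O" read as digit "0".
    print(rate_extraction("INV0ICE", "INVOICE"))
```

With python-Levenshtein installed, `Levenshtein.distance(a, b)` could replace the hand-written `edit_distance`. The library's `Levenshtein.ratio` is also a similarity score, but it is normalized differently from the sketch above, so the two values are not interchangeable.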

Environments

If you are using Windows, add the ImageMagick and Tesseract installation directories to your PATH.
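
A quick way to confirm the PATH setup is to look the executables up from Python. This is a small sketch, not part of the project; `find_tools` is a hypothetical helper, and the executable names are assumptions (`tesseract` for Tesseract, `magick` for ImageMagick 7; older ImageMagick installs may only provide `convert`).

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ("tesseract", "magick")


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If changing PATH is not an option, pytesseract also lets you point at the binary directly by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the Tesseract executable.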

TODO

  • Add tests
  • Core Functions:
    • amount
    • invoice #
    • bill date vs due date
    • address
    • vendor name
  • Optimize the rating process