
I started this project a long time ago and stopped working on it due to problems. It was meant to be an object detection model paired with OCR that transcribes the unique code on each banknote into text.


BANKNOTE UNIQUE CODE RECOGNITION

TABLE OF CONTENTS

  • Overview
  • Methods
  • Problem Project Solves
  • App Architecture
  • Model's Performance
  • Example of Working
  • Learning Curves
  • Creating Object Detection Model to Capture Region of Interest
  • Problems Encountered in the Process
  • Getting Started
  • License

OVERVIEW

The application is supposed to capture a region of interest (ROI) containing the unique banknote id, which consists of two letters and seven digits (0-9), and run OCR on it to recognize the id, so that the code can be used later.
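The id format described above (two letters plus seven digits) can be sketched as a simple validation check. Note that Russian banknote serials actually use Cyrillic letters; the Latin-only pattern below is a simplifying assumption for illustration, and the function name is not from the project.

```python
import re

# Assumed pattern: exactly two letters followed by seven digits.
ID_PATTERN = re.compile(r"^[A-Z]{2}\d{7}$")

def is_valid_id(text):
    """Return True if text matches the two-letter, seven-digit id format."""
    return bool(ID_PATTERN.fullmatch(text))
```

A check like this could reject obviously bad OCR output before it reaches the database.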

In this project, Russian currency is used as the working example (1000, 2000, and 5000 ruble banknotes).

Important

The project has NOT been completed yet. At the moment the app does not work correctly: only 67% of the planned functionality is implemented, and the OCR does not yet work properly.

METHODS

  • Optical character recognition
  • Object detection
  • Data annotation
  • Model architecture fine-tuning
  • Learning curve visualization
  • Relative coordinates
  • Image resizing
  • Augmentation

PROBLEM PROJECT SOLVES

The main problem this application solves is the theft of banknotes. If your money is stolen and you have the id of each bill, you can provide these numbers to the police, which greatly increases the chance of getting your money back.

The application will be used through a Telegram bot: a person sends a picture of a banknote to the bot, which then saves the id to a database of ids. Each person has a private database.
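The per-user storage described above can be sketched with a small SQLite table; the table and column names here are assumptions, not the project's actual schema.

```python
import sqlite3

# In-memory database for the sketch; the real bot would use a file.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE banknote_ids (
                    user_id INTEGER,
                    code    TEXT,
                    UNIQUE (user_id, code))""")

def save_id(user_id, code):
    """Store a recognized banknote id for one user, ignoring duplicates."""
    conn.execute("INSERT OR IGNORE INTO banknote_ids VALUES (?, ?)",
                 (user_id, code))

def ids_for(user_id):
    """Return all ids saved by one user (their 'private database')."""
    return [row[0] for row in conn.execute(
        "SELECT code FROM banknote_ids WHERE user_id = ?", (user_id,))]

save_id(1, "AB1234567")
```

Keying every row by `user_id` is what keeps each person's list of ids private to them.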

APP ARCHITECTURE

MODEL'S PERFORMANCE

At the moment the OCR has 12.5% accuracy, while the object detection model crops the region of interest with 98% accuracy. To improve OCR performance, it should be fine-tuned on the font used on Russian banknotes; that is the problem currently being worked on.

EXAMPLE OF WORKING

LEARNING CURVES

It is easy to see that the model is not working well: the test loss is higher than the training loss, with a large gap between them, which indicates that the model is overfitting. This is the problem currently being worked on.

Across many attempts and different architectures, this is the best result the project has achieved so far.

CREATING OBJECT DETECTION MODEL TO CAPTURE REGION OF INTEREST

First, the region of interest must be detected, because OCR needs to see only the text we are interested in, without any other distracting characters and signs. Banknotes have very little visual variety (one 1000-ruble bill looks like every other 1000-ruble bill, and so on), so it would be very difficult to train a model on a pattern as small as the id text, especially after resizing the banknote pictures to 170x128.

Instead, it was decided to train the model to detect the whole banknote and then slice that region to keep roughly the upper-right quarter, where the id is located.

Note

Check the pic below: the blue region is the object's annotation, and the red one is the piece containing the id, obtained by cropping the image.
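The slicing step can be sketched as follows. This is not the project's actual code: the function name and the exact "right half, top half" fractions are illustrative assumptions for taking roughly the upper-right quarter of the detected banknote.

```python
import numpy as np

def crop_id_region(image, bbox):
    """Crop roughly the upper-right quarter of a detected banknote,
    where the unique id is printed.

    bbox: (top_x, top_y, bottom_x, bottom_y) in absolute pixels.
    """
    top_x, top_y, bottom_x, bottom_y = bbox
    width = bottom_x - top_x
    height = bottom_y - top_y
    # Keep the right half horizontally and the top half vertically.
    x_start = top_x + width // 2
    y_end = top_y + height // 2
    return image[top_y:y_end, x_start:bottom_x]
```

The resulting crop is what gets handed to the OCR stage.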

ANNOTATION PROCESS

label-studio was used to annotate the samples. Of the 904 available samples, only 860 were suitable.

MODEL'S ARCHITECTURE

VGG-16 is used as the foundation to experiment with. The output layer's activation function was changed from softmax to sigmoid, and the last layer contains 4 neurons, because the model predicts 4 values: top-left x, top-left y, bottom-right x, bottom-right y. The input shape was also changed.

The fully connected part was fine-tuned, so its architecture differs slightly from the original. The categorical cross-entropy loss function was replaced with MSE, because this is now a regression task, not classification.
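A minimal Keras sketch of the modifications described above. The 170x128 input follows the resize mentioned earlier; the 128-unit dense layer is an assumption (the project's actual fine-tuned head is not shown in this README), and `weights=None` skips the ImageNet download.

```python
from tensorflow import keras
from tensorflow.keras import layers

# VGG-16 convolutional base without its classification head.
base = keras.applications.VGG16(include_top=False,
                                weights=None,
                                input_shape=(128, 170, 3))
x = layers.Flatten()(base.output)
x = layers.Dense(128, activation="relu")(x)  # assumed head size
# 4 sigmoid outputs: relative (top_x, top_y, bottom_x, bottom_y) in [0, 1].
out = layers.Dense(4, activation="sigmoid")(x)
model = keras.Model(base.input, out)

# MSE instead of categorical cross-entropy: coordinates are a regression target.
model.compile(optimizer="adam", loss="mse")
```

Sigmoid on the output layer is what forces the predictions into [0, 1], which is why the labels must be converted to relative coordinates (see below).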

PROBLEMS ENCOUNTERED IN THE PROCESS

RELATIVE COORDINATES

At first there was no clear idea of how to work with relative coordinates. Absolute coordinates had to be transformed into relative ones, because the model uses a sigmoid activation function, whose output ranges from 0 to 1.

This was solved by dividing each coordinate by the corresponding image dimension. Here's the function that transforms the coordinates:

def relative_coords(bbox):
  # cols and rows are the image width and height in pixels
  bbox = [bbox[0] / cols,  # top_x
          bbox[1] / rows,  # top_y
          bbox[2] / cols,  # bottom_x
          bbox[3] / rows]  # bottom_y

  return bbox
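As a quick check, assuming a 170x128 input image (so cols = 170 and rows = 128), the conversion behaves like this (the function is restated so the snippet is self-contained):

```python
cols, rows = 170, 128  # assumed image width and height after resizing

def relative_coords(bbox):
    return [bbox[0] / cols,  # top_x
            bbox[1] / rows,  # top_y
            bbox[2] / cols,  # bottom_x
            bbox[3] / rows]  # bottom_y

print(relative_coords([17, 32, 170, 128]))  # → [0.1, 0.25, 1.0, 1.0]
```

A corner at the bottom-right of the image maps to exactly 1.0, matching the upper bound of the sigmoid output.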

NOT ENOUGH DATA

The first versions of the model overfitted much more, because there were only 860 data samples.

This was partly solved by augmentation. Of all the augmentation types tried, cut-out gave the best result, roughly halving the overfitting.
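Cut-out augmentation can be sketched as zeroing a random square patch of the image. This is an illustrative implementation, not the project's code; the patch size and zero fill value are assumed defaults.

```python
import numpy as np

def cutout(image, size=32, rng=None):
    """Cut-out augmentation: blank out a random square patch.

    Forces the model to rely on more than one local region,
    which is why it helps against overfitting on small datasets.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    # Pick a top-left corner so the patch fits entirely inside the image.
    y = rng.integers(0, max(h - size, 1))
    x = rng.integers(0, max(w - size, 1))
    out = image.copy()
    out[y:y + size, x:x + size] = 0
    return out
```

For a detection task like this one, the patch position could additionally be constrained so it never covers the annotated banknote corners.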

GETTING STARTED

The application currently works only for the owner, not for other users. It uses a pretrained regression convolutional neural network.

LICENSE

This project is licensed under the MIT License - see the LICENSE.md file for details.
