This model is a neural network for real-time object detection that detects objects from 80 different classes (the COCO categories). It is fast enough for real-time use while remaining accurate, reaching an mAP of 0.553 on COCO (see the table below).
|Model|Download|Download (with sample test data)|ONNX version|Opset version|Accuracy|
|---|---|---|---|---|---|
|YOLOv3|237 MB|222 MB|1.5|10|mAP of 0.553|
The model has 2 inputs:

- Resized image `(1x3x416x416)`
- Original image size `(1x2)`, which is `[image.size[1], image.size[0]]`, i.e. `[height, width]` (PIL's `image.size` is `(width, height)`)
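The declared input names and shapes can be checked directly against this spec with onnxruntime. A minimal sketch, assuming the downloaded model file is named `yolov3.onnx`:

```python
import onnxruntime

# Load the downloaded model; the file name "yolov3.onnx" is an assumption.
session = onnxruntime.InferenceSession("yolov3.onnx", providers=["CPUExecutionProvider"])

# Print each declared input's name, shape, and element type.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
```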
The images have to be loaded into a range of [0, 1]. This transformation should preferably happen during preprocessing. The following code shows how to preprocess an input image into an NCHW tensor:
```python
import numpy as np
from PIL import Image

# this function is from yolo3.utils.letterbox_image
def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)

    image = image.resize((nw, nh), Image.BICUBIC)
    # Paste the resized image onto a gray canvas, centered, to pad to `size`.
    new_image = Image.new('RGB', size, (128, 128, 128))
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    return new_image

def preprocess(img):
    model_image_size = (416, 416)
    boxed_image = letterbox_image(img, tuple(reversed(model_image_size)))
    image_data = np.array(boxed_image, dtype='float32')
    image_data /= 255.                                # scale pixel values to [0, 1]
    image_data = np.transpose(image_data, [2, 0, 1])  # HWC -> CHW
    image_data = np.expand_dims(image_data, 0)        # add batch dimension -> NCHW
    return image_data

image = Image.open(img_path)

# input
image_data = preprocess(image)
image_size = np.array([image.size[1], image.size[0]], dtype=np.float32).reshape(1, 2)  # (1, 2): [height, width]
```
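With `image_data` and `image_size` prepared, inference is a single `session.run` call. A minimal sketch with onnxruntime; the input names `input_1` and `image_shape` are assumptions and should be verified via `session.get_inputs()`:

```python
import onnxruntime

session = onnxruntime.InferenceSession("yolov3.onnx", providers=["CPUExecutionProvider"])

# Input names below are assumptions; confirm with [i.name for i in session.get_inputs()].
boxes, scores, indices = session.run(None, {
    "input_1": image_data,      # (1, 3, 416, 416) float32 in [0, 1]
    "image_shape": image_size,  # (1, 2): [height, width] of the original image
})
```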
The model has 3 outputs:

- boxes `(1 x 'n_candidates' x 4)`: the coordinates of all anchor boxes
- scores `(1 x 80 x 'n_candidates')`: the scores of all anchor boxes per class
- indices `('nbox' x 3)`: indices selected from the boxes tensor. Each selected index has the format `(batch_index, class_index, box_index)`. The class list is here.
Postprocessing and meaning of the output:
```python
out_boxes, out_scores, out_classes = [], [], []
# Each row of `indices` has the format (batch_index, class_index, box_index).
for idx_ in indices:
    out_classes.append(idx_[1])             # class index of this detection
    out_scores.append(scores[tuple(idx_)])  # score at (batch, class, box)
    idx_1 = (idx_[0], idx_[2])              # (batch, box) index into `boxes`
    out_boxes.append(boxes[idx_1])          # box coordinates of this detection
```
`out_boxes`, `out_scores`, and `out_classes` are the resulting lists of boxes, scores, and class indices.
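To make the detections readable, the class indices can be mapped to names from the 80-entry class list mentioned above. A minimal sketch, assuming the list has been saved locally as `coco_classes.txt` (a hypothetical file name) with one name per line:

```python
# "coco_classes.txt" is assumed to hold the 80 COCO class names, one per line.
with open("coco_classes.txt") as f:
    class_names = [line.strip() for line in f]

# One line per detection: label, confidence, and raw box coordinates.
for box, score, cls in zip(out_boxes, out_scores, out_classes):
    print(f"{class_names[int(cls)]}: {score:.2f} at {box}")
```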
We use pretrained weights from pjreddie.com here.
The metric is COCO box mAP (averaged over IoU thresholds from 0.5 to 0.95), computed on the 2017 COCO validation data. The mAP of 0.553 is based on the original YOLOv3 model here.
Joseph Redmon, Ali Farhadi. *YOLOv3: An Incremental Improvement*, paper.
This model was converted from a Keras model repository using the keras2onnx converter repository.
MIT License