FastMOT is a multiple object tracker that implements:
- YOLO detector
- SSD detector
- Deep SORT + OSNet ReID
- KLT optical flow tracking
- Camera motion compensation
Deep learning models are usually the bottleneck in Deep SORT, which makes Deep SORT hard to scale to real-time applications. This repo speeds up the entire system enough to run in real time, even on Jetson. It also provides enough flexibility to tune the speed-accuracy tradeoff without resorting to a lightweight model.
To achieve faster processing, the tracker runs the detector and feature extractor only every N frames and uses optical flow to fill in the gaps. I swapped the feature extractor in Deep SORT for a better ReID model, OSNet. I also added a feature to re-identify targets that move out of frame, so that the tracker can keep the same IDs. I trained YOLOv4 on CrowdHuman, while the SSD models are pretrained COCO models from TensorFlow.
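The frame-skipping idea above can be sketched as a simple schedule. This is a hypothetical illustration, not FastMOT's scheduler. The helpers frame_schedule and avg_frame_ms are made up for this sketch, and detector_frame_skip stands in for N. The latency model is deliberately simplified: it assumes a detector frame costs detect_ms, a flow-only frame costs flow_ms, and it ignores the asynchronous overlap the real system uses. The millisecond values in the usage lines are invented.

```python
from typing import List


def frame_schedule(num_frames: int, detector_frame_skip: int) -> List[str]:
    """Label each frame with the stage that handles it.

    Frames 0, N, 2N, ... run the detector + ReID feature extractor;
    every other frame is tracked with optical flow only.
    """
    if detector_frame_skip < 1:
        raise ValueError("detector_frame_skip must be >= 1")
    return [
        "detect" if i % detector_frame_skip == 0 else "flow"
        for i in range(num_frames)
    ]


def avg_frame_ms(detect_ms: float, flow_ms: float, detector_frame_skip: int) -> float:
    """Average per-frame latency under a simplified model: one detector+ReID
    frame followed by (N - 1) flow-only frames. Ignores async overlap."""
    n = detector_frame_skip
    return (detect_ms + flow_ms * (n - 1)) / n


if __name__ == "__main__":
    print(frame_schedule(6, 3))
    # ['detect', 'flow', 'flow', 'detect', 'flow', 'flow']
    print(avg_frame_ms(detect_ms=40.0, flow_ms=5.0, detector_frame_skip=5))
    # 12.0
```

Under these assumed numbers, the expensive stage is paid once per N frames. This is why increasing N trades accuracy for speed.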
Both detector and feature extractor use the TensorRT backend and perform asynchronous inference. In addition, most algorithms, including Kalman filter, optical flow, and data association, are optimized using Numba.
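For a feel of what Numba-optimized tracking code can look like, here is a minimal sketch of a Kalman filter predict step for one coordinate under a constant-velocity model. This is not FastMOT's Kalman filter. The state layout [pos, vel], the scalar covariance entries p00/p01/p11 and the process noise Q = q·I are assumptions made for this example. If Numba is not installed, the decorator falls back to a no-op so the code still runs as plain Python.

```python
try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba: a no-op decorator.
    def njit(func):
        return func


@njit
def kf_predict(pos, vel, p00, p01, p11, dt, q):
    """Constant-velocity Kalman predict step for a single coordinate.

    State x = [pos, vel], transition F = [[1, dt], [0, 1]],
    covariance P = [[p00, p01], [p01, p11]] (symmetric), noise Q = q * I.
    """
    # x' = F x
    new_pos = pos + dt * vel
    new_vel = vel
    # P' = F P F^T + Q, expanded for a symmetric 2x2 P
    n00 = p00 + 2.0 * dt * p01 + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n11 = p11 + q
    return new_pos, new_vel, n00, n01, n11


if __name__ == "__main__":
    # Predict across 4 flow-only frames: uncertainty grows until the next detection.
    state = (0.0, 2.0, 1.0, 0.0, 1.0)
    for _ in range(4):
        state = kf_predict(*state, 1.0, 0.1)
    print(state)
```

Numba compiles such scalar arithmetic to machine code on the first call. This is also why the first run is slow and later calls are fast.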
Sequence | Density (objects per frame) | MOTA (SSD) | MOTA (YOLOv4) | MOTA (public detections) | FPS |
---|---|---|---|---|---|
MOT17-13 | 5 - 30 | 19.8% | 45.6% | 41.3% | 38 |
MOT17-04 | 30 - 50 | 43.8% | 61.0% | 75.1% | 22 |
MOT17-03 | 50 - 80 | - | - | - | 15 |
Performance is evaluated on the MOT17 dataset on a Jetson Xavier NX using py-motmetrics. With public detections from MOT17, the MOTA scores are close to those of state-of-the-art trackers. Tracking speed reaches up to 38 FPS, depending on the number of objects. On a desktop CPU/GPU, FPS should be even higher.
This means that even though the tracker runs much faster, it is still highly accurate. A more lightweight detector or feature extractor could be used for further speedup. Note that plain Deep SORT + YOLO struggles to run in real time on most edge devices and desktop machines.
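MOTA, the metric in the table above, is defined as 1 − (FN + FP + IDSW) / GT. The counts are summed over all frames: FN are missed objects, FP are false positives, IDSW are identity switches and GT is the number of ground-truth objects. py-motmetrics computes this for you. The small function below only shows the formula. Its argument names mirror py-motmetrics counter names, and the counts in the usage line are invented.

```python
def mota(num_misses: int, num_false_positives: int, num_switches: int, num_objects: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames.

    The result can be negative when errors outnumber ground-truth objects.
    """
    if num_objects <= 0:
        raise ValueError("num_objects must be positive")
    return 1.0 - (num_misses + num_false_positives + num_switches) / num_objects


if __name__ == "__main__":
    # Hypothetical counts, not taken from the table above.
    print(round(mota(num_misses=200, num_false_positives=100, num_switches=20, num_objects=1000), 3))
    # 0.68
```

Because misses count against MOTA, a stronger detector (YOLOv4 vs. SSD) raises the score directly, which matches the gap in the table.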
- CUDA >= 10
- cuDNN >= 7
- TensorRT >= 7
- OpenCV >= 3.3
- PyCUDA
- NumPy >= 1.15
- SciPy >= 1.5
- TensorFlow <= 1.15.2 (for SSD support)
- Numba >= 0.48
- cython-bbox
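To check the Python-side minimum versions listed above, you could use a quick script like the one below. It is a hypothetical helper, not part of the repo. It uses the standard library's importlib.metadata (Python 3.8+) with pip distribution names and only compares the leading numeric fields, so pre-release tags are ignored. CUDA, cuDNN, TensorRT and a source-built OpenCV are not pip packages and are not covered. The TensorFlow upper bound is not checked either.

```python
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions from the requirements list (pip distribution names).
MINIMUMS = {"numpy": "1.15", "scipy": "1.5", "numba": "0.48"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep the leading digits of each dotted field: '1.5.0rc1' -> (1, 5, 0)."""
    parts = []
    for field in version.split("."):
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '1.5' compares equal to '1.5.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        report[package] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check().items():
        print(f"{package}: {status}")
```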
On Jetson, make sure JetPack 4.4 is installed and run the script:
$ scripts/install_jetson.sh
On x86, make sure CUDA, cuDNN, and TensorRT (including the Python API) are installed. You can optionally use my script to install them from scratch:
$ scripts/install_tensorrt.sh
Install UFF and Graph Surgeon for SSD support, as described in GeekAlexis#15 (comment).
Optionally, build OpenCV from source with GStreamer, which is recommended for performance. Before running the script, modify ARCH_BIN in scripts/install_opencv.sh to match your GPU compute capability:
$ scripts/install_opencv.sh
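If you are unsure of your GPU's compute capability for ARCH_BIN, the sketch below asks PyCUDA for it. The helpers arch_bin and detect_arch_bin are made up for this example. pycuda.driver.init() and Device(i).compute_capability() are real PyCUDA calls. The code returns None when PyCUDA or a GPU is unavailable, in which case you should look up the value for your device.

```python
from typing import Optional


def arch_bin(major: int, minor: int) -> str:
    """Format a compute capability (major, minor) as an ARCH_BIN value, e.g. '7.2'."""
    return f"{major}.{minor}"


def detect_arch_bin(device_index: int = 0) -> Optional[str]:
    """Query the GPU via PyCUDA; return None if PyCUDA or a GPU is unavailable."""
    try:
        import pycuda.driver as cuda

        cuda.init()
        major, minor = cuda.Device(device_index).compute_capability()
    except Exception:
        return None
    return arch_bin(major, minor)


if __name__ == "__main__":
    # Reference values: Jetson Nano 5.3, Jetson TX2 6.2, Jetson Xavier NX 7.2
    print(detect_arch_bin() or "could not detect; look up your GPU's compute capability")
```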
Install Python dependencies
$ pip3 install -r requirements.txt
Download the models. This includes the pretrained OSNet and SSD models and my custom YOLOv4 ONNX model:
$ scripts/download_models.sh
Build the plugins:
$ cd fastmot/plugins
$ make
Download the data (only required if you want to use SSD):
$ scripts/download_data.sh
- USB Camera:
$ python3 app.py --input_uri /dev/video0 --mot
- CSI Camera:
$ python3 app.py --input_uri csi://0 --mot
- RTSP IP Camera:
$ python3 app.py --input_uri rtsp://<user>:<password>@<ip>:<port> --mot
- Video file:
$ python3 app.py --input_uri video.mp4 --mot
- Use --gui to visualize and --output_uri to save output
- To disable the GStreamer backend, set WITH_GSTREAMER = False
- Note that the first run will be slow due to Numba compilation
- More options can be configured in cfg/mot.json:
  - Set camera_size and camera_fps to match your camera setting. To list all settings for your camera:
    $ v4l2-ctl -d /dev/video0 --list-formats-ext
  - To change the detector, modify detector_type. This can be either YOLO or SSD
  - To change classes, set class_ids under the corresponding detector. The default class is 1, which corresponds to person
  - To swap the model, modify model under a detector. For SSD, you can choose from SSDInceptionV2, SSDMobileNetV1, or SSDMobileNetV2
  - Note that with SSD, the detector splits a frame into tiles and processes them in batches for the best accuracy. Change tiling_grid to [2, 2], [2, 1], or [1, 1] if a smaller batch size is preferred
  - If more accuracy is desired and processing power is not an issue, reduce detector_frame_skip. Conversely, increase detector_frame_skip to speed up tracking at the cost of accuracy. You may also want to change max_age such that max_age * detector_frame_skip is around 30-40
- Please star if you find this repo useful/interesting
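Two of the configuration rules above can be sketched as small helpers. Both are hypothetical and not taken from the repo. tile_rects shows how an SSD tiling_grid turns one frame into a batch of tiles. It assumes the grid is [cols, rows] and that tiles do not overlap, and FastMOT's actual tiling may differ. suggest_max_age picks max_age so that max_age * detector_frame_skip lands in the 30-40 frame range. The target of 35 is an arbitrary midpoint chosen for this sketch.

```python
from typing import List, Sequence, Tuple


def tile_rects(frame_w: int, frame_h: int, grid: Sequence[int]) -> List[Tuple[int, int, int, int]]:
    """Split a frame into a non-overlapping grid of (x0, y0, x1, y1) tiles.

    grid is interpreted as [cols, rows] (an assumption); batch size = cols * rows.
    """
    cols, rows = grid
    rects = []
    for r in range(rows):
        for c in range(cols):
            rects.append((
                c * frame_w // cols,
                r * frame_h // rows,
                (c + 1) * frame_w // cols,
                (r + 1) * frame_h // rows,
            ))
    return rects


def suggest_max_age(detector_frame_skip: int, target_frames: int = 35) -> int:
    """Pick max_age so that max_age * detector_frame_skip is close to target_frames."""
    if detector_frame_skip < 1:
        raise ValueError("detector_frame_skip must be >= 1")
    return max(1, round(target_frames / detector_frame_skip))


if __name__ == "__main__":
    print(len(tile_rects(1280, 720, [2, 2])))  # 4 tiles, i.e. a batch size of 4
    print(suggest_max_age(5))                  # 7, since 7 * 5 = 35
```

Going from [2, 2] to [1, 1] cuts the batch from 4 to 1. Each tile then covers more of the frame at the same input resolution, which is why smaller objects may be missed.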
This repo supports multi-class tracking and thus can be easily extended to custom classes (e.g. vehicle). You need to train both YOLO and a ReID model on your object classes. Check Darknet for training YOLO and fast-reid for training ReID. After training, convert the models to ONNX format and place them under fastmot/models. To convert YOLO to ONNX, tensorrt_demos is a great reference.
To add a custom YOLO model:
- Subclass YOLO like here: https://github.com/GeekAlexis/FastMOT/blob/4e946b85381ad807d5456f2ad57d1274d0e72f3d/fastmot/models/yolo.py#L94
  - ENGINE_PATH: path to the TensorRT engine (converted at runtime)
  - MODEL_PATH: path to the ONNX model
  - NUM_CLASSES: total number of classes
  - INPUT_SHAPE: input size in the format "(channel, height, width)"
  - LAYER_FACTORS: scale factors with respect to the input size for each yolo layer. For YOLOv3, change to [32, 16, 8]. For YOLOv3/v4-tiny, change to [32, 16]
  - SCALES: scale_x_y parameter for each yolo layer. For YOLOv3, change to [1., 1., 1.]. For YOLOv3-tiny, change to [1., 1.]. For YOLOv4-tiny, change to [1.05, 1.05]
  - ANCHORS: anchors grouped by each yolo layer. Note that anchors may not follow the same order as in the Darknet cfg file. You need to mask out the anchors for each yolo layer using the indices in mask in the Darknet cfg. Unlike YOLOv4, the anchors are usually in reverse order for YOLOv3 and tiny models
- Change the class labels to your object classes
- Modify cfg/mot.json: under yolo_detector, set model to the added Python class and set class_ids to the classes you want to detect. You may want to tune conf_thresh based on the accuracy of your model
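The anchor masking described for ANCHORS can be sketched as follows. This is a hypothetical helper, not FastMOT code. It reads the flat anchors list and each yolo layer's mask from a Darknet cfg and produces one flat list of (w, h) values per layer, in the cfg's layer order. It assumes ANCHORS expects that flattened per-layer format, so compare against the existing YOLO subclass before relying on it. The example uses the standard YOLOv3 cfg values, where the first yolo layer uses mask = 6,7,8, which is why the groups come out largest-first.

```python
from typing import List, Sequence


def parse_int_list(text: str) -> List[int]:
    """Parse a Darknet cfg value such as '10,13,  16,30' into [10, 13, 16, 30]."""
    return [int(tok) for tok in text.split(",") if tok.strip()]


def group_anchors(anchors: Sequence[int], masks: Sequence[Sequence[int]]) -> List[List[int]]:
    """Group flat (w, h) anchor values by yolo layer using each layer's mask indices.

    masks must be listed in the same order as the yolo layers appear in the cfg.
    """
    pairs = [list(anchors[i:i + 2]) for i in range(0, len(anchors), 2)]
    return [[v for idx in mask for v in pairs[idx]] for mask in masks]


if __name__ == "__main__":
    # Standard YOLOv3 anchors and per-layer masks (first yolo layer: 6,7,8).
    anchors = parse_int_list("10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326")
    masks = [parse_int_list(m) for m in ("6,7,8", "3,4,5", "0,1,2")]
    for layer in group_anchors(anchors, masks):
        print(layer)
```

For YOLOv4 the masks run 0,1,2 then 3,4,5 then 6,7,8, so the same helper yields the groups smallest-first. This matches the reversed order noted above.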
To add a custom ReID model:
- Subclass ReID like here: https://github.com/GeekAlexis/FastMOT/blob/aa707888e39d59540bb70799c7b97c58851662ee/fastmot/models/reid.py#L51
  - ENGINE_PATH: path to the TensorRT engine (converted at runtime)
  - MODEL_PATH: path to the ONNX model
  - INPUT_SHAPE: input size in the format "(channel, height, width)"
  - OUTPUT_LAYOUT: feature dimension output by the model (e.g. 512)
  - METRIC: distance metric used to match features ('euclidean' or 'cosine')
- Modify cfg/mot.json: under feature_extractor, set model to the added Python class. You may want to tune max_feat_cost and max_reid_cost (float values from 0 to 2) based on the accuracy of your model
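The 0 to 2 range for max_feat_cost and max_reid_cost is consistent with both METRIC options when features are L2-normalized. Cosine distance 1 − a·b lies in [0, 2], and the Euclidean distance between unit vectors also lies in [0, 2]. The sketch below illustrates this on hand-made 2-D vectors. It is a hypothetical example: it normalizes inside each function, and whether FastMOT normalizes its embeddings the same way is an assumption here.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for identical directions, 2 for opposite ones."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the unit-normalized vectors, also within [0, 2]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    same, orthogonal, opposite = [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]
    for other in (same, orthogonal, opposite):
        print(cosine_distance(same, other), euclidean_distance(same, other))
```

For unit vectors the two metrics are related by euclidean² = 2 × cosine. A threshold tuned for one metric therefore does not carry over unchanged to the other.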