Flexible 3D Object Detection (Flex3D-bbox)

Introduction

Building on the work by Bugra Tekin, Sudipta N. Sinha, and Pascal Fua in "Real-Time Seamless Single Shot 6D Object Pose Prediction" (CVPR 2018), this implementation extends the original framework to better address everyday object detection tasks (see References below).

We've integrated fine-tuning across diverse datasets, ranging from custom-labeled data to standard benchmarks, with seamless conversion into a unified labeling format. The system supports multiple input types, including images, videos, and webcam feeds, and is optimized for robust multi-object and multi-class inference. These enhancements make the method adaptable and effective for a wide range of real-world applications.


Key Features

  • Utilizing Various Datasets: Includes parcel3d, AIHUB, and other "manually labeled" custom datasets.
  • Omitting Reprojection Process: Streamlined pipeline by removing unnecessary reprojection.
  • Generating Inference Code: Provides ready-to-use inference code.
  • Adding Multi-Object Inference: Enhanced capabilities for detecting multiple objects simultaneously.
  • Introducing Anchors: Improved detection accuracy through the use of anchors.

1. Download the Repository

Clone the repository, which includes the necessary datasets:

git clone https://github.com/jungarden/Flex3D-bbox.git

2. System Environment

Ensure your environment meets the following requirements:

  • Python: 3.6
  • CUDA: 11.1
  • cuDNN: 8
  • Docker image: nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04

Install the required libraries as follows:

  • PyTorch:

    pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
  • OpenCV:

    pip install opencv-python
    # Alternatively, install a specific version:
    # pip install opencv-contrib-python==4.1.0.25
  • SciPy:

    pip install scipy==1.2.0
  • Pillow:

    pip install pillow==8.2.0
  • tqdm:

    pip install tqdm==4.64.1
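
Before moving on, you can verify the setup with a short sanity-check script (a minimal sketch, not part of this repository):

```python
# check_env.py -- quick environment sanity check (not part of this repository)
import torch
import cv2
import scipy
import PIL

print("PyTorch:", torch.__version__)            # expected 1.8.0+cu111
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)        # expected 11.1
print("cuDNN:", torch.backends.cudnn.version())
print("OpenCV:", cv2.__version__)
print("SciPy:", scipy.__version__)
print("Pillow:", PIL.__version__)
```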

3. Parsing

Before training, make sure your dataset labels are correctly formatted by running the making_txt_labels.py script. It parses your dataset's labeling information and converts it into the format required for training. Be sure to select the labeling method that matches your dataset, whether it is manually labeled or follows the AIHUB dataset format. A sketch of the label format this step is assumed to produce appears after the examples below.

python3 making_txt_labels.py
  • glove00 folder structure: see the example image in the repository README.
  • glove00.data: see the example image in the repository README.
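
As a rough illustration of the per-image label files this step is assumed to produce (the original singleshotpose convention: a class id, nine projected keypoints normalized by image width/height, and the 2D extent; making_txt_labels.py itself is authoritative), writing one label line could look like this:

```python
# write_label.py -- illustrative only; the authoritative format is whatever
# making_txt_labels.py emits (assumed here: singleshotpose-style values per object)
import numpy as np

def write_label_line(f, class_id, keypoints_px, img_w, img_h):
    """keypoints_px: (9, 2) pixel coordinates
    (object centroid followed by the 8 projected 3D bounding-box corners)."""
    kp = np.asarray(keypoints_px, dtype=float).copy()
    kp[:, 0] /= img_w                             # normalize x to [0, 1]
    kp[:, 1] /= img_h                             # normalize y to [0, 1]
    x_range = kp[:, 0].max() - kp[:, 0].min()     # 2D extent in x
    y_range = kp[:, 1].max() - kp[:, 1].min()     # 2D extent in y
    values = kp.flatten().tolist() + [x_range, y_range]
    f.write(" ".join([str(int(class_id))] + [f"{v:.6f}" for v in values]) + "\n")

# Example: one object of class 0 in a 640x480 image (coordinates are made up)
keypoints = np.random.rand(9, 2) * [640, 480]
with open("glove00_000001.txt", "w") as f:        # hypothetical file name
    write_label_line(f, 0, keypoints, 640, 480)
```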

4-1. Training (Multi-Object)

To train the model on multiple objects across datasets, use the following command:

python3 train_multi.py \
--datacfg data/occlusion.data \
--modelcfg cfg/yolo-pose-multi.cfg \
--initweightfile cfg/darknet19_448.conv.23 \
--pretrain_num_epochs 15
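
The multi-object config (cfg/yolo-pose-multi.cfg) relies on the anchor priors introduced above (see utils/get_anchors.py). The repository's script is authoritative; the following is only a minimal sketch of the common YOLOv2-style approach such scripts typically implement, k-means over box widths and heights with an IoU-based distance:

```python
# anchors_sketch.py -- minimal k-means over box widths/heights (YOLOv2-style),
# shown for illustration; the repository's utils/get_anchors.py is authoritative.
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between boxes and anchors when both are centered at the origin.
    boxes: (N, 2), anchors: (K, 2); returns (N, K)."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, None, 0] * boxes[:, None, 1] + \
            anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(box_wh, k=5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = box_wh[rng.choice(len(box_wh), k, replace=False)]
    for _ in range(iters):
        assign = iou_wh(box_wh, anchors).argmax(axis=1)   # nearest anchor by IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = box_wh[assign == j].mean(axis=0)
    return anchors

# Example with random box sizes (widths/heights in grid-cell units)
wh = np.random.default_rng(1).uniform(0.5, 8.0, size=(1000, 2))
print(kmeans_anchors(wh, k=5))
```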

4-2. Training (Finetuning)

For finetuning on a custom dataset, run:

python3 train.py \
--datacfg data/trainbox.data \
--modelcfg cfg/yolo-pose.cfg \
--initweightfile backup/parcel3d/model.weights \
--pretrain_num_epochs 5
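
The --datacfg files (data/trainbox.data, data/occlusion.data, glove00.data above) are assumed to follow the darknet-style key = value layout used by the original singleshotpose code. A minimal parser sketch for such files (the repository's own utilities should be used in practice):

```python
# read_data_cfg.py -- minimal parser for darknet-style "key = value" .data files
# (a sketch assuming the original singleshotpose layout; the repository's own
#  utilities are authoritative)
def read_data_cfg(path):
    options = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            options[key.strip()] = value.strip()
    return options

# Example (hypothetical keys shown for illustration):
# cfg = read_data_cfg("data/trainbox.data")
# print(cfg.get("train"), cfg.get("backup"))
```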

5. Inference

To perform inference on a video file, execute:

python3 inference.py \
--datacfg data/occlusion.data \
--modelcfg cfg/yolo-pose-multi.cfg \
--initweightfile backup_multi/model.weights \
--file video.mp4
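
Internally, video and webcam inference follow the usual frame-by-frame OpenCV loop. The sketch below illustrates that pattern only; detect_frame is a hypothetical placeholder, not a function exported by inference.py:

```python
# video_loop_sketch.py -- typical frame-by-frame loop for video/webcam inference;
# detect_frame() is a hypothetical placeholder, not part of this repository.
import cv2

def detect_frame(frame):
    # Placeholder: run the network on one frame and return it annotated.
    return frame

cap = cv2.VideoCapture("video.mp4")   # or cv2.VideoCapture(0) for a webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    annotated = detect_frame(frame)
    cv2.imshow("Flex3D-bbox", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```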

6. Results

Below is an example of the detection results (multi-class detection; image and video demos are included in the repository).

7. References

  • Bugra Tekin, Sudipta N. Sinha, and Pascal Fua. "Real-Time Seamless Single Shot 6D Object Pose Prediction." CVPR 2018.

Additional Information

System Architecture

  • Repository Structure:
    • baseline/: Single object detection
    • multi/: Multi-object detection
    • dataset/: Contains various datasets
    • utils/: Contains utility functions (e.g., get_anchors.py)

Code Modifications

  • train.py: Removed camera intrinsic parameters, rotation matrices, and reprojection variables.
  • utils.py: Added build_target_anchors so that anchors are considered in single-object detection (baseline), and modified get_region_boxes to take anchors into account.
  • image.py & dataset.py: Updated paths for custom datasets.
  • yolo-pose.cfg: Adjusted the number of filters for anchors and classes.
  • inference.py: Added visualization for bounding boxes and classes.
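
For context on the anchor-related changes in utils.py: anchor-aware YOLO-style decoding recovers each box from per-anchor offsets (tx, ty, tw, th), using a sigmoid for the center and an exponential scaled by the anchor prior for the size. The sketch below shows that standard YOLOv2 formulation; it is not the repository's exact get_region_boxes:

```python
# decode_sketch.py -- YOLOv2-style anchor decoding, shown to illustrate how
# anchors enter box prediction; not the repository's exact get_region_boxes.
import torch

def decode_boxes(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, grid_size):
    """All inputs are tensors broadcastable to the same shape.
    Returns box centers/sizes normalized to [0, 1] image coordinates."""
    bx = (torch.sigmoid(tx) + grid_x) / grid_size   # center x inside the grid cell
    by = (torch.sigmoid(ty) + grid_y) / grid_size   # center y inside the grid cell
    bw = anchor_w * torch.exp(tw) / grid_size       # width scaled by the anchor prior
    bh = anchor_h * torch.exp(th) / grid_size       # height scaled by the anchor prior
    return bx, by, bw, bh
```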