# SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar1 ·
Yuliang Guo2 ·
Xinyu Huang2 ·
Liu Ren2 ·
Xiaoming Liu1
1Michigan State University, 2Bosch Research North America, Bosch Center for AI
in CVPR 2024
Monocular 3D detectors achieve remarkable performance on cars and smaller objects. However, their performance drops on larger objects, leading to fatal accidents. Some attribute the failures to training data scarcity or the receptive field requirements of large objects. In this paper, we highlight this understudied problem of generalization to large objects. We find that modern frontal detectors struggle to generalize to large objects even on nearly balanced datasets. We argue that the cause of failure is the sensitivity of depth regression losses to noise of larger objects. To bridge this gap, we comprehensively investigate regression and dice losses, examining their robustness under varying error levels and object sizes. We mathematically prove that the dice loss leads to superior noise-robustness and model convergence for large objects compared to regression losses for a simplified case. Leveraging our theoretical insights, we propose SeaBird (Segmentation in Bird's View) as the first step towards generalizing to large objects. SeaBird effectively integrates BEV segmentation on foreground objects for 3D detection, with the segmentation head trained with the dice loss. SeaBird achieves SoTA results on the KITTI-360 leaderboard and improves existing detectors on the nuScenes leaderboard, particularly for large objects.
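SeaBird's key ingredient is training the BEV segmentation head with the dice loss rather than a regression loss. For reference, below is a minimal soft dice loss sketch in PyTorch; the tensor shapes, number of classes, and smoothing constant `eps` are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal soft dice loss sketch for BEV foreground segmentation (PyTorch).
# Shapes and eps are illustrative assumptions; see the training code in this
# repo for the exact formulation used by SeaBird.
import torch

def dice_loss(pred, target, eps=1e-6):
    """pred: sigmoid probabilities (B, C, H, W); target: binary masks, same shape."""
    pred = pred.flatten(2)      # (B, C, H*W)
    target = target.flatten(2)  # (B, C, H*W)
    inter = (pred * target).sum(-1)
    denom = pred.sum(-1) + target.sum(-1)
    dice = (2 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

# Example: 3 foreground classes on a 104x104 BEV grid (sizes are arbitrary).
pred = torch.rand(2, 3, 104, 104)
target = (torch.rand(2, 3, 104, 104) > 0.5).float()
print(dice_loss(pred, target))
```

Because the overlap term is normalized by the total predicted and ground-truth areas, the loss and its gradients do not grow with object size the way unnormalized regression losses do, which is the intuition behind the noise-robustness result for large objects.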
Much of the codebase is based on PanopticBEV. Some implementations are from BBAVectors and our DEVIANT.
If you find our work useful in your research, please consider starring the repo and citing:
```bibtex
@inproceedings{kumar2024seabird,
  title={{SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular $3$D Detection of Large Objects}},
  author={Kumar, Abhinav and Guo, Yuliang and Huang, Xinyu and Ren, Liu and Liu, Xiaoming},
  booktitle={CVPR},
  year={2024}
}
```
### Requirements

- Python 3.7
- PyTorch 1.11
- Torchvision 0.12
- CUDA 11.3
- Ubuntu 20.04

This setup is tested with an NVIDIA RTX6000 (48 GB) GPU. Other platforms have not been tested. Clone the repo first. Unless otherwise stated, the scripts and instructions below assume the working directory is `SeaBird/PanopticBEV`:

```bash
git clone https://github.com/abhi1kumar/SeaBird.git
cd SeaBird/PanopticBEV
```
### Cuda & Python

Create a python conda environment and activate it:

```bash
conda create -n panoptic python=3.7 -y
conda activate panoptic
conda install -c conda-forge ipython -y
```

Point to CUDA:

```bash
source cuda_11.3_env
```

For MSU HPCC, use the command

```bash
module load CUDA/11.0.2 cuDNN/8.0.4.30-CUDA-11.0.2
```

instead of `source cuda_11.3_env`.

Install the python dependencies using the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

Compile the PanopticBEV code with CUDA:

```bash
python3 setup.py develop
```

Compile the DOTA devkit for polygon NMS:

```bash
cd panoptic_bev/data/DOTA_devkit
swig -c++ -python polyiou.i
python setup.py build_ext --inplace
cd ../../..
```
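For intuition about what the compiled devkit provides, here is a pure-Python sketch of polygon NMS using `shapely` (an extra dependency assumed only for this illustration); the corner-list box format and the IoU threshold are hypothetical, and the repo uses the compiled C++ `polyiou` extension instead.

```python
# Illustrative polygon NMS in pure Python using shapely; the repo itself uses
# the compiled C++ polyiou extension built above. Box format (lists of (x, y)
# corners) and the IoU threshold are assumptions for this sketch.
from shapely.geometry import Polygon

def poly_iou(p, q):
    """IoU of two polygons, each given as a list of (x, y) corners."""
    a, b = Polygon(p), Polygon(q)
    inter = a.intersection(b).area
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0

def poly_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over oriented boxes; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if poly_iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```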
### KITTI-360 Data

1. Download the KITTI-360 and KITTI-360 PanopticBEV datasets.
2. Download the processed KITTI-360 `train_val` and dummy `testing` labels, and extract them.
3. Arrange the datasets as follows:
```
SeaBird/PanopticBEV
├── data
│   └── kitti_360
│       ├── ImageSets
│       ├── KITTI-360
│       │   ├── calibration
│       │   ├── data_2d_raw
│       │   ├── data_2d_semantics
│       │   ├── data_3d_boxes
│       │   └── data_poses
│       ├── kitti360_panopticbev
│       │   ├── bev_msk
│       │   ├── class_weights
│       │   ├── front_msk_seam
│       │   ├── front_msk_trainid
│       │   ├── img
│       │   ├── split
│       │   ├── metadata_front.bin
│       │   └── metadata_ortho.bin
│       ├── train_val
│       │   ├── calib
│       │   ├── label
│       │   └── label_dota
│       └── testing
│           ├── calib
│           ├── label
│           └── label_dota
│       ...
```
Copy the `metadata_ortho.bin` file:

```bash
cp data/kitti_360/kitti360_panopticbev/metadata_ortho.bin data/kitti_360/
```

Next, extract the BEV segmentation labels and link the corresponding images:

```bash
python data/kitti_360/generate_BEV_semantic_seg_GT.py
python data/kitti_360/setup_split.py
```
You should see the following structure, with 61056 samples in each sub-folder of the `train_val` split and 910 samples in each sub-folder of the `testing` split:
```
SeaBird/PanopticBEV
├── data
│   └── kitti_360
│       ├── ImageSets
│       ├── train_val
│       │   ├── calib
│       │   ├── image
│       │   ├── label
│       │   ├── label_dota
│       │   ├── panop
│       │   └── seman
│       ├── testing
│       │   ├── calib
│       │   ├── image
│       │   ├── label
│       │   ├── label_dota
│       │   ├── panop
│       │   └── seman
│       └── metadata_ortho.bin
```
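Optionally, the sample counts can be verified with a short script. This is not part of the repo; it is a minimal sketch assuming the layout above and should be run from `SeaBird/PanopticBEV`.

```python
# Hypothetical sanity check (not part of the repo): verify 61056 train_val and
# 910 testing samples per sub-folder, following the directory tree above.
import os

EXPECTED = {"train_val": 61056, "testing": 910}
SUBFOLDERS = ("calib", "image", "label", "label_dota", "panop", "seman")

for split, expected in EXPECTED.items():
    for sub in SUBFOLDERS:
        folder = os.path.join("data", "kitti_360", split, sub)
        count = len(os.listdir(folder))
        flag = "OK" if count == expected else f"MISMATCH (expected {expected})"
        print(f"{folder}: {count} -> {flag}")
```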
Train the model:

```bash
chmod +x scripts_training.sh
bash scripts_training.sh
```

The configuration parameters of the model are in the `experiments/` folder, where you can modify them.
We provide logs, models, and predictions for the main experiments on the KITTI-360 Val and KITTI-360 Test data splits, available to download here.
In the table, (50) and (25) denote AP at 50% and 25% IoU overlap respectively, while the Seg columns report BEV segmentation IoU.

| Data Split | Method | Config (Run) | Weight / Pred | Metrics | Lrg (50) | Car (50) | Mean (50) | Lrg (25) | Car (25) | Mean (25) | Lrg Seg | Car Seg | Mean Seg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KITTI-360 Val | Stage 1 | seabird_val_stage1 | gdrive | IoU | - | - | - | - | - | - | 23.83 | 48.54 | 36.18 |
| KITTI-360 Val | PBEV+SeaBird | seabird_val | gdrive | AP | 13.22 | 42.46 | 27.84 | 37.15 | 52.53 | 44.84 | 24.30 | 48.04 | 36.17 |
| KITTI-360 Test | PBEV+SeaBird | seabird_test | gdrive | AP | - | - | 4.64 | - | - | 37.12 | - | - | - |
Make an `output` folder in the `SeaBird/PanopticBEV` directory:

```bash
mkdir output
```

Place the models in the `output` folder as follows:
```
SeaBird/PanopticBEV
├── output
│   ├── pbev_seabird_kitti360_val_stage1
│   │   └── saved_models
│   │       └── model_19.pth
│   │
│   ├── pbev_seabird_kitti360_val
│   │   └── saved_models
│   │       └── model_19.pth
│   │
│   └── pbev_seabird_kitti360_test
│       └── saved_models
│           └── model_9.pth
```
To test, execute the following commands:

```bash
chmod +x scripts_inference.sh
bash scripts_inference.sh
```
To get qualitative plots and visualize the predicted and GT boxes, type:

```bash
python plot/plot_qualitative_output.py --dataset kitti_360 --folder output/pbev_seabird_kitti360_val/results_test/data
```

The above script visualizes boxes in the frontal and BEV views. To visualize BEV segmentation results together with the predicted and GT boxes, run `scripts_inference.sh` with the `--save_seg` flag, and then run the above plotting command.
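For a rough idea of what such an overlay involves, here is an illustrative matplotlib sketch; the mask and corner-list box formats are assumptions, and `plot/plot_qualitative_output.py` remains the supported path.

```python
# Illustration only: overlay a BEV segmentation mask with predicted (red) and
# GT (green) box corners. Mask/box formats here are assumptions; use the repo's
# plot/plot_qualitative_output.py for the real visualization.
import numpy as np
import matplotlib.pyplot as plt

def show_bev(mask, pred_boxes, gt_boxes):
    """mask: (H, W) class-id array; *_boxes: lists of (4, 2) BEV corner arrays."""
    plt.imshow(mask, cmap="tab20")
    for corners, style in [(b, "r-") for b in pred_boxes] + [(b, "g-") for b in gt_boxes]:
        poly = np.vstack([corners, corners[:1]])  # close the polygon
        plt.plot(poly[:, 0], poly[:, 1], style, linewidth=1)
    plt.axis("off")
    plt.show()
```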
Type the following to reproduce our other plots:

```bash
python plot/plot_teaser_histogram.py
python plot/plot_convergence_analysis.py
python plot/plot_lengthwise_analysis.py
python plot/plot_category_wise_stats.py
```
We thank the authors of the following awesome codebases: PanopticBEV, BBAVectors, and DEVIANT. Please also consider citing them.
We welcome contributions to the SeaBird repo. Feel free to raise a pull request.
SeaBird and BBAVectors code are under the MIT license. The PanopticBEV code is under the GPLv3 license for academic usage. For commercial usage of PanopticBEV, please contact Nikhil Gosala.
For questions, feel free to post here or drop an email to this address: [email protected]