Yingda Yin1,2*, Yuzheng Liu2,3*, Yang Xiao4*, Daniel Cohen-Or5, Jingwei Huang6, Baoquan Chen2,3
1School of Computer Science, Peking University 2National Key Lab of General AI, China 3School of Intelligence Science and Technology, Peking University 4Ecole des Ponts ParisTech 5Tel-Aviv University 6Tencent
CVPR 2024
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach that synergistically leverages geometric priors and semantic cues derived from the Segment Anything Model (SAM).
Our approach combines geometric priors with the capabilities of 2D foundation models. We over-segment 3D point clouds into superpoints (top-left) and generate 2D image masks using SAM (bottom-left). We then construct a scene graph that quantifies the pairwise affinity scores of superpoints (middle). Finally, we apply progressive region growing to gradually merge 3D superpoints into the final 3D instance segmentation masks (right).
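For intuition, here is a minimal sketch of the progressive region-growing idea, not the exact implementation: superpoints are merged greedily under a sequence of decreasing affinity thresholds, so confident merges happen first. The affinity matrix, thresholds, and function names below are illustrative assumptions; the released code additionally re-estimates affinities between grown regions at every round.

```python
import numpy as np

def progressive_region_growing(affinity, thresholds=(0.9, 0.8, 0.7, 0.6, 0.5)):
    """Greedily merge superpoints into instances (illustrative sketch only).

    affinity: (N, N) symmetric matrix of pairwise superpoint affinities,
              estimated from how consistently SAM masks group two superpoints.
    Returns an (N,) array assigning an instance id to every superpoint.
    """
    n = affinity.shape[0]
    parent = np.arange(n)  # union-find: every superpoint starts as its own region

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Decreasing thresholds: high-confidence merges happen first,
    # and regions grown in earlier rounds act as units in later rounds.
    for t in thresholds:
        for i in range(n):
            for j in range(i + 1, n):
                if affinity[i, j] >= t:
                    parent[find(i)] = find(j)

    return np.array([find(i) for i in range(n)])

# Toy usage: three superpoints where 0 and 1 clearly belong together.
toy = np.array([[1.0, 0.95, 0.1],
                [0.95, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
print(progressive_region_growing(toy))  # superpoints 0 and 1 share an id
```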
## Prepare environment

```bash
conda create -n sai3d python=3.8
conda activate sai3d
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install open3d natsort matplotlib tqdm opencv-python scipy plyfile
```
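Optionally, you can verify that the CUDA build of PyTorch works before moving on (a quick sanity check, not part of the original setup):

```python
import torch

print(torch.__version__)          # expect 1.13.1+cu117
print(torch.version.cuda)         # expect 11.7
print(torch.cuda.is_available())  # should be True on a GPU machine
```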
## Install Semantic-SAM

```bash
git clone https://github.com/UX-Decoder/Semantic-SAM.git Semantic-SAM --recursive

# if you encounter any problem with the CUDA version, try CUDA 11.8 with the following command
# conda install nvidia/label/cuda-11.8.0::cuda
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git

cd Semantic-SAM
python -m pip install -r requirements.txt

# build the deformable-attention CUDA ops
cd semantic_sam/body/encoder/ops
sh ./make.sh

# download the Semantic-SAM checkpoint
cd - && mkdir checkpoints && cd checkpoints
wget https://github.com/UX-Decoder/Semantic-SAM/releases/download/checkpoint/swinl_only_sam_many2many.pth
```
## Install OpenMask3D (optional, only needed for semantic labels)

```bash
git clone https://github.com/OpenMask3D/openmask3d.git openmask3d --recursive
cd openmask3d
conda create --name=openmask3d python=3.8.5  # create new virtual environment
conda activate openmask3d                    # activate it
bash install_requirements.sh                 # install requirements
pip install -e .                             # install current repository in editable mode

# download the SAM checkpoint
mkdir checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```
## Data preparation

Download ScanNetV2 / ScanNet200 and organize the dataset as follows:
```
data
├── ScanNet
│   ├── posed_images
│   │   ├── scene0000_00
│   │   │   ├── intrinsic_color.txt
│   │   │   ├── intrinsic_depth.txt
│   │   │   ├── 0000.jpg   // RGB image
│   │   │   ├── 0000.png   // depth image
│   │   │   ├── 0000.txt   // extrinsics
│   │   │   └── ...
│   │   └── ...
│   ├── scans
│   │   ├── scene0000_00
│   │   └── ...
│   └── Tasks
│       └── Benchmark
│           ├── scannetv2_val.txt
│           ├── scannetv2_train.txt
│           └── ...
```
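To double-check the layout, the snippet below loads one posed frame; it assumes the standard ScanNet conventions of 4x4 intrinsic/pose matrices stored as plain text and depth stored as 16-bit PNGs in millimeters.

```python
import numpy as np
import cv2

scene_dir = "data/ScanNet/posed_images/scene0000_00"
frame = "0000"

intrinsic = np.loadtxt(f"{scene_dir}/intrinsic_color.txt")  # 4x4 color intrinsics
pose = np.loadtxt(f"{scene_dir}/{frame}.txt")               # 4x4 camera-to-world extrinsics

rgb = cv2.imread(f"{scene_dir}/{frame}.jpg")                          # H x W x 3, BGR
depth = cv2.imread(f"{scene_dir}/{frame}.png", cv2.IMREAD_UNCHANGED)  # H x W, uint16
depth_m = depth.astype(np.float32) / 1000.0                           # millimeters -> meters

print(intrinsic.shape, pose.shape, rgb.shape, depth_m.shape)
```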
## Run SAI3D

- **Obtain 2D SAM results**

  Set the config here to `false`, fill in the required parameters in this script, and then run:

  ```bash
  bash ./scripts/sam_scannet.sh
  ```
  The results will be stored at `data/ScanNet/2D_masks`, where the 2D segmentation results and the visualizations of the 2D masks are named `maskraw_<frame_number>.png` and `maskcolor_<frame_number>.png`, respectively.

- **Obtain 3D superpoints**

  For the ScanNet dataset, superpoints are already provided in `scans/<scene_id>/<scene_id>_vh_clean_2.0.010000.segs.json`. To generate superpoints on the meshes of other datasets, we directly use the mesh segmentator provided by ScanNet. Please check here for its usage.
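  For reference, the `.segs.json` file stores one segment id per mesh vertex in its `segIndices` field, so the superpoints can be read like this:

  ```python
  import json
  import numpy as np

  segs_path = "data/ScanNet/scans/scene0000_00/scene0000_00_vh_clean_2.0.010000.segs.json"
  with open(segs_path) as f:
      seg_indices = np.array(json.load(f)["segIndices"])  # segment id for every vertex

  print(f"{len(seg_indices)} vertices in {len(np.unique(seg_indices))} superpoints")
  ```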
- **3D instance segmentation by region growing**

  Set the required parameters in this script, then run SAI3D with the following command:

  ```bash
  bash scripts/seg_scannet.sh
  ```

  The resulting class-agnostic masks are exported in the format of the ScanNet instance segmentation benchmark.
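  For reference, the benchmark format pairs one summary file per scene with one mask file per predicted instance, roughly as follows (file names are illustrative; the label id is a placeholder for class-agnostic masks):

  ```
  scene0000_00.txt                       # one line per instance:
    predicted_masks/scene0000_00_000.txt <label_id> <confidence>
    predicted_masks/scene0000_00_001.txt <label_id> <confidence>
  predicted_masks/scene0000_00_000.txt   # one 0/1 value per mesh vertex
  ```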
## Class-agnostic evaluation

Now you can run class-agnostic evaluation directly on these results; it measures only the accuracy of the instance masks, without considering any semantic labels. We modify the original ScanNet instance segmentation benchmark to conduct it: we collect the GT masks of all 18 classes (excluding wall and floor) in the ScanNet-v2 dataset as our GT class-agnostic masks, and the AP score is reported over all of the foreground masks.

We provide processed GT class-agnostic masks here. Please download and extract them into your `GT_DIR`.
- **Prepare environment for ScanNet benchmark**

  ```bash
  conda create -n eval python=2.7
  conda activate eval
  cd evaluation
  pip install -r requirements.txt
  ```
- **Start evaluation**

  ```bash
  python evaluation/evaluate_class_agnostic_instance.py \
      --pred_path=PREDICTION_DIR \
      --gt_path=GT_DIR
  ```

  The numerical results will be saved under the directory of your predictions by default.
Since segmentation results in the ScanNet evaluation format are hard to visualize directly, we provide functions in `helpers/visualize.py` to convert them into meshes (`.ply`) for visualization. Please check that file to see the usage.
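If you want to roll your own conversion, a minimal sketch could look like the following; it assumes the benchmark format sketched above, and `helpers/visualize.py` remains the authoritative implementation:

```python
import numpy as np
import open3d as o3d

def colorize_predictions(mesh_path, pred_txt, pred_dir, out_path):
    """Paint every predicted instance mask in a random color on the scene mesh."""
    mesh = o3d.io.read_triangle_mesh(mesh_path)
    colors = np.full((len(mesh.vertices), 3), 0.7)  # gray background
    rng = np.random.default_rng(0)

    with open(pred_txt) as f:
        for line in f:  # "<relative mask path> <label id> <confidence>"
            mask_rel = line.split()[0]
            mask = np.loadtxt(f"{pred_dir}/{mask_rel}").astype(bool)  # 0/1 per vertex
            colors[mask] = rng.random(3)

    mesh.vertex_colors = o3d.utility.Vector3dVector(colors)
    o3d.io.write_triangle_mesh(out_path, mesh)

colorize_predictions(
    "data/ScanNet/scans/scene0000_00/scene0000_00_vh_clean_2.ply",
    "PREDICTION_DIR/scene0000_00.txt",  # summary file of one scene
    "PREDICTION_DIR",                   # directory containing predicted_masks/
    "scene0000_00_instances.ply",
)
```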
## Semantic instance segmentation with OpenMask3D

We show that our proposed class-agnostic masks are more accurate and can be adopted in tasks like semantic instance segmentation. Here we choose OpenMask3D to assign semantic labels to our class-agnostic masks.
- **Reorganize the ScanNet dataset**

  Since OpenMask3D requires the ScanNet dataset to be organized like this, we provide a script that reorganizes the dataset with symbolic links:

  ```bash
  python helpers/format_convertion.py \
      --app=0 \
      --base_dir=PATH_TO_PREVIOUS_SCANNET_DATASET \
      --out_dir=PATH_TO_REORGANIZED_SCANNET_DATASET
  ```

  For example:

  ```bash
  python helpers/format_convertion.py \
      --app=0 \
      --base_dir="data/ScanNet" \
      --out_dir="data/ScanNet_OpenMask3D"
  ```
  By convention, OpenMask3D expects the color and depth images of your data to share the same resolution. If they do not, please replace this line in OpenMask3D with the following code, which resizes each color image to the depth resolution when loading it:

  ```python
  img = Image.open(img_path).convert("RGB").resize(DEPTH_RESOLUTION, Image.BILINEAR)
  images.append(img)
  ```
- **Prepare class-agnostic masks**

  We have already obtained class-agnostic predictions in the previous section and exported them in the evaluation format of the ScanNet benchmark. However, OpenMask3D requires class-agnostic masks to be saved in `.pt` format before assigning semantics to them, so please run the following command to convert the predictions into the input format required by OpenMask3D:

  ```bash
  python helpers/format_convertion.py \
      --app=1 \
      --base_dir=PATH_TO_PREDICTION_DIR \
      --out_dir=PATH_TO_SAVE_PREDICTION_OF_NEW_FORMAT
  ```

  For example:

  ```bash
  RESULT_NAME="demo_scannet_5view_merge200_2-norm_semantic-sam_connect(0.9,0.5,5)_depth2"
  python helpers/format_convertion.py \
      --app=1 \
      --base_dir="data/ScanNet/results/${RESULT_NAME}" \
      --out_dir="data/class_agnostic_masks"
  ```
- **Assign semantics and evaluate**

  We provide processed GT masks for ScanNet200 semantic instance segmentation here.

  Now you can compute the per-mask scene features and run the OpenMask3D evaluation on the validation split of ScanNet200. Change the `intrinsic_resolution` parameter in the OpenMask3D configuration to the resolution of your `intrinsic_color.txt`. Then set the required parameters in this script and run the following command:

  ```bash
  bash scripts/run_openmask3d_scannet200.sh
  ```

  This script first computes the mask features associated with each class-agnostic mask, and then queries the masks with the 200 class names of ScanNet200 to assign a semantic label to each mask. Afterwards, the evaluation script runs automatically to produce 3D closed-vocabulary semantic instance segmentation scores.