Daoyi Gao, Dávid Rozenberszki, Stefan Leutenegger, and Angela Dai
DiffCAD is a weakly-supervised approach for CAD model retrieval and alignment from an RGB image. Our approach utilizes disentangled diffusion models to tackle the ambiguities of monocular perception, and achieves robust cross-domain performance while being trained only on synthetic data.
We tested with Ubuntu 20.04, Python 3.8, CUDA 11, and PyTorch 2.0.
We provide an Anaconda environment with all dependencies. To install it, run:
conda env create -f env.yaml
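Then activate the environment before running any of the commands below. The environment name is defined in env.yaml; `diffcad` is assumed here:

conda activate diffcad  # name assumed; check the name field in env.yaml
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # sanity check that CUDA is visible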
We provide our synthetic 3D-FRONT data renderings (RGB, rendered/predicted depth, masks, and camera poses); processed watertight (via mesh-fusion) and canonicalized meshes (ShapeNet and 3D-FUTURE), along with their encoded latent vectors; and machine-estimated depth and masks on the validation set of the ScanNet25k data. However, since the rendered data takes up a large amount of storage, we also encourage you to generate the synthetic renderings yourself following BlenderProc or 3DFront-Rendering, e.g. with a command like the sketch below.
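As a minimal sketch, a BlenderProc rendering job is launched with its `run` command; the rendering script and dataset paths here are placeholders, not part of this repo:

blenderproc run render_front3d.py /path/to/3D-FRONT /path/to/3D-FUTURE-model output/  # hypothetical script and paths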
Source Dataset | Description |
---|---|
3D-FRONT-CONFIG | Scene configs for rendering, which we also augment with ShapeNet objects. |
3D-FRONT-RENDERING | Renderings of the 3D-FRONT dataset for each target category. |
Object Meshes | Canonicalized and watertight meshes of ShapeNet and 3D-FUTURE objects. |
Object Meshes - AUG | ShapeNet objects scaled by the scale of their nearest-neighbor 3D-FUTURE object, which we use to augment the synthetic dataset. |
Object Latents | Encoded object latent vectors for retrieval. |
Val ScanNet25k | Predicted depth, GT and predicted masks, CAD pools, and pose ground truths on the validation set. |
ScanNet25k data | The processed data from ROCA. |
We also provide the checkpoints for scene scale, object pose, and shape diffusion models.
Checkpoint | Description |
---|---|
Scale | Joint-category LDM for scene scale |
Pose | Category-specific LDM for object pose (NOCs) |
Shape | Category-specific LDM for object shape latents |
For scene scale:
python train_scale.py --base=configs/scale/depth_feat.yaml -t --gpus=0, --logdir=logs
For object NOCs:
python train_pose.py --base=configs/pose/depth_gcn.yaml -t --gpus=0, --logdir=logs
For object latents:
python train_shape.py --base=configs/shape/nocs_embed.yaml -t --gpus=0, --logdir=logs
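Since the training entry points build on the latent-diffusion trainer, `--gpus` takes a comma-separated list of device indices (the trailing comma in `--gpus=0,` makes a single index parse as a list). A hypothetical variation, not verified against the scripts, that trains the pose model on two GPUs with a custom log directory:

python train_pose.py --base=configs/pose/depth_gcn.yaml -t --gpus=0,1 --logdir=logs/pose_multi_gpu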
For scene scale sampling:
python scripts/generate_multi_scale_candidates.py
For object NOCs generation:
python scripts/generate_multi_nocs_candidates.py
For object latent sampling:
python scripts/generate_multi_shape_candidates.py
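At inference time the three disentangled factors are sampled by separate scripts; a plausible end-to-end run, assuming each script picks up its defaults from the corresponding config, is:

python scripts/generate_multi_scale_candidates.py  # 1) sample scene scale hypotheses
python scripts/generate_multi_nocs_candidates.py   # 2) sample object NOCs candidates for pose
python scripts/generate_multi_shape_candidates.py  # 3) sample shape latent candidates for retrieval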
@article{gao2023diffcad,
  title={DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image},
  author={Gao, Daoyi and Rozenberszki, D{\'a}vid and Leutenegger, Stefan and Dai, Angela},
  journal={arXiv preprint},
  year={2023}
}
Our code borrows from the official implementation of latent-diffusion.