Paper | Project Page | Video
Official implementation of LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors
Hanyang Yu, Xiaoxiao Long and Ping Tan.
Abstract: We aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advancements such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable success in 3D reconstruction, these methods typically necessitate hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. However, sparse-view reconstruction is inherently ill-posed and under-constrained, often resulting in inferior and incomplete outcomes. This is due to issues such as failed initialization, overfitting to input images, and a lack of detail. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid in the recovery of camera poses and the reliable initialization of point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process to preserve intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction.
Our method takes unposed sparse images as inputs. For example, we select 8 images from the Horse Scene to cover a 360-degree view. Initially, we utilize a Background-Aware Depth-guided Initialization Module to generate dense point clouds and camera poses (see Section IV-B). These variables act as the initialization for the Gaussian kernels. Subsequently, in the Multi-modal Regularized Gaussian Reconstruction Module (see Section IV-C), we collectively optimize the Gaussian network through depth, normal, and virtual-view regularizations. After this stage, we train a Gaussian Repair model capable of enhancing Gaussian-rendered new view images. These improved images serve as guides for the training network, iteratively restoring Gaussian details (see Section IV-D). Finally, we employ a scene enhancement module to further enhance the rendered images for realistic visual effects (see Section IV-E).
- Support 2D-GS
- Support Scaffold-GS
- Add incremental test-pose alignment module
- Support controlnet-tile-sdxl-1.0
LM-Gaussian is tested with CUDA 11.8.
- Clone LM-Gaussian and download relevant models:
git clone https://github.com/hanyangyu1021/LMGaussian.git --recursive
- Create the environment:
LM-Gaussian is tested on Python 3.10.12. Requirements are listed in requirements.txt. You can create the conda environment with
conda env create --file environment.yml
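After creating the environment, a quick check against the tested versions (Python 3.10, CUDA 11.8) can catch mismatches early. The following is a minimal sketch, not part of the repository: the helper names `parse_nvcc_release`, `python_matches`, and `report` are hypothetical, and it assumes `nvcc --version` prints a line containing `release X.Y`. Other versions may still work; they are simply untested.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional

# Versions the README reports testing with; other versions are untested, not forbidden.
TESTED_PYTHON = (3, 10)
TESTED_CUDA = "11.8"


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'X.Y' from the 'release X.Y' text printed by `nvcc --version`."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def python_matches(version_info=sys.version_info) -> bool:
    """True when the interpreter's major.minor equals the tested Python version."""
    return tuple(version_info[:2]) == TESTED_PYTHON


def report() -> None:
    """Print a short comparison of the local setup against the tested versions."""
    print("Python matches tested version:", python_matches())
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH; cannot check the CUDA toolkit version")
        return
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    print("CUDA toolkit:", parse_nvcc_release(result.stdout), "| tested:", TESTED_CUDA)
```

Calling `report()` inside the activated environment prints both checks.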
Put unposed sparse images in the './data/{dataset_name}/train/images/' folder.
Checkpoints can be found at:
Download the relevant checkpoints to ./Marigold/checkpoint/marigold-depth-lcm-v1-0/ and ./Marigold/checkpoint/marigold-normals-lcm-v0-1/
python Marigold/getmonodepthnormal.py -s data/horse16
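Before running the Marigold step, it can help to confirm that the image folder and the two checkpoint folders named above are in place. This is a rough sketch under stated assumptions: `missing_inputs` is a hypothetical helper (not a repository script), the accepted image suffixes are a guess, and a checkpoint folder is treated as present if it merely contains any file.

```python
from pathlib import Path

# Assumption: input images use one of these common suffixes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

# Checkpoint folders named in this README, relative to the repository root.
MARIGOLD_DIRS = [
    "Marigold/checkpoint/marigold-depth-lcm-v1-0",
    "Marigold/checkpoint/marigold-normals-lcm-v0-1",
]


def missing_inputs(repo_root, dataset_name):
    """Return a list of human-readable problems; an empty list means nothing is missing."""
    root = Path(repo_root)
    problems = []

    image_dir = root / "data" / dataset_name / "train" / "images"
    if not image_dir.is_dir():
        problems.append(f"missing image folder: {image_dir}")
    elif not any(p.suffix.lower() in IMAGE_SUFFIXES for p in image_dir.iterdir()):
        problems.append(f"no images found in: {image_dir}")

    for rel in MARIGOLD_DIRS:
        ckpt_dir = root / rel
        # Only checks that the folder exists and is non-empty, not that the files are valid.
        if not ckpt_dir.is_dir() or not any(ckpt_dir.iterdir()):
            problems.append(f"missing or empty checkpoint folder: {ckpt_dir}")

    return problems
```

For the example scene, `missing_inputs(".", "horse16")` run from the repository root returns an empty list when everything is in place.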
Download the DUSt3R checkpoint "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" and place it at './dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth'.
Here we provide a simple example: the Horse scene from Tanks and Temples (TNT). You can find the data in ./data
python dust3r/coarse_initialization.py -s data/horse16
python stage1_360.py -s data/horse16 --save outputs/horse16
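The three commands up to this point (monocular depth and normals, coarse initialization, stage 1 training) can be chained from one small driver. The sketch below is an illustration, not a repository script: `stage1_commands` and `run_stage1` are hypothetical names, only the flags shown in this README are used, and it assumes the steps must run in the order they are listed here.

```python
import subprocess
import sys


def stage1_commands(scene_dir, output_dir, python=sys.executable):
    """Build the three command lines from this README for one scene."""
    return [
        [python, "Marigold/getmonodepthnormal.py", "-s", scene_dir],
        [python, "dust3r/coarse_initialization.py", "-s", scene_dir],
        [python, "stage1_360.py", "-s", scene_dir, "--save", output_dir],
    ]


def run_stage1(scene_dir, output_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in stage1_commands(scene_dir, output_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps do not run after a failure.
            subprocess.run(cmd, check=True)
```

Calling `run_stage1("data/horse16", "outputs/horse16")` from the repository root prints the commands without running them; pass `dry_run=False` to execute.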
To set up the model, download the following checkpoints to the ./models folder:
Download the clip-vit-large-patch14 model to the ./openai folder.
python train_repairmodel.py --exp_name outputs/controlnet_finetune/horse16 --prompt "any prompt describe the scene" --resolution 1 --gs_dir outputs/horse16 --data_dir data/horse16 --bg_white
python stage2_360.py -s data/horse16 --exp_name outputs/controlnet_finetune/horse16 --prompt "any prompt describe the scene" --bg_white --start_checkpoint "outputs/horse16/chkpnt12000.pth"
python stage2_forward.py -s data/barn3 --exp_name outputs/controlnet_finetune/barn3 --prompt "Houses, playground, outdoor" --bg_white --start_checkpoint "outputs/barn3/chkpnt6000.pth"
python render_interpolate.py -s data/horse16 --start_checkpoint outputs/horse16/chkpnt30000.pth
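The commands above pass checkpoint files such as chkpnt6000.pth, chkpnt12000.pth, and chkpnt30000.pth through `--start_checkpoint`. A small helper can list what an output folder contains and pick the highest iteration. This is a sketch: `latest_checkpoint` is a hypothetical helper, and the `chkpnt<iteration>.pth` naming pattern is inferred from the example commands rather than documented.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are named chkpnt<iteration>.pth, as in the example commands.
CHECKPOINT_PATTERN = re.compile(r"^chkpnt(\d+)\.pth$")


def latest_checkpoint(output_dir) -> Optional[Path]:
    """Return the checkpoint with the largest iteration number, or None if there is none."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return None

    best_iter, best_path = -1, None
    for path in directory.iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        # Compare iterations numerically; a plain string sort would rank 6000 above 30000.
        if match and int(match.group(1)) > best_iter:
            best_iter, best_path = int(match.group(1)), path
    return best_path
```

The highest iteration is not always the one a step expects: the stage 2 example starts from chkpnt12000.pth, while rendering uses chkpnt30000.pth from the same folder, so treat the result as a suggestion.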
Checkpoints can be found at:
Download the checkpoints to ./models/zeroscope_v2_XL/
python scene_enhance.py --model_path ./models/zeroscope_v2_XL --input_path outputs/horse16/30000_render_video.mp4
This work builds on many amazing research works and open-source projects. Thanks a lot to all the authors for sharing!
If you find our work useful in your research, please consider giving a star :star: and citing the following paper :pencil:.
@misc{yu2024lmgaussianboostsparseview3d,
title={LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors},
author={Hanyang Yu and Xiaoxiao Long and Ping Tan},
year={2024},
eprint={2409.03456},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.03456},
}