
LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors

Paper | Project Page | Video

Official implementation of LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors

Hanyang Yu, Xiaoxiao Long and Ping Tan.

Abstract: We aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advancements such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable success in 3D reconstruction, these methods typically necessitate hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. However, sparse-view reconstruction is inherently ill-posed and under-constrained, often resulting in inferior and incomplete outcomes. This is due to issues such as failed initialization, overfitting to input images, and a lack of detail. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid in the recovery of camera poses and the reliable initialization of point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process to preserve intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction.

Method

Our method takes unposed sparse images as inputs. For example, we select 8 images from the Horse Scene to cover a 360-degree view. Initially, we utilize a Background-Aware Depth-guided Initialization Module to generate dense point clouds and camera poses (see Section IV-B). These variables act as the initialization for the Gaussian kernels. Subsequently, in the Multi-modal Regularized Gaussian Reconstruction Module (see Section IV-C), we collectively optimize the Gaussian network through depth, normal, and virtual-view regularizations. After this stage, we train a Gaussian Repair model capable of enhancing Gaussian-rendered new view images. These improved images serve as guides for the training network, iteratively restoring Gaussian details (see Section IV-D). Finally, we employ a scene enhancement module to further enhance the rendered images for realistic visual effects (see Section IV-E).

TODO List

  • Support 2D-GS
  • Support Scaffold-GS
  • Add incremental test-pose alignment module
  • Support controlnet-tile-sdxl-1.0

🚀 Setup

CUDA

LM-Gaussian is tested with CUDA 11.8.

Cloning the Repository

  1. Clone LM-Gaussian and download relevant models:
    git clone https://github.com/hanyangyu1021/LMGaussian.git --recursive
  2. Create the environment: LM-Gaussian is tested on Python 3.10.12. The dependencies are listed in requirements.txt; you can create a conda environment with
    conda env create --file environment.yml
        

Get Monocular Depth/Normal Maps

Put your unposed sparse images in the './data/{dataset_name}/train/images/' folder.

Download the Marigold depth and normal checkpoints to ./Marigold/checkpoint/marigold-depth-lcm-v1-0/ and ./Marigold/checkpoint/marigold-normals-lcm-v0-1/, then run:

python Marigold/getmonodepthnormal.py -s data/horse16
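
The script above wraps Marigold to produce the monocular priors. For reference, here is a minimal sketch of generating a depth map with the diffusers Marigold pipeline; it assumes diffusers >= 0.28 and the checkpoint path above, and the image filename is a hypothetical placeholder:

import torch
from PIL import Image
from diffusers import MarigoldDepthPipeline

# Load the locally downloaded Marigold depth checkpoint (path from above).
pipe = MarigoldDepthPipeline.from_pretrained(
    "Marigold/checkpoint/marigold-depth-lcm-v1-0", torch_dtype=torch.float16
).to("cuda")
image = Image.open("data/horse16/train/images/000001.jpg")  # hypothetical filename
depth = pipe(image)  # affine-invariant depth prediction
vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("depth_vis.png")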

Dense Initialization

Download the DUSt3R checkpoint "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" and place it at './dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth'.

We provide a simple example: the Horse scene from the Tanks and Temples (TNT) dataset, located in ./data. Run:
python dust3r/coarse_initialization.py -s data/horse16
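
Internally, this stage builds on DUSt3R's pairwise point-map predictions followed by global alignment to recover camera poses and a dense point cloud. A minimal sketch of that workflow using DUSt3R's own API (the device and optimizer settings here are illustrative):

from dust3r.model import load_model
from dust3r.utils.image import load_images
from dust3r.image_pairs import make_pairs
from dust3r.inference import inference
from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

device = "cuda"
model = load_model("dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth", device)
images = load_images("data/horse16/train/images", size=512)
# Predict point maps for every image pair, then align them in one frame.
pairs = make_pairs(images, scene_graph="complete", prefilter=None, symmetrize=True)
output = inference(pairs, model, device, batch_size=1)
scene = global_aligner(output, device=device, mode=GlobalAlignerMode.PointCloudOptimizer)
scene.compute_global_alignment(init="mst", niter=300, schedule="cosine", lr=0.01)
poses = scene.get_im_poses()  # recovered camera-to-world poses
pts3d = scene.get_pts3d()     # dense per-view point clouds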

Multi-modal Regularized Reconstruction

python stage1_360.py -s data/horse16 --save outputs/horse16
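
Conceptually, this stage augments the photometric 3DGS loss with depth and normal terms against the Marigold priors. The following PyTorch sketch is purely illustrative; the weights and the normalized-depth comparison are assumptions, not the repo's exact implementation:

import torch.nn.functional as F

def multimodal_loss(render_rgb, gt_rgb, render_depth, mono_depth,
                    render_normal, mono_normal,
                    w_depth=0.1, w_normal=0.1):  # illustrative weights
    # Photometric term, as in standard 3DGS (the repo may add D-SSIM).
    rgb_loss = F.l1_loss(render_rgb, gt_rgb)
    # Monocular depth is only valid up to an affine transform,
    # so compare normalized depths.
    rd = (render_depth - render_depth.mean()) / (render_depth.std() + 1e-6)
    md = (mono_depth - mono_depth.mean()) / (mono_depth.std() + 1e-6)
    depth_loss = F.l1_loss(rd, md)
    # Penalize angular deviation from the monocular normals, shape (3, H, W).
    normal_loss = (1 - F.cosine_similarity(render_normal, mono_normal, dim=0)).mean()
    return rgb_loss + w_depth * depth_loss + w_normal * normal_loss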

Train Repair model

To set up the model, download the required checkpoints to the ./models folder, and download the clip-vit-large-patch14 model to ./openai.

python train_repairmodel.py --exp_name outputs/controlnet_finetune/horse16 --prompt "any prompt describing the scene" --resolution 1 --gs_dir outputs/horse16 --data_dir data/horse16 --bg_white
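
The repair model is a ControlNet fine-tuned on the scene so that it can clean up artifacts in Gaussian-rendered novel views. As an illustration of how such a model could be applied at inference time with diffusers (the base model and file paths below are assumptions, not necessarily what this repo uses):

import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Hypothetical: load the fine-tuned ControlNet from the step above
# on top of a Stable Diffusion base.
controlnet = ControlNetModel.from_pretrained(
    "outputs/controlnet_finetune/horse16", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")
degraded = Image.open("render.png")  # artifact-laden Gaussian rendering
repaired = pipe(prompt="any prompt describing the scene",
                image=degraded, control_image=degraded,
                strength=0.5).images[0]
repaired.save("repaired.png")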

Iterative refinement

Use stage2_360.py for 360-degree captures and stage2_forward.py for forward-facing captures:

python stage2_360.py -s data/horse16 --exp_name outputs/controlnet_finetune/horse16 --prompt "any prompt describing the scene" --bg_white --start_checkpoint "outputs/horse16/chkpnt12000.pth"
python stage2_forward.py -s data/barn3 --exp_name outputs/controlnet_finetune/barn3 --prompt "Houses, playground, outdoor" --bg_white --start_checkpoint "outputs/barn3/chkpnt6000.pth"

Render video & Scene enhancement

python render_interpolate.py -s data/horse16 --start_checkpoint outputs/horse16/chkpnt30000.pth
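
render_interpolate.py renders a smooth camera path through the training poses. For intuition, here is a minimal, self-contained sketch of such trajectory generation (slerp for rotations, piecewise-linear interpolation for translations; not the repo's exact code):

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_poses(poses, n_frames=300):
    """poses: ordered list of 4x4 camera-to-world key-pose matrices."""
    key_times = np.linspace(0, 1, len(poses))
    slerp = Slerp(key_times, Rotation.from_matrix([p[:3, :3] for p in poses]))
    trans = np.stack([p[:3, 3] for p in poses])
    out = []
    for t in np.linspace(0, 1, n_frames):
        pose = np.eye(4)
        pose[:3, :3] = slerp(t).as_matrix()  # smoothly rotated orientation
        # Piecewise-linear translation between the surrounding key poses.
        pose[:3, 3] = [np.interp(t, key_times, trans[:, i]) for i in range(3)]
        out.append(pose)
    return out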

Download the zeroscope_v2_XL checkpoints to ./models/zeroscope_v2_XL/, then run:

python scene_enhance.py --model_path ./models/zeroscope_v2_XL --input_path outputs/horse16/30000_render_video.mp4
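
Scene enhancement passes the rendered clip through the zeroscope_v2_XL video diffusion model. For reference, a sketch of a comparable video-to-video pass with diffusers (assumes the checkpoint folder is in diffusers format and imageio-ffmpeg is installed; the strength value is illustrative):

import torch
import imageio
from PIL import Image
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "./models/zeroscope_v2_XL", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
# Read the Gaussian-rendered video and enhance it with the video prior.
frames = [Image.fromarray(f) for f in
          imageio.mimread("outputs/horse16/30000_render_video.mp4", memtest=False)]
enhanced = pipe("any prompt describing the scene",
                video=frames, strength=0.6).frames[0]
export_to_video(enhanced, "outputs/horse16/enhanced.mp4")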

🤗Acknowledgement

This work is built on many amazing research works and open-source projects, thanks a lot to all the authors for sharing!

🌏Citation

If you find our work useful in your research, please consider giving a star :star: and citing the following paper :pencil:.

@misc{yu2024lmgaussianboostsparseview3d,
      title={LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors}, 
      author={Hanyang Yu and Xiaoxiao Long and Ping Tan},
      year={2024},
      eprint={2409.03456},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.03456}, 
}