The reconstruction of large 3D scenes is essential for many applications, from autonomous driving and virtual reality to environmental monitoring and aerial surveying. 3D Gaussian Splatting (3D-GS) has proven to be a promising technique, but it presents challenges in terms of memory requirements and computational power. A new approach called Momentum-GS now promises to overcome these hurdles and significantly improve the quality of reconstruction.
3D-GS is characterized by high reconstruction quality and fast rendering speeds. However, the explicit representation of millions of Gaussian primitives leads to high memory demands. Large scenes are therefore often reconstructed with a divide-and-conquer strategy: the scene is split into blocks, which are trained in parallel. This, however, can introduce inconsistencies at the block boundaries, such as visible seams in lighting.
Hybrid representations that combine implicit and explicit features offer a way to mitigate these limitations, for example by integrating dense voxel grids with sparse 3D Gaussian fields. However, applying such hybrid representations to parallel reconstruction raises two problems. Training the blocks independently reduces data diversity and degrades reconstruction quality. Parallel training with a shared Gaussian decoder makes the trained blocks mergeable, but limits scalability, since the number of blocks is bound by the number of available GPUs.
Momentum-GS decouples the number of blocks from the GPU limitations. Periodically, k blocks are selected from a set of n blocks and distributed across k GPUs. To ensure consistency between the blocks, momentum-based self-distillation is used. A "teacher" Gaussian decoder, updated with momentum, provides global guidance to each block. This promotes collaborative learning and ensures that each block benefits from the context of the entire scene.
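The core of this scheme is an exponential moving average (EMA) of the shared decoder's weights: the teacher trails the student smoothly and so carries stable, scene-wide guidance. The following is a minimal sketch of that loop; the parameter vector, the momentum value of 0.99, and the placeholder training step are illustrative assumptions, not the paper's actual implementation.

```python
import random


def momentum_update(teacher, student, m=0.999):
    """EMA update: teacher <- m * teacher + (1 - m) * student (elementwise)."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]


# Hypothetical setup: n blocks, k GPUs, decoder weights as a flat vector.
n_blocks, k_gpus = 8, 2
student = [0.0] * 4          # stand-in for the shared Gaussian decoder weights
teacher = list(student)      # momentum teacher starts as a copy of the student

for step in range(1000):
    # Periodically select k of the n blocks and assign them to the k GPUs.
    active = random.sample(range(n_blocks), k_gpus)
    # ... one training step of `student` on the selected blocks, including a
    # consistency loss toward the teacher's outputs (self-distillation) ...
    student = [s + 0.01 for s in student]  # placeholder for a gradient step
    teacher = momentum_update(teacher, student, m=0.99)
```

Because the teacher is updated only through this slow average, every block is distilled toward the same smoothly evolving decoder, regardless of which k blocks happen to be on the GPUs at a given step.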
In addition, Momentum-GS uses reconstruction-driven block weighting. The weighting of each block is dynamically adjusted to its reconstruction quality. This allows the shared decoder to focus on weaker performing blocks, improving global consistency and preventing convergence to local minima.
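One simple way to realize such a scheme is a softmax over a per-block quality metric, so that blocks with lower reconstruction quality receive larger loss weights. The sketch below assumes PSNR as the quality signal and a temperature parameter; both are illustrative choices, not the paper's exact formulation.

```python
import math


def block_weights(psnr_per_block, temperature=1.0):
    """Hypothetical weighting: lower-PSNR (weaker) blocks get larger weight.

    Softmax over negative PSNR; `temperature` controls how sharply the
    shared decoder focuses on the weakest blocks.
    """
    logits = [-p / temperature for p in psnr_per_block]
    peak = max(logits)                              # for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example: block 1 reconstructs worst, so its loss is weighted most heavily.
psnr = [27.3, 21.9, 25.0]
weights = block_weights(psnr, temperature=2.0)
# total_loss = sum(w * loss for w, loss in zip(weights, per_block_losses))
```

Re-evaluating the weights during training lets the shared decoder shift capacity toward whichever blocks are currently lagging, rather than fixing the emphasis up front.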
To evaluate the effectiveness of Momentum-GS, extensive experiments were conducted on five large scenes. The results show that Momentum-GS achieves a significant improvement in reconstruction quality compared to existing techniques such as CityGaussian, notably a 12.8% improvement in LPIPS (where lower is better), while using significantly fewer blocks.
Momentum-GS thus offers decisive advantages for 3D scene reconstruction: it decouples the number of blocks from the number of available GPUs, enforces consistency across blocks through momentum-based self-distillation, and steers the shared decoder toward weaker-performing blocks via dynamic weighting.
Momentum-GS underscores the potential of hybrid representations for the reconstruction of large 3D scenes and opens up new possibilities for applications that rely on high-quality 3D models. The combination of momentum-based self-distillation and dynamic block weighting proves to be the key to overcoming the challenges in reconstructing complex and extensive scenes.