The reconstruction of large 3D scenes is essential for many applications, from autonomous driving and virtual reality to environmental monitoring and aerial surveying. 3D Gaussian Splatting (3D-GS) has proven to be a promising technique, but it presents challenges in terms of memory requirements and computational power. A new approach called Momentum-GS now promises to overcome these hurdles and significantly improve the quality of reconstruction.
3D-GS is characterized by high reconstruction quality and fast rendering speeds. However, the explicit representation of millions of Gaussian primitives leads to high memory demands. Large scenes are therefore often reconstructed with a divide-and-conquer strategy: the scene is split into blocks, which are trained in parallel. This, however, can introduce inconsistencies at the block boundaries, such as visible seams in lighting.
Hybrid representations that combine implicit and explicit features offer a way to mitigate these limitations, for example by integrating dense voxel grids with sparse 3D Gaussian fields. However, applying such hybrid representations to parallel reconstruction raises two problems. Training the blocks independently reduces data diversity and degrades reconstruction quality. Parallel training with a shared Gaussian decoder makes the trained blocks mergeable, but limits scalability, since the number of blocks is bound by the number of available GPUs.
Momentum-GS decouples the number of blocks from the GPU limitations. Periodically, k blocks are selected from a set of n blocks and distributed across k GPUs. To ensure consistency between the blocks, momentum-based self-distillation is used. A "teacher" Gaussian decoder, updated with momentum, provides global guidance to each block. This promotes collaborative learning and ensures that each block benefits from the context of the entire scene.
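The core of this scheme is an exponential moving average (EMA) of the shared decoder's weights: the teacher trails the student smoothly and so carries stable, scene-wide guidance. The following is a minimal sketch of that loop; the parameter vector, the momentum value of 0.99, and the placeholder training step are illustrative assumptions, not the paper's actual implementation.

```python
import random


def momentum_update(teacher, student, m=0.999):
    """EMA update: teacher <- m * teacher + (1 - m) * student (elementwise)."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]


# Hypothetical setup: n blocks, k GPUs, decoder weights as a flat vector.
n_blocks, k_gpus = 8, 2
student = [0.0] * 4          # stand-in for the shared Gaussian decoder weights
teacher = list(student)      # momentum teacher starts as a copy of the student

for step in range(1000):
    # Periodically select k of the n blocks and assign them to the k GPUs.
    active = random.sample(range(n_blocks), k_gpus)
    # ... one training step of `student` on the selected blocks, including a
    # consistency loss toward the teacher's outputs (self-distillation) ...
    student = [s + 0.01 for s in student]  # placeholder for a gradient step
    teacher = momentum_update(teacher, student, m=0.99)
```

Because the teacher is updated only through this slow average, every block is distilled toward the same smoothly evolving decoder, regardless of which k blocks happen to be on the GPUs at a given step.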
In addition, Momentum-GS uses reconstruction-driven block weighting. The weighting of each block is dynamically adjusted to its reconstruction quality. This allows the shared decoder to focus on weaker performing blocks, improving global consistency and preventing convergence to local minima.
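One simple way to realize such a scheme is a softmax over a per-block quality metric, so that blocks with lower reconstruction quality receive larger loss weights. The sketch below assumes PSNR as the quality signal and a temperature parameter; both are illustrative choices, not the paper's exact formulation.

```python
import math


def block_weights(psnr_per_block, temperature=1.0):
    """Hypothetical weighting: lower-PSNR (weaker) blocks get larger weight.

    Softmax over negative PSNR; `temperature` controls how sharply the
    shared decoder focuses on the weakest blocks.
    """
    logits = [-p / temperature for p in psnr_per_block]
    peak = max(logits)                              # for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example: block 1 reconstructs worst, so its loss is weighted most heavily.
psnr = [27.3, 21.9, 25.0]
weights = block_weights(psnr, temperature=2.0)
# total_loss = sum(w * loss for w, loss in zip(weights, per_block_losses))
```

Re-evaluating the weights during training lets the shared decoder shift capacity toward whichever blocks are currently lagging, rather than fixing the emphasis up front.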
To evaluate the effectiveness of Momentum-GS, extensive experiments were conducted on five large scenes. The results show that Momentum-GS achieves a significant improvement in reconstruction quality compared to existing techniques such as CityGaussian, notably a 12.8% improvement in LPIPS (where lower is better), while using significantly fewer blocks.
Momentum-GS thus offers decisive advantages for 3D scene reconstruction: it decouples the number of blocks from the number of available GPUs, enforces consistency across blocks through momentum-based self-distillation, and steers the shared decoder toward weaker-performing blocks via dynamic weighting.
Momentum-GS underscores the potential of hybrid representations for the reconstruction of large 3D scenes and opens up new possibilities for applications that rely on high-quality 3D models. The combination of momentum-based self-distillation and dynamic block weighting proves to be the key to overcoming the challenges in reconstructing complex and extensive scenes.