How SplatSSC Revolutionizes Semantic Scene Completion with Depth‑Guided Gaussian Splatting

SplatSSC introduces a depth‑guided Gaussian splatting framework that replaces random primitive initialization with geometry‑aware priors and adds a decoupled aggregation module. It achieves state‑of‑the‑art performance on indoor semantic scene completion while cutting inference latency and GPU memory use by roughly 9% and suppressing floaters.

AI Frontier Lectures

Introduction

Semantic Scene Completion (SSC) predicts the dense geometry and semantic labels of a scene from a single RGB image, a capability that matters for embodied AI and autonomous driving. Earlier Gaussian‑based SSC pipelines scatter a large number of randomly initialized 3D Gaussian primitives through the scene. This leads to low primitive utilization (~3.9%) and to “floaters”: spurious semantic fragments predicted in empty space.

Core Techniques

Depth‑Guided Primitive Initialization (GMF)

The Group‑wise Multi‑scale Fusion (GMF) module fuses multi‑scale image semantics with depth features from a pretrained Depth‑Anything‑V2 model. Channels are split into groups, and each group is processed with linear group cross‑attention (GCA). This reduces attention cost from quadratic to linear in the number of tokens, which helps make deployment on mobile hardware practical.

Only 1,200 carefully placed Gaussian primitives (about 7% of the count used in prior work) are required to cover the scene geometry.

Geometric priors derived from depth guide the placement, drastically reducing redundancy.
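As a rough illustration of depth-guided placement, the sketch below back-projects a depth map through a pinhole camera model and samples primitive centres from the resulting points. This is a simplification under stated assumptions: the function name, the intrinsics parameters (fx, fy, cx, cy), and the uniform random sampling are hypothetical, and the actual method initializes primitives from learned GMF features rather than from raw depth alone.

```python
import numpy as np

def init_gaussian_means(depth, fx, fy, cx, cy, num_primitives=1200, seed=0):
    """Hypothetical sketch: sample Gaussian centres from a back-projected depth map.

    depth: (H, W) array of metric depth values in the camera frame.
    Returns an (N, 3) array of 3D points, with N <= num_primitives.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Pinhole back-projection: pixel (u, v) with depth z maps to (x, y, z).
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Drop pixels with missing or invalid depth.
    valid = np.isfinite(points).all(axis=1) & (points[:, 2] > 0)
    points = points[valid]
    # Uniform sampling without replacement (an assumption for illustration).
    rng = np.random.default_rng(seed)
    n = min(num_primitives, len(points))
    idx = rng.choice(len(points), size=n, replace=False)
    return points[idx]
```

Because every sampled centre lies on an observed surface, none of the primitive budget is spent on space the camera has already seen to be empty, which is the intuition behind the reduced redundancy.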

Decoupled Gaussian Aggregator (DGA)

DGA separates geometry occupancy prediction from semantic distribution. Opacity is treated as a confidence score; low occupancy probabilities act as gates that suppress erroneous semantic contributions from outlier primitives, effectively eliminating floaters without heuristic post‑processing.

Two parallel pathways predict geometry occupancy and conditional semantics.

Gate‑based suppression ensures clean scene boundaries.
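The gating idea can be sketched as follows. This is a minimal NumPy illustration, not the authors' aggregator: the isotropic Gaussian kernel, the "at least one primitive occupies the voxel" occupancy formula, and all names are assumptions. It shows one pathway estimating occupancy from opacity, a second estimating a class distribution conditioned on occupancy, and the final score being their product.

```python
import numpy as np

def _softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def decoupled_aggregate(query_pts, means, scales, opacity, sem_logits):
    """Hypothetical sketch of decoupled geometry/semantic aggregation.

    query_pts: (V, 3) voxel centres; means: (N, 3) primitive centres;
    scales: (N,) isotropic standard deviations; opacity: (N,) in [0, 1];
    sem_logits: (N, K) per-primitive class logits.
    Returns occupancy (V,) and gated class scores (V, K).
    """
    d2 = ((query_pts[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)  # (V, N)
    contrib = opacity[None, :] * np.exp(-0.5 * d2 / scales[None, :] ** 2)
    # Geometry pathway: probability that at least one primitive occupies the voxel.
    occupancy = 1.0 - np.prod(1.0 - contrib, axis=1)
    # Semantic pathway: class distribution conditioned on the voxel being occupied.
    weights = contrib / (contrib.sum(axis=1, keepdims=True) + 1e-8)
    cond_sem = weights @ _softmax(sem_logits)                              # (V, K)
    # Gate: low occupancy suppresses whatever the semantic pathway predicts.
    return occupancy, occupancy[:, None] * cond_sem
```

In this sketch, an outlier primitive with opacity 0.01 can contribute at most about 0.01 to any class score near its centre, however confident its semantic logits are, so no separate floater-removal pass is needed.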

Experiments

Main Results

On the indoor Occ‑ScanNet benchmark, SplatSSC achieves 62.83% IoU and 51.83% mIoU, surpassing the previous state of the art (RoboOcc) by 6.35 percentage points in IoU and 4.16 percentage points in mIoU. Qualitative results show higher object recall and sharper boundaries for fine structures such as chair legs and tabletops.
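For readers unfamiliar with the two metrics, the sketch below shows how they are conventionally computed in SSC: IoU scores binary occupancy (occupied versus empty), while mIoU averages per-class IoU over the semantic classes. The function name and the convention that label 0 means "empty" are assumptions for illustration. Benchmark evaluation code typically also masks out unobserved voxels, which is omitted here.

```python
import numpy as np

def ssc_metrics(pred, target, num_classes, empty_label=0):
    """Hypothetical sketch: scene-completion IoU and semantic mIoU over voxel labels."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Geometry IoU: treat every non-empty label as "occupied".
    pred_occ = pred != empty_label
    tgt_occ = target != empty_label
    union = np.logical_or(pred_occ, tgt_occ).sum()
    iou = np.logical_and(pred_occ, tgt_occ).sum() / union if union else float("nan")
    # Semantic mIoU: average IoU over classes that appear in pred or target.
    per_class = []
    for c in range(num_classes):
        if c == empty_label:
            continue
        u = np.logical_or(pred == c, target == c).sum()
        if u > 0:
            per_class.append(np.logical_and(pred == c, target == c).sum() / u)
    miou = float(np.mean(per_class)) if per_class else float("nan")
    return float(iou), miou
```

A model can score well on IoU while scoring poorly on mIoU if it finds the occupied voxels but mislabels them, which is why both numbers are reported.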

Ablation Studies

Key findings:

Primitive count: 1,200 primitives with scales in [0.01, 0.16] yield the highest mIoU (48.87 %). Increasing to 19,200 primitives degrades efficiency without improving accuracy.

Efficiency: On an RTX 3090, inference latency drops to ~115 ms and memory usage falls by ~9.6 % compared to baselines.

Component impact: Removing GMF dramatically harms geometry IoU, while omitting DGA re‑introduces floaters and reduces mIoU to 48 %.

Efficiency Breakthrough

Despite adding a lightweight depth branch and DGA, the overall parameter count increases by only 0.19%. The model reduces inference latency by ~9.3% and GPU memory consumption by ~9.6%, demonstrating that sparse, geometry‑guided representations can deliver high quality with modest resources.

Conclusion and Future Work

SplatSSC shows that where Gaussian primitives are placed and how they are aggregated matter more than how many of them there are. Future work will extend the approach to large‑scale outdoor dynamic scenes and long‑term embodied perception tasks, positioning depth‑guided Gaussian splatting as a cornerstone for persistent world models.

Resources

Open‑source implementation:

https://github.com/Made-Gpt/SplatSSC

Figure: Comparison of initialization strategies
Figure: SplatSSC architecture overview
Figure: Group‑wise Multi‑scale Fusion (GMF) details
Figure: Decoupled Gaussian Aggregator (DGA) robustness
