How SplatSSC Revolutionizes Semantic Scene Completion with Depth‑Guided Gaussian Splatting
SplatSSC introduces a depth‑guided Gaussian splatting framework that replaces random primitive initialization with geometry‑aware priors and a decoupled aggregation module, achieving state‑of‑the‑art performance on indoor semantic scene completion while dramatically reducing computational overhead and eliminating floaters.
Introduction
Semantic Scene Completion (SSC) predicts dense geometry and semantic labels of a scene from a single RGB image, a capability crucial for embodied AI and autonomous driving. Prior Gaussian-based SSC pipelines initialize a large number of random 3D Gaussian primitives, leading to low primitive utilization (~3.9%) and "floaters" – spurious semantic fragments suspended in empty space.
Core Techniques
Depth‑Guided Primitive Initialization (GMF)
The Group‑wise Multi‑scale Fusion (GMF) module fuses multi‑scale image semantics with depth features from a pretrained Depth‑Anything‑V2 model. Channels are split into groups and processed with linear group cross‑attention (GCA), reducing attention complexity from quadratic to linear in the number of tokens and keeping the fusion step lightweight.
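The idea of group-wise linear cross-attention can be illustrated with a minimal NumPy sketch. This is not the paper's exact GCA formulation; the positive feature map `phi` and the group split are illustrative assumptions. The key point is that the key–value summary `kv` is computed once per group, so cost scales linearly with the token counts rather than quadratically.

```python
import numpy as np

def linear_group_cross_attention(img_feats, depth_feats, num_groups=4):
    """Illustrative linear cross-attention applied per channel group.

    img_feats:   (N, C) image-semantic tokens (queries)
    depth_feats: (M, C) depth tokens (keys/values)
    Channels are split into `num_groups` groups; within each group,
    attention is computed as phi(Q) @ (phi(K)^T @ V), which is linear
    in N and M instead of the quadratic N*M of softmax attention.
    """
    N, C = img_feats.shape
    assert C % num_groups == 0
    g = C // num_groups
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # simple positive feature map

    out = np.empty_like(img_feats)
    for i in range(num_groups):
        q = phi(img_feats[:, i * g:(i + 1) * g])    # (N, g)
        k = phi(depth_feats[:, i * g:(i + 1) * g])  # (M, g)
        v = depth_feats[:, i * g:(i + 1) * g]       # (M, g)
        kv = k.T @ v                                # (g, g) summary, O(M)
        z = q @ k.sum(axis=0, keepdims=True).T      # (N, 1) normalizer
        out[:, i * g:(i + 1) * g] = (q @ kv) / z
    return out
```

Because each group attends independently over a narrow channel slice, the per-group `kv` matrices stay tiny, which is what makes the fusion cheap even at high feature resolution.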
Only 1,200 carefully placed Gaussian primitives (≈7% of prior work) are required to cover scene geometry.
Geometric priors derived from depth guide the placement, drastically reducing redundancy.
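The depth-guided placement above can be sketched as follows, under stated assumptions: the function name, the pixel-sampling strategy, and the fixed isotropic initial scale are illustrative, not the paper's exact procedure. The sketch samples valid depth pixels and unprojects them through the camera intrinsics to obtain Gaussian means that already lie on observed surfaces.

```python
import numpy as np

def init_gaussians_from_depth(depth, K, num_primitives=1200, init_scale=0.08):
    """Place Gaussian means on the back-projected depth map.

    depth: (H, W) metric depth map (0 marks invalid pixels)
    K:     (3, 3) camera intrinsics
    Returns means (P, 3) and isotropic initial scales (P, 3).
    """
    valid = np.argwhere(depth > 0)                       # (n, 2) rows (v, u)
    rng = np.random.default_rng(0)
    idx = rng.choice(len(valid), size=num_primitives, replace=False)
    vu = valid[idx]
    z = depth[vu[:, 0], vu[:, 1]]
    # Pinhole unprojection: pixel (u, v) with depth z -> camera-frame point.
    x = (vu[:, 1] - K[0, 2]) * z / K[0, 0]
    y = (vu[:, 0] - K[1, 2]) * z / K[1, 1]
    means = np.stack([x, y, z], axis=1)                  # (P, 3)
    scales = np.full((num_primitives, 3), init_scale)
    return means, scales
```

Starting primitives on observed surfaces, rather than at random positions in the frustum, is what lets a budget of only ~1,200 Gaussians cover the scene.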
Decoupled Gaussian Aggregator (DGA)
DGA separates geometry occupancy prediction from semantic distribution. Opacity is treated as a confidence score; low occupancy probabilities act as gates that suppress erroneous semantic contributions from outlier primitives, effectively eliminating floaters without heuristic post‑processing.
Two parallel pathways predict geometry occupancy and conditional semantics.
Gate‑based suppression ensures clean scene boundaries.
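The gating mechanism described above can be sketched in NumPy. This is a minimal illustration, not the paper's exact aggregation: the `splat_weights` matrix and the multiplicative gate are assumptions standing in for the actual splatting and fusion operators. The point is that a primitive with low occupancy probability contributes almost nothing to any voxel's semantic distribution.

```python
import numpy as np

def decoupled_aggregate(occ_logits, sem_logits, splat_weights):
    """Two decoupled heads, geometry occupancy and semantics, fused by gating.

    occ_logits:    (P,)   per-primitive occupancy logits
    sem_logits:    (P, K) per-primitive class logits
    splat_weights: (V, P) splatting weight of primitive p on voxel v
    """
    occ = 1.0 / (1.0 + np.exp(-occ_logits))              # occupancy prob
    e = np.exp(sem_logits - sem_logits.max(axis=1, keepdims=True))
    sem = e / e.sum(axis=1, keepdims=True)               # class distribution
    # Gate: low occupancy suppresses a primitive's semantic contribution,
    # so outlier primitives ("floaters") vanish without post-processing.
    contrib = splat_weights * occ[None, :]               # (V, P) gated weights
    voxel_occ = contrib.sum(axis=1)                      # geometry occupancy
    norm = np.maximum(voxel_occ, 1e-8)[:, None]
    voxel_sem = (contrib @ sem) / norm                   # gated semantics
    return voxel_occ, voxel_sem
```

With a confident surface primitive (high occupancy) and a floater (low occupancy) splatting onto the same voxel, the voxel's semantics follow the confident primitive almost entirely.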
Experiments
Main Results
On the indoor Occ‑ScanNet benchmark, SplatSSC achieves 62.83% IoU and 51.83% mIoU, surpassing the previous state of the art (RoboOcc) by 6.35 IoU points and 4.16 mIoU points. Qualitative results show higher object recall and sharper boundaries for fine structures such as chair legs and tabletops.
Ablation Studies
Key findings:
Primitive count: 1,200 primitives with scales in [0.01, 0.16] yield the highest mIoU (48.87 %). Increasing to 19,200 primitives degrades efficiency without improving accuracy.
Efficiency: On an RTX 3090, inference latency drops to ~115 ms and memory usage falls by ~9.6 % compared to baselines.
Component impact: Removing GMF dramatically harms geometry IoU, while omitting DGA re‑introduces floaters and reduces mIoU to 48 %.
Efficiency Breakthrough
Despite adding a lightweight depth branch and the DGA module, the overall parameter count grows by only 0.19%. The model cuts inference latency by ~9.3% and GPU memory consumption by ~9.6%, demonstrating that sparse, geometry‑guided representations can deliver high quality with modest resources.
Conclusion and Future Work
SplatSSC shows that quality‑driven Gaussian primitive placement and decoupled aggregation are more important than sheer quantity. Future work will extend the approach to large‑scale outdoor dynamic scenes and long‑term embodied perception tasks, positioning depth‑guided Gaussian splatting as a cornerstone for persistent world models.
Resources
Open‑source implementation:
https://github.com/Made-Gpt/SplatSSC
