Apr 29, 2026 · Artificial Intelligence

VEGA-3D: Unleashing Implicit 3D Priors in Video Generation for Scene Understanding

VEGA-3D extracts the implicit 3D priors embedded in large video generation models and fuses them with semantic features through token-level adaptive gating. It reports substantially higher multi-view consistency and state-of-the-art results on 3D scene-understanding, spatial-reasoning and embodied benchmarks such as ScanRefer, ScanQA, VSI-Bench and LIBERO, all without any additional 3D annotations.
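
The summary does not give the exact gating formula, so the following is only a minimal sketch of what token-level adaptive gating can look like. It assumes the video-generation features and the semantic features are already aligned token-for-token and projected to the same width, and that each token gets one scalar gate computed as a sigmoid of a linear score over the concatenated features. The names `token_gated_fusion`, `gate_weights` and `gate_bias` are hypothetical illustrations, not VEGA-3D's code.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def token_gated_fusion(
    semantic: Sequence[Sequence[float]],
    generative: Sequence[Sequence[float]],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical per-token gated fusion of two aligned token sequences.

    Assumption (not from the source): token i of `semantic` and token i of
    `generative` describe the same location and share feature width d, and
    `gate_weights` (length 2*d) scores the concatenation [semantic_i; generative_i].
    Returns the fused tokens and the gate value chosen for each token.
    """
    if len(semantic) != len(generative):
        raise ValueError("token sequences must be aligned one-to-one")
    fused: List[List[float]] = []
    gates: List[float] = []
    for sem_vec, gen_vec in zip(semantic, generative):
        if len(sem_vec) != len(gen_vec) or len(gate_weights) != 2 * len(sem_vec):
            raise ValueError("feature dimensions do not match")
        concat = list(sem_vec) + list(gen_vec)
        score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
        alpha = sigmoid(score)  # weight given to the generative (3D-prior) features
        # Convex blend: alpha near 1 keeps the generative token, near 0 the semantic one.
        fused.append([alpha * g + (1.0 - alpha) * s for s, g in zip(sem_vec, gen_vec)])
        gates.append(alpha)
    return fused, gates


if __name__ == "__main__":
    sem = [[1.0, 0.0], [0.0, 1.0]]
    gen = [[0.0, 2.0], [2.0, 0.0]]
    fused, gates = token_gated_fusion(sem, gen, gate_weights=[0.0] * 4)
    print(gates)  # [0.5, 0.5]
    print(fused)  # [[0.5, 1.0], [1.0, 0.5]]
```

With zero gate weights every token receives a gate of 0.5, an even blend; learning the weights lets each token lean toward whichever source is more informative at that position. A per-channel (vector-valued) gate would be an equally plausible variant under the same assumptions.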

Embodied AI · Scene Understanding · VEGA-3D


[Figure: multi-view consistency]