End-to-End 3D Spatial Video Generation via Monocular Depth Estimation, Novel View Synthesis, and MV‑HEVC Encoding
This article presents an end‑to‑end, AI‑driven pipeline that converts 2D video into immersive 3D spatial video. The pipeline combines monocular depth estimation, novel view synthesis by depth warping, a multi‑branch inpainting module, the large‑scale StereoV1K dataset, and efficient MV‑HEVC compression. The results were validated at ICME 2025, and the pipeline is deployed in JD Vision services.
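To make the depth-warping step concrete, the following is a minimal sketch of generic depth-image-based rendering, not the article's actual implementation. It assumes a rectified stereo setup in which a pixel at column x in the left view appears at column x - d in the right view, with disparity d inversely proportional to depth. The function name `warp_to_right_view` and the `max_disparity` parameter are hypothetical, introduced only for illustration. The returned hole mask marks the disoccluded pixels that an inpainting stage would then need to fill.

```python
import numpy as np


def warp_to_right_view(left, depth, max_disparity=8.0):
    """Forward-warp a left view into a right view (illustrative sketch).

    left:  (H, W, C) image array
    depth: (H, W) positive depth values, larger means farther away
    Returns the warped image and a boolean mask of unfilled pixels.
    """
    h, w = depth.shape
    # Assumed disparity model: inversely proportional to depth, scaled so
    # that the nearest pixel shifts by max_disparity columns.
    inv = 1.0 / np.maximum(depth, 1e-6)
    disparity = max_disparity * inv / inv.max()

    right = np.zeros_like(left)
    zbuf = np.full((h, w), np.inf)  # depth of the pixel currently written

    for y in range(h):
        for x in range(w):
            xt = int(round(x - disparity[y, x]))  # target column
            # Z-buffer test: a nearer source pixel wins the target location.
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                right[y, xt] = left[y, x]

    holes = np.isinf(zbuf)  # never written: disocclusions and image border
    return right, holes


if __name__ == "__main__":
    # One-row toy frame: a near object (depth 1) in front of a far background.
    left = np.arange(1, 7).reshape(1, 6, 1).repeat(3, axis=2).astype(np.uint8)
    depth = np.array([[2, 2, 2, 1, 1, 2]], dtype=float)
    right, holes = warp_to_right_view(left, depth, max_disparity=2.0)
    print(right[0, :, 0], holes[0])
```

The explicit loops keep the z-buffer logic easy to follow; a production pipeline would vectorize this or run it on the GPU, and would typically use sub-pixel splatting rather than rounding to the nearest column.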