Jun 8, 2026 · Artificial Intelligence

Meta’s VLM³ Boosts Depth Accuracy to 0.9 Using Qwen3‑VL‑4B for Unified 3D Tasks

Meta and Princeton introduce VLM³, a unified vision‑language framework built on Qwen3‑VL‑4B that models depth estimation, object‑level 3D understanding, pixel matching and camera pose estimation without extra encoders, achieving up to 0.90 depth accuracy and outperforming larger specialist models on multiple benchmarks.

3D PerceptionDepth EstimationMulti-Task Learning

0 likes · 15 min read

Meta’s VLM³ Boosts Depth Accuracy to 0.9 Using Qwen3‑VL‑4B for Unified 3D Tasks