From Moonwalks to Cyber Cities: How WBench Maps the Limits of World Models
WBench, the first systematic multi-turn benchmark for interactive video world models, evaluates 20 cutting-edge models across navigation, actions, editing and view-switching. The results show that no single model excels at all tasks, that navigation ability is independent of visual quality, and that multi-turn interaction causes a 33-point drop in performance.
