SuanNi
Apr 19, 2026 · Artificial Intelligence

Why Multimodal Video Models Still Miss the Mark: Inside the New Video‑MME‑v2 Benchmark

The Video‑MME‑v2 benchmark reveals that current multimodal video models, despite high leaderboard scores, struggle with genuine video understanding. Its rigorous three‑layer evaluation, non‑linear scoring, and meticulously curated 800‑video dataset expose the true limits of their intelligence.

AI evaluation · Video-MME · large language models
10 min read
Machine Heart
Apr 13, 2026 · Artificial Intelligence

Why the Top Video Model Scores Only 49: Introducing Video‑MME‑v2 by Nanjing University

The new Video‑MME‑v2 benchmark reveals that despite saturated high scores on existing video‑understanding tests, the strongest commercial model (Gemini‑3‑Pro) reaches only 49.4 points, versus 90.7 for a human expert. The result highlights the benchmark's layered ability system, group‑level non‑linear scoring, and the nuanced impact of "Thinking" features.

AI evaluation · large models · multimodal benchmark
11 min read