Tagged articles

multimodal benchmark

3 articles · Page 1 of 1

Apr 19, 2026 · Artificial Intelligence

Why Multimodal Video Models Still Miss the Mark: Inside the New Video‑MME‑v2 Benchmark

The Video‑MME‑v2 benchmark reveals that current multimodal video models, despite high leaderboard scores, struggle with genuine video understanding, thanks to a rigorous three‑layer evaluation, non‑linear scoring, and a meticulously curated 800‑video dataset that exposes their true intelligence limits.

AI evaluationVideo UnderstandingVideo-MME

0 likes · 10 min read

Why Multimodal Video Models Still Miss the Mark: Inside the New Video‑MME‑v2 Benchmark

Machine Heart

Apr 13, 2026 · Artificial Intelligence

Why the Top Video Model Scores Only 49: Introducing Video‑MME‑v2 by Nanjing University

The new Video‑MME‑v2 benchmark reveals that despite saturated high scores on existing video‑understanding tests, the strongest commercial model (Gemini‑3‑Pro) reaches only 49.4 points versus a human expert’s 90.7, highlighting the benchmark’s layered ability system, group‑level non‑linear scoring, and the nuanced impact of "Thinking" features.

AI evaluationLarge ModelsVideo Understanding

0 likes · 11 min read

Why the Top Video Model Scores Only 49: Introducing Video‑MME‑v2 by Nanjing University

HyperAI Super Neural

Oct 14, 2025 · Artificial Intelligence

NeurIPS 2025: OCRBench v2 Shows Gemini Leads Chinese OCR Ranking Yet Scores Only Pass

OCRBench v2, introduced at NeurIPS 2025, evaluates 58 multimodal models on 23 OCR‑related tasks in Chinese and English, revealing that even top models like Gemini‑2.5‑Pro barely exceed the passing threshold and that most models struggle with fine‑grained text localization and multilingual performance.

EvaluationGeminiNeurIPS 2025

0 likes · 8 min read

NeurIPS 2025: OCRBench v2 Shows Gemini Leads Chinese OCR Ranking Yet Scores Only Pass