AI Frontier Lectures
Dec 9, 2025 · Artificial Intelligence

CrossVid: The New Benchmark Exposing AI’s Struggle with Cross‑Video Reasoning

CrossVid is an open‑source benchmark that evaluates multimodal large language models on cross‑video reasoning. It offers 5,331 videos and 9,015 high‑quality QA pairs across four reasoning dimensions, and reveals that even the strongest models reach only about 50% accuracy, well short of human performance.

AI evaluation · cross-video reasoning · video understanding
9 min read
Xiaohongshu Tech REDtech
Dec 4, 2025 · Artificial Intelligence

CrossVid: A New Benchmark Reveals the Limits of Multimodal LLMs in Cross‑Video Reasoning

CrossVid is an open‑source benchmark that evaluates multimodal large language models on cross‑video reasoning tasks. It provides 5,331 videos and 9,015 QA pairs spanning four high‑level dimensions and ten specific tasks, and exposes significant performance gaps between current models and humans.

AI evaluation · cross-video reasoning · multimodal LLM
9 min read