CrossVid: The New Benchmark Exposing AI’s Struggle with Cross‑Video Reasoning

CrossVid is an open‑source benchmark that evaluates multimodal large language models on cross‑video reasoning. It offers 5,331 videos and 9,015 high‑quality QA pairs across four reasoning dimensions, and reveals that even the strongest models reach only about 50% accuracy, well short of human performance.

AI evaluation · cross-video reasoning · video understanding

9 min read

Xiaohongshu Tech REDtech

Dec 4, 2025 · Artificial Intelligence

CrossVid: A New Benchmark Reveals the Limits of Multimodal LLMs in Cross‑Video Reasoning

CrossVid is an open‑source benchmark that evaluates multimodal large language models on cross‑video reasoning tasks. It provides 5,331 videos and 9,015 QA pairs organized into four high‑level dimensions and ten specific tasks, and exposes significant performance gaps between current models and humans.

AI evaluation · cross-video reasoning · multimodal LLM

9 min read
