Tag

multimodal evaluation

1 view collected around this technical thread.

Sohu Tech Products
Jul 31, 2024 · Artificial Intelligence

MMEvalPro: A Trustworthy Benchmark for Evaluating Multimodal Large Models

MMEvalPro is a new benchmark created by researchers from Peking University, the Chinese Academy of Medical Sciences, CUHK, and Alibaba. It augments existing multimodal datasets with perception and knowledge questions and introduces a Genuine Accuracy metric. The results reveal that top multimodal models still lag far behind humans and expose shortcut-driven performance on prior tests.

MMEvalPro · benchmark · large language models
0 likes · 11 min read