Tagged articles

benchmark dataset

3 articles

Feb 24, 2025 · Artificial Intelligence

AIDE: Hybrid Feature Detector for AI‑Generated Image Detection and the Chameleon Benchmark

The paper introduces AIDE, a hybrid detector for AI‑generated images that fuses low‑level pixel statistics with high‑level semantic embeddings. It also introduces Chameleon, a manually curated benchmark of ~26,000 diverse, highly realistic images. AIDE surpasses nine state‑of‑the‑art methods by up to 4.6%, while the tougher Chameleon dataset highlights challenges that remain.

AI-generated image detection · benchmark dataset · computer vision

14 min read

Xiaohongshu Tech REDtech

Feb 17, 2025 · Artificial Intelligence

WorldSense: A New Benchmark for Evaluating Multimodal Large Models in Real‑World Scenarios

WorldSense, a new benchmark of 1,662 real‑world video‑audio clips and 3,172 QA pairs across 26 cognitive tasks, reveals that current multimodal large models achieve only 25%–48% accuracy, highlighting the crucial role of combined visual‑audio input and the difficulty of audio‑ and emotion‑related reasoning.

Multimodal AI · benchmark dataset · large models

12 min read

NewBeeNLP

Jul 10, 2024 · Artificial Intelligence

Can Large Language Models Master Co‑Temporal Reasoning? Introducing COTEMPQA

This article presents COTEMPQA, a benchmark for evaluating large language models on co‑temporal reasoning. It details the benchmark's four scenario types, its construction pipeline, experimental results across models, and an error analysis. It also proposes MR‑COT, a strategy that leverages mathematical reasoning to significantly improve performance.

LLM evaluation · MR-COT · benchmark dataset

11 min read