Tagged articles
3 articles
Xiaohongshu Tech REDtech
Feb 24, 2025 · Artificial Intelligence

AIDE: Hybrid Feature Detector for AI‑Generated Image Detection and the Chameleon Benchmark

The paper introduces AIDE, a hybrid detector for AI-generated images that fuses low-level pixel statistics with high-level semantic embeddings, together with Chameleon, a manually curated benchmark of about 26,000 diverse, highly realistic images. AIDE outperforms nine state-of-the-art detection methods by up to 4.6%, although its results on the harder Chameleon dataset show that significant challenges remain.

AI-generated image detection · Computer Vision · Deep Learning
14 min read
Xiaohongshu Tech REDtech
Feb 17, 2025 · Artificial Intelligence

WorldSense: A New Benchmark for Evaluating Multimodal Large Models in Real‑World Scenarios

WorldSense is a new benchmark of 1,662 real-world video-audio clips and 3,172 question-answer pairs spanning 26 cognitive tasks. Current multimodal large models reach only 25%–48% accuracy on it, which highlights how much these models depend on combined visual and audio input and how difficult audio- and emotion-related reasoning remains for them.

Multimodal AI · benchmark dataset · large models
12 min read
NewBeeNLP
Jul 10, 2024 · Artificial Intelligence

Can Large Language Models Master Co‑Temporal Reasoning? Introducing COTEMPQA

This article presents COTEMPQA, a benchmark for evaluating how well large language models handle co-temporal reasoning, that is, reasoning about events that overlap in time. It describes the benchmark's four scenario types, its construction pipeline, experimental results across models, and an error analysis, and it proposes MR-COT, a strategy that draws on mathematical reasoning to significantly improve performance.

LLM evaluation · MR-COT · benchmark dataset
11 min read