Jul 27, 2025 · Artificial Intelligence

Can LLMs Ask the Right Questions? Introducing AR‑Bench for Active Reasoning

Large Language Models excel at passive reasoning, but struggle when information is incomplete; this paper defines the active reasoning problem, presents the AR‑Bench benchmark with detective, puzzle, and number‑guessing tasks, and reveals through extensive experiments that even top models like GPT‑4o perform poorly, highlighting research gaps.

Active ReasoningLLM evaluationbenchmark

0 likes · 13 min read

Can LLMs Ask the Right Questions? Introducing AR‑Bench for Active Reasoning