AI Frontier Lectures
Jul 27, 2025 · Artificial Intelligence
Can LLMs Ask the Right Questions? Introducing AR‑Bench for Active Reasoning
Large language models excel at passive reasoning but struggle when information is incomplete. This paper defines the active reasoning problem and presents AR‑Bench, a benchmark with detective, puzzle, and number‑guessing tasks. Extensive experiments show that even top models such as GPT‑4o perform poorly, highlighting research gaps.
Active Reasoning · LLM evaluation · benchmark
13 min read
