Machine Heart
Jun 17, 2026 · Artificial Intelligence
Why RL‑Trained Agents Still Fail to Reason Actively: The Information Self‑Locking Problem
The paper reveals that outcome‑based reinforcement learning often traps LLM agents in an information self‑locking regime where weak action selection and belief tracking prevent proper credit assignment, and introduces AREW, a lightweight advantage‑reweighting method that restores active reasoning across multiple tasks and models.
AREW · LLM Agents · Reinforcement Learning
