Data Party THU
Oct 8, 2025 · Artificial Intelligence

Why Reinforcement Learning Unlocks Hierarchical Reasoning in LLMs: The HICRA Breakthrough

This article explains how reinforcement learning induces a hierarchical learning dynamic in large language models and introduces HICRA, a training paradigm that concentrates gradient updates on planning tokens. Across extensive text and multimodal benchmarks, it shows that this approach consistently yields earlier "aha" moments and stronger reasoning performance.

HICRA · Hierarchical Reasoning · Semantic Entropy
10 min read