Data Party THU
Apr 29, 2026 · Artificial Intelligence
How Far Can Unsupervised RL for Large Models Go? A Systematic Answer from a Tsinghua Team
The article analyzes the scaling limits of unsupervised reinforcement learning for large language models, showing that intrinsic‑reward methods initially boost performance but inevitably collapse. It proposes a unified theory and a model‑collapse metric to predict trainability, and argues that external‑reward approaches are the scalable path forward.
AI research · RL scaling · external rewards
