Data Party THU
Apr 29, 2026 · Artificial Intelligence

How Far Can Unsupervised RL for Large Models Go? A Systematic Answer from a Tsinghua Team

The article analyzes the scaling limits of unsupervised reinforcement learning for large language models. It finds that intrinsic-reward methods deliver an initial performance boost but inevitably collapse, proposes a unified theory and a model-collapse metric to predict trainability, and argues that external-reward approaches are the scalable path forward.

AI research · RL scaling · external rewards