Machine Learning Algorithms & Natural Language Processing
Mar 21, 2026 · Artificial Intelligence
Unsupervised RL for Large Models: How Far Can It Scale? Tsinghua’s Systematic Study
The paper analyzes unsupervised reinforcement learning for large language models, finding that intrinsic-reward methods initially boost performance but inevitably collapse due to confidence–correctness misalignment. It proposes a model-collapse-step metric to predict RL suitability and argues that external, verification-based rewards are the scalable path forward.
Large language models · external verification reward · intrinsic reward
