Machine Learning Algorithms & Natural Language Processing
May 19, 2026 · Artificial Intelligence
From P(y|x) to P(y): Reinforcement Learning in Pre‑train Space Unlocks Endogenous Reasoning
The paper introduces PreRL, which drops the input condition and directly optimizes a large language model's distribution over reasoning trajectories, P(y), rather than the conditional P(y|x). Combined with standard RL in Dual Space RL (DSRL), it achieves consistent gains on math and out-of-distribution benchmarks, along with faster training and richer reasoning behaviors.
DSRL · PreRL · large language models
11 min read
