Tagged articles
1 articles
Page 1 of 1
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
May 19, 2026 · Artificial Intelligence

From P(y|x) to P(y): Reinforcement Learning in Pre‑train Space Unlocks Endogenous Reasoning

The paper introduces PreRL, which removes the input condition to directly optimize the reasoning trajectory (P(y)) of large language models, and combines it with standard RL in Dual Space RL (DSRL), achieving consistent gains on math and out‑of‑distribution benchmarks, faster training, and richer reasoning behaviors.

DSRLPreRLlarge language models
0 likes · 11 min read
From P(y|x) to P(y): Reinforcement Learning in Pre‑train Space Unlocks Endogenous Reasoning