Machine Learning Algorithms & Natural Language Processing
Apr 4, 2026 · Artificial Intelligence
Why the Best SFT Checkpoint May Hurt RL Performance: Adaptive Early‑Stop Loss (AESL) for LLM Cold‑Start
The paper reveals that over‑optimizing supervised fine‑tuning (SFT) for large language models can diminish their reinforcement‑learning (RL) potential, proposes an Adaptive Early‑Stop Loss (AESL) that balances accuracy and output diversity during cold‑start, and demonstrates across multiple LLMs that AESL consistently yields superior RL results.
AI trainingAdaptive Early‑Stop LossLLM
0 likes · 11 min read
