Machine Learning Algorithms & Natural Language Processing
Apr 4, 2026 · Artificial Intelligence

Why the Best SFT Checkpoint May Hurt RL Performance: Adaptive Early‑Stop Loss (AESL) for LLM Cold‑Start

The paper shows that over‑optimizing supervised fine‑tuning (SFT) of large language models can diminish their subsequent reinforcement‑learning (RL) potential. It proposes an Adaptive Early‑Stop Loss (AESL) that balances accuracy against output diversity during the cold‑start phase, and demonstrates across multiple LLMs that AESL consistently yields stronger downstream RL results.

AI training · Adaptive Early‑Stop Loss · LLM
0 likes · 11 min read
Baobao Algorithm Notes
Jan 22, 2025 · Artificial Intelligence

Can RL‑Only Training Make LLMs Beat OpenAI‑o1? Inside DeepSeek‑R1’s Architecture and Results

DeepSeek‑R1’s open‑source series demonstrates that reinforcement‑learning‑only training can match top‑tier models such as OpenAI‑o1, while a small amount of SFT further improves readability. The article dissects its technical report: the training pipeline, reward design, distillation strategy, benchmark results, and remaining challenges.

DeepSeek · Large Language Model · Supervised Fine‑Tuning
0 likes · 11 min read