Baobao Algorithm Notes
Aug 14, 2025 · Artificial Intelligence

Why Standard SFT Fails to Generalize and How One‑Line Dynamic Fine‑Tuning Fixes It

This article analyzes why supervised fine-tuning (SFT) of large language models generalizes poorly, shows that its gradient is an inverse-probability-weighted policy gradient with high variance, proposes a one-line Dynamic Fine-Tuning (DFT) correction, and reports substantial gains on challenging math and offline RL benchmarks.
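To make the "one-line" claim concrete, here is a minimal sketch of what such a correction can look like in PyTorch. It assumes the DFT rule is to reweight each token's SFT loss by that token's detached probability under the current model, which cancels the implicit 1/p factor in the SFT gradient; the function name `dft_loss` and the tensor shapes are illustrative, not taken from the article.

```python
import torch
import torch.nn.functional as F

def dft_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Sketch of a DFT-style loss.

    logits:  (batch, seq_len, vocab) model outputs
    targets: (batch, seq_len) ground-truth token ids
    """
    # Per-token log-probabilities of the target tokens.
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Standard SFT would return -token_logp.mean().
    # DFT-style correction (assumed here): weight each token by its
    # stop-gradient probability, damping low-probability tokens whose
    # 1/p-weighted gradients dominate and inflate variance.
    weight = token_logp.detach().exp()
    return -(weight * token_logp).mean()
```

Because the weight lies in (0, 1], the corrected loss is never larger than the plain SFT loss on the same batch; the change is confined to the final line of the loss computation, which is what makes it a one-line patch to an existing SFT pipeline.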

Dynamic Fine-Tuning · Generalization · LLM alignment