Baobao Algorithm Notes
Nov 20, 2025 · Artificial Intelligence
Why Reinforcement Learning Preserves LLM Generality Better Than Supervised Fine‑Tuning
This article analyzes why reinforcement-learning (RL) fine-tuning preserves a large language model's general abilities better than supervised fine-tuning (SFT). It explains the off-policy distribution shift that SFT induces, and the mechanisms that give RL its anti-forgetting properties: on-policy data consistency, the KL penalty, and trust-region constraints.
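One of the mechanisms named above, the KL penalty, can be sketched concretely. In standard RLHF-style fine-tuning, a per-token penalty proportional to the log-probability gap between the updated policy and a frozen reference model is subtracted from the reward, discouraging the policy from drifting away from its pre-trained distribution. The function below is a minimal illustration of that reward shaping; the function name, the `beta` value, and the example log-probabilities are illustrative, not from the article.

```python
def kl_shaped_rewards(task_reward, logp_policy, logp_ref, beta=0.1):
    """Per-token rewards with a KL penalty toward the reference model.

    task_reward: scalar reward for the whole sampled response
    logp_policy: per-token log-probs of sampled tokens under the tuned policy
    logp_ref:    per-token log-probs of the same tokens under the frozen reference
    beta:        KL penalty coefficient (illustrative value)
    """
    # Per-token KL estimate: log pi_theta(y_t | ...) - log pi_ref(y_t | ...)
    kl = [lp - lr for lp, lr in zip(logp_policy, logp_ref)]
    # Penalize drift from the reference model at every token...
    rewards = [-beta * k for k in kl]
    # ...and add the task reward at the final token of the response.
    rewards[-1] += task_reward
    return rewards


# The more the policy diverges from the reference, the larger the penalty.
r = kl_shaped_rewards(1.0, [-1.0, -2.0], [-1.5, -2.0], beta=0.1)
# r[0] = -0.1 * (-1.0 - (-1.5)) = -0.05; r[1] = -0.1 * 0.0 + 1.0 = 1.0
```

Because the penalty is zero whenever the policy agrees with the reference, the model is only "charged" for moving probability mass away from its original distribution, which is the anti-forgetting pressure the article attributes to the KL term.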
Catastrophic Forgetting · LLM · On-Policy Data
