NewBeeNLP
Jun 12, 2024 · Artificial Intelligence

Beyond Cosine Decay: Fixed LR + Quick Decay Beats Traditional Schedules in LLM Training

The article analyzes why the traditional cosine‑decay learning‑rate schedule hinders continued training of large language models, and shows that fixed‑learning‑rate strategies such as Warmup‑Stable‑Decay (WSD), Cooldown, SWA, and the Schedule‑Free Optimizer can match or surpass cosine decay while being friendlier to fine‑tuning.
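To make the contrast concrete, here is a minimal sketch of a Warmup‑Stable‑Decay schedule: a short linear warmup, a long plateau at a fixed learning rate, then a quick linear decay at the end. The function name and hyperparameters (peak LR, warmup and decay fractions) are illustrative assumptions, not values taken from the article.

```python
def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.01, decay_frac=0.1):
    """Warmup-Stable-Decay learning-rate schedule (illustrative sketch).

    Phases: linear warmup -> constant plateau -> quick linear decay to zero.
    All hyperparameter defaults here are assumptions for demonstration.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:
        # linear warmup from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # stable phase: the fixed learning rate that makes
        # resuming or continuing training straightforward
        return peak_lr
    # quick decay phase: linearly anneal to zero over the final steps
    return peak_lr * (total_steps - step) / (total_steps - decay_start)
```

Because the plateau dominates training, a checkpoint taken mid‑run sits at the full learning rate, which is what makes continued pre‑training easier than with a cosine schedule that has already annealed.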

Cooldown · LLM training · SFO