NewBeeNLP
Jun 12, 2024 · Artificial Intelligence

Beyond Cosine Decay: Fixed LR + Quick Decay Beats Traditional Schedules in LLM Training

The article analyzes why the traditional cosine‑decay learning‑rate schedule hinders continued training of large language models, and shows that fixed‑learning‑rate strategies such as Warmup‑Stable‑Decay (WSD), Cooldown, SWA, and the Schedule‑Free Optimizer can match or surpass cosine decay while being friendlier to fine‑tuning.
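To make the contrast concrete, here is a minimal sketch of a Warmup‑Stable‑Decay schedule: a short linear warmup, a long plateau at a fixed learning rate, then a quick linear decay at the end. The function name and hyperparameters (peak LR, warmup and decay fractions) are illustrative assumptions, not values taken from the article.

```python
def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.01, decay_frac=0.1):
    """Warmup-Stable-Decay learning-rate schedule (illustrative sketch).

    Phases: linear warmup -> constant plateau -> quick linear decay to zero.
    All hyperparameter defaults here are assumptions for demonstration.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:
        # linear warmup from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # stable phase: the fixed learning rate that makes
        # resuming or continuing training straightforward
        return peak_lr
    # quick decay phase: linearly anneal to zero over the final steps
    return peak_lr * (total_steps - step) / (total_steps - decay_start)
```

Because the plateau dominates training, a checkpoint taken mid‑run sits at the full learning rate, which is what makes continued pre‑training easier than with a cosine schedule that has already annealed.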

Cooldown · LLM training · SFO