NewBeeNLP
Jun 12, 2024 · Artificial Intelligence
Beyond Cosine Decay: Fixed LR + Quick Decay Beats Traditional Schedules in LLM Training
The article analyzes why the traditional cosine-decay learning-rate schedule hinders continued training of large language models, and shows that fixed-learning-rate strategies such as Warmup-Stable-Decay (WSD), Cooldown, stochastic weight averaging (SWA), and the Schedule-Free Optimizer (SFO) can match or surpass cosine performance while being friendlier to continued training and fine-tuning.
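The Warmup-Stable-Decay idea the abstract mentions can be sketched as a simple schedule function: a short linear warmup, a long fixed-LR plateau, then a quick linear decay at the end. This is a minimal illustration, not the article's exact implementation; the fraction parameters (`warmup_frac`, `decay_frac`) and the linear decay shape are assumptions for the sketch.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.05, decay_frac=0.1):
    """Warmup-Stable-Decay (WSD) schedule sketch.

    Linear warmup, constant plateau at peak_lr, then a quick linear
    decay to zero over the final decay_frac of training.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_steps = max(1, int(total_steps * decay_frac))
    stable_end = total_steps - decay_steps

    if step < warmup_steps:
        # linear warmup from ~0 to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:
        # fixed learning rate for most of training
        return peak_lr
    # quick decay phase at the end of training
    return peak_lr * (total_steps - step) / decay_steps
```

Because the plateau dominates, a checkpoint taken before the decay phase can resume continued training at the same fixed LR, which is the property the article contrasts with cosine decay.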
Cooldown · LLM training · SFO
