Baobao Algorithm Notes
Oct 20, 2024 · Artificial Intelligence
Why Gradient Accumulation Isn’t Always Equivalent to Large‑Batch Training for LLMs
A recently discovered bug in popular LLM training libraries shows that gradient accumulation is not always equivalent to true large‑batch training. When sequence lengths vary across micro‑batches, the accumulated loss is scaled by the wrong denominator, which can noticeably degrade accuracy. The issue can be fixed by correcting the loss denominator scaling.
Deep Learning · LLM training · gradient accumulation
6 min read
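
To make the denominator issue in the summary concrete, here is a minimal sketch in plain Python. The per-token loss values and the function names (`full_batch_loss`, `naive_accumulated_loss`, `corrected_accumulated_loss`) are hypothetical and chosen only for illustration; they are not taken from any specific library. The sketch assumes the loss is a mean over non-padding tokens and that the naive approach averages each micro-batch separately before dividing by the number of accumulation steps.

```python
# Hypothetical per-token losses for two micro-batches with different token counts.
micro_batches = [
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # 6 non-padding tokens
    [4.0, 4.0],                      # 2 non-padding tokens
]


def mean(values):
    return sum(values) / len(values)


def full_batch_loss(batches):
    # True large-batch loss: one mean over every token in the batch.
    all_tokens = [loss for batch in batches for loss in batch]
    return mean(all_tokens)


def naive_accumulated_loss(batches):
    # Naive accumulation: each micro-batch is averaged over its own tokens,
    # then divided by the number of accumulation steps.
    steps = len(batches)
    return sum(mean(batch) / steps for batch in batches)


def corrected_accumulated_loss(batches):
    # Corrected accumulation: each micro-batch contributes its summed loss,
    # divided by the total token count across all micro-batches.
    total_tokens = sum(len(batch) for batch in batches)
    return sum(sum(batch) / total_tokens for batch in batches)


print(full_batch_loss(micro_batches))             # 2.5
print(naive_accumulated_loss(micro_batches))      # 3.0
print(corrected_accumulated_loss(micro_batches))  # 2.5
```

In this toy case the naive scheme over-weights the short micro-batch, since its 2 tokens count as much as the other's 6. Dividing by the total token count restores the large-batch value. When all micro-batches have the same number of tokens, the two schemes coincide.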
