PaperAgent
Feb 22, 2026 · Artificial Intelligence
How Skipping 50% of Gradient Updates Supercharges LLM Training (SkipUpdate & Magma)
A recent study from Google and Northwestern finds that randomly discarding half of all parameter updates during training, a strategy the authors call SkipUpdate, consistently outperforms dense optimizers on Llama models. Its extension, Magma, adds momentum-gradient alignment for further gains, yielding a zero-overhead, geometry-aware regularizer for large-scale LLM training.
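To make the core idea concrete, here is a minimal PyTorch sketch of what "randomly discarding half of parameter updates" could look like. The paper's exact masking granularity (per-coordinate vs. per-tensor) and whether surviving updates are rescaled aren't specified here, so this sketch assumes independent per-coordinate Bernoulli masking with no rescaling; the function name `skip_update_step` is hypothetical, not the authors' released API.

```python
import torch

def skip_update_step(param: torch.Tensor, update: torch.Tensor,
                     keep_prob: float = 0.5) -> None:
    """Apply an optimizer update, randomly skipping coordinates.

    `update` is the step a dense optimizer (e.g., Adam) would apply
    in full; each coordinate survives with probability `keep_prob`.
    """
    # Bernoulli mask: 1 keeps a coordinate's update, 0 skips it.
    mask = (torch.rand_like(update) < keep_prob).to(update.dtype)
    param.add_(update * mask)

# Example: one SGD-style step with roughly half the coordinates skipped.
w = torch.zeros(4)
g = torch.ones(4)               # stand-in gradient
skip_update_step(w, -0.1 * g)   # about half of w remains at 0
```

Under this reading, SkipUpdate behaves like dropout applied to the optimizer step rather than to activations, which is consistent with the article framing it as a regularizer.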
Magma · Optimization · SkipUpdate
