Machine Learning Algorithms & Natural Language Processing
Feb 21, 2026 · Artificial Intelligence

Zero‑Overhead Magma Beats Adam and Muon by Dropping Half the Gradients – 19% Perplexity Reduction on 1B‑Scale Models

Magma, a new momentum‑aligned gradient‑masking optimizer from Northwestern University and Google, discards half of the parameter updates at no extra compute cost. On 1‑billion‑parameter models it achieves up to 19% lower perplexity than Adam and 9% lower than Muon, and it comes with theoretical guarantees and extensive empirical validation across heterogeneous loss landscapes.
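
The summary doesn't spell out Magma's exact update rule, but the core idea of momentum‑aligned gradient masking can be sketched. The snippet below is a hypothetical illustration, not the published algorithm: it assumes "dropping half the gradients" means zeroing coordinates whose gradient sign disagrees with the momentum buffer, an elementwise test that adds essentially no overhead. The function name `magma_style_step` and the specific masking criterion are assumptions for illustration.

```python
import torch

def magma_style_step(param, grad, momentum, lr=1e-3, beta=0.9):
    """One hypothetical momentum-aligned gradient-masking step.

    Sketch only: assumes the masked coordinates are those where the
    raw gradient's sign disagrees with the momentum direction.
    """
    # Update the exponential moving average of gradients (momentum buffer).
    momentum.mul_(beta).add_(grad, alpha=1 - beta)
    # Keep only coordinates where gradient and momentum point the same way;
    # the comparison is elementwise, so the masking itself is near-free.
    mask = (grad * momentum) > 0
    # Apply the masked momentum update.
    param.add_(momentum * mask, alpha=-lr)
    return param, momentum
```

Under this reading, roughly half of the coordinates would disagree in sign with the momentum direction at a typical step, which is one plausible way to square "dropping half the gradients" with zero overhead; the actual criterion used by Magma may differ.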

Magma optimizer · adaptive optimization · gradient masking
11 min read