Machine Learning Algorithms & Natural Language Processing
Feb 21, 2026 · Artificial Intelligence

Zero‑Overhead Magma Beats Adam and Muon by Dropping Half the Gradients – 19% Perplexity Reduction on 1B‑Scale Models

Magma, a new momentum‑aligned gradient‑masking optimizer from Northwestern University and Google, discards half of the parameter updates at no extra compute cost. On 1‑billion‑parameter models it achieves up to 19% lower perplexity than Adam and 9% lower than Muon, and it comes with theoretical guarantees and extensive empirical validation across heterogeneous loss landscapes.
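
The summary doesn't spell out Magma's exact update rule, but the core idea of momentum‑aligned gradient masking can be sketched. The snippet below is a hypothetical illustration, not the published algorithm: it assumes "dropping half the gradients" means zeroing coordinates whose gradient sign disagrees with the momentum buffer, an elementwise test that adds essentially no overhead. The function name `magma_style_step` and the specific masking criterion are assumptions for illustration.

```python
import torch

def magma_style_step(param, grad, momentum, lr=1e-3, beta=0.9):
    """One hypothetical momentum-aligned gradient-masking step.

    Sketch only: assumes the masked coordinates are those where the
    raw gradient's sign disagrees with the momentum direction.
    """
    # Update the exponential moving average of gradients (momentum buffer).
    momentum.mul_(beta).add_(grad, alpha=1 - beta)
    # Keep only coordinates where gradient and momentum point the same way;
    # the comparison is elementwise, so the masking itself is near-free.
    mask = (grad * momentum) > 0
    # Apply the masked momentum update.
    param.add_(momentum * mask, alpha=-lr)
    return param, momentum
```

Under this reading, roughly half of the coordinates would disagree in sign with the momentum direction at a typical step, which is one plausible way to square "dropping half the gradients" with zero overhead; the actual criterion used by Magma may differ.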

Magma optimizer · adaptive optimization · gradient masking
11 min read