PaperAgent
Feb 22, 2026 · Artificial Intelligence

How Skipping 50% of Gradient Updates Supercharges LLM Training (SkipUpdate & Magma)

A recent Google‑Northwestern study reveals that randomly discarding half of parameter updates during training—implemented as the SkipUpdate strategy—consistently outperforms dense optimizers across Llama models, and its extension Magma adds momentum‑gradient alignment to achieve further gains, offering a zero‑overhead, geometry‑aware regularization for large‑scale LLMs.
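The core SkipUpdate idea as summarized above — randomly discarding each parameter update with some probability — can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation; the function name `apply_updates`, the dict-of-floats parameter layout, and the default 50% skip probability are assumptions for demonstration:

```python
import random

def apply_updates(params, updates, skip_prob=0.5, rng=None):
    """Toy sketch of a SkipUpdate-style step: each parameter's update
    is randomly dropped with probability skip_prob, otherwise applied."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility of the sketch
    new_params = {}
    for name, value in params.items():
        if rng.random() < skip_prob:
            new_params[name] = value                  # skip: keep the old value
        else:
            new_params[name] = value - updates[name]  # apply the gradient update
    return new_params
```

In a real optimizer the same coin flip would be drawn per tensor (or per element) at each step, so over many steps every parameter still receives updates in expectation, which is the regularization effect the summary alludes to.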

Magma · Optimization · SkipUpdate
9 min read
Ma Wei Says
Mar 4, 2025 · Artificial Intelligence

Microsoft’s Open‑Source Multimodal AI Agent Model Magma: Capabilities and Innovations

On February 25, 2025, Microsoft open‑sourced its first multimodal AI agent foundation model, Magma. The model extends multimodal processing to images, video, and text; introduces the Set‑of‑Mark and Trace‑of‑Mark techniques for spatial‑temporal reasoning; optimizes modular inference for edge devices; and integrates reinforcement learning for adaptive task execution.

Edge computing · Magma · Set-of-Mark
6 min read