
Hulu Beijing
Jan 4, 2018 · Artificial Intelligence

Why SGD Fails and How Momentum, AdaGrad, and Adam Fix It

This article explains why vanilla Stochastic Gradient Descent often struggles in deep learning, describes the challenges posed by narrow valleys (ravines) and saddle points, and introduces three major SGD variants (Momentum, AdaGrad, and Adam), detailing their motivations, update rules, and advantages.
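As a preview of the update rules the article details, here is a minimal NumPy sketch of one step of each optimizer. The function names and hyperparameter defaults are illustrative, not taken from the article itself.

```python
import numpy as np

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    # Momentum: accumulate a decaying moving average of past gradients,
    # which damps oscillation across a valley and accelerates along it.
    v = beta * v - lr * g
    return w + v, v

def adagrad_step(w, g, s, lr=0.1, eps=1e-8):
    # AdaGrad: accumulate squared gradients per coordinate and scale the
    # step by their inverse square root, so frequently updated coordinates
    # take smaller steps.
    s = s + g * g
    return w - lr * g / (np.sqrt(s) + eps), s

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: combine Momentum-style first-moment and AdaGrad-style
    # second-moment estimates, with bias correction for the zero init.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # t is the 1-based step count
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

For example, iterating any of these on the gradient of f(w) = w² (i.e. g = 2w) drives w toward the minimum at 0.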

AdaGrad · Adam · Momentum
13 min read