
Hulu Beijing
Jan 4, 2018 · Artificial Intelligence

Why SGD Fails and How Momentum, AdaGrad, and Adam Fix It

This article explains why vanilla Stochastic Gradient Descent often struggles in deep learning, describes the challenges posed by narrow valleys (ravines) and saddle points, and introduces three major SGD variants (Momentum, AdaGrad, and Adam), detailing their motivations, update rules, and advantages.
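As a preview of the update rules the article details, here is a minimal NumPy sketch of one step of each optimizer. The function names and hyperparameter defaults are illustrative, not taken from the article itself.

```python
import numpy as np

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    # Momentum: accumulate a decaying moving average of past gradients,
    # which damps oscillation across a valley and accelerates along it.
    v = beta * v - lr * g
    return w + v, v

def adagrad_step(w, g, s, lr=0.1, eps=1e-8):
    # AdaGrad: accumulate squared gradients per coordinate and scale the
    # step by their inverse square root, so frequently updated coordinates
    # take smaller steps.
    s = s + g * g
    return w - lr * g / (np.sqrt(s) + eps), s

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: combine Momentum-style first-moment and AdaGrad-style
    # second-moment estimates, with bias correction for the zero init.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # t is the 1-based step count
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

For example, iterating any of these on the gradient of f(w) = w² (i.e. g = 2w) drives w toward the minimum at 0.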

AdaGrad · Adam · Momentum
13 min read