Hulu Beijing
Jan 4, 2018 · Artificial Intelligence
Why SGD Fails and How Momentum, AdaGrad, and Adam Fix It
This article explains why vanilla Stochastic Gradient Descent often struggles in deep learning, describes the challenges of valleys and saddle points, and introduces three major SGD variants—Momentum, AdaGrad, and Adam—detailing their motivations, update rules, and advantages.
AdaGrad · Adam · Momentum
13 min read