Understanding Gradient Descent: Basics, Advantages, and Limitations

This article explains the fundamental principle of gradient descent as the steepest‑descent optimization method, derives its direction using Taylor expansion and the Cauchy‑Schwarz inequality, illustrates why it can be slow on functions like Rosenbrock, and discusses its advantages and convergence properties.

360 Tech Engineering

Optimization sits at the core of modern deep learning, and gradient descent is one of the oldest and simplest algorithms for unconstrained optimization.

Gradient descent moves in the direction of the negative gradient, which yields the steepest decrease of the objective function.
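As a minimal sketch of this update rule, the following Python snippet applies x ← x - α∇f(x) with a fixed step size. The objective, the starting point, and the names gradient_descent and grad_f are assumptions chosen for illustration, not taken from the original article.

```python
def gradient_descent(grad, x0, step=0.1, iters=100):
    """Fixed-step gradient descent: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Illustrative objective (an assumption): f(x, y) = (x - 1)^2 + 2 * (y + 3)^2
def grad_f(p):
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 3.0)]

x_min = gradient_descent(grad_f, [0.0, 0.0])
print(x_min)  # close to the minimizer [1.0, -3.0]
```

A fixed step works here only because the step is small relative to the curvature of this particular objective; too large a step would make the iterates diverge.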

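Derivation: expand f to first order around the current point x along a direction d with step size α > 0, which gives f(x + αd) ≈ f(x) + α∇f(x)ᵀd. The objective decreases for sufficiently small α whenever ∇f(x)ᵀd < 0, and the choice d = -∇f(x) satisfies this condition because ∇f(x)ᵀd = -‖∇f(x)‖² < 0 at any non‑stationary point.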
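Written out step by step, the first‑order argument could look as follows. The remainder notation o(α) is an assumption about how the original derivation treats higher‑order terms.

```latex
\begin{aligned}
f(x + \alpha d) &= f(x) + \alpha\, \nabla f(x)^{\top} d + o(\alpha), \qquad \alpha > 0, \\
f(x + \alpha d) < f(x) \ \text{for small } \alpha
  &\quad\Longleftarrow\quad \nabla f(x)^{\top} d < 0, \\
d = -\nabla f(x)
  &\quad\Longrightarrow\quad \nabla f(x)^{\top} d = -\lVert \nabla f(x) \rVert^{2} < 0
  \quad \text{if } \nabla f(x) \neq 0 .
\end{aligned}
```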

By the Cauchy‑Schwarz inequality, |∇f(x)ᵀd| ≤ ‖∇f(x)‖‖d‖, with equality exactly when d and ∇f(x) are collinear. Among directions of a fixed length, the first‑order decrease is therefore largest when d points along -∇f(x).
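The inequality can be checked numerically. The sketch below compares the directional derivative along the normalized negative gradient with that of many random unit directions. The gradient vector g is a made‑up example, and the helper names dot and norm are illustrative.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Hypothetical gradient at some point x (values chosen for illustration); its norm is 13.
g = [3.0, -4.0, 12.0]
steepest = [-gi / norm(g) for gi in g]  # unit vector along -g

random.seed(0)
best_random = float("inf")
for _ in range(10_000):
    d = [random.gauss(0.0, 1.0) for _ in g]
    n = norm(d)
    d = [di / n for di in d]            # random unit direction
    best_random = min(best_random, dot(g, d))

print(dot(g, steepest))  # approximately -13, i.e. -||g||
print(best_random)       # never below -||g||, up to rounding
```

No sampled direction beats the negative gradient, which is what the equality case of Cauchy‑Schwarz predicts.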

Although called “steepest”, the direction is steepest only locally. Globally the method can converge very slowly, especially on ill‑conditioned functions such as the Rosenbrock (banana) function, whose minimum lies at the bottom of a long, narrow, curved valley.
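To observe this behaviour, one can run gradient descent on the standard two‑dimensional Rosenbrock function f(x, y) = (1 - x)² + 100(y - x²)², whose minimum is at (1, 1). The sketch below uses a backtracking (Armijo) line search rather than an exact one, which is a simplification; the starting point (-1.2, 1), the iteration count, and the line‑search constants are assumptions.

```python
def rosenbrock(p):
    x, y = p
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2

def rosenbrock_grad(p):
    x, y = p
    return [-2.0 * (1.0 - x) - 400.0 * x * (y - x * x), 200.0 * (y - x * x)]

def backtracking_step(f, x, g, alpha0=1.0, shrink=0.5, c=1e-4):
    """Halve alpha until the Armijo sufficient-decrease condition holds."""
    fx = f(x)
    gg = sum(gi * gi for gi in g)
    alpha = alpha0
    while alpha > 1e-16:
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        if f(x_new) <= fx - c * alpha * gg:
            return x_new
        alpha *= shrink
    return x  # no acceptable step found; stay at the current point

def run(x0, iters):
    x = list(x0)
    values = [rosenbrock(x)]
    for _ in range(iters):
        x = backtracking_step(rosenbrock, x, rosenbrock_grad(x))
        values.append(rosenbrock(x))
    return x, values

x_final, values = run([-1.2, 1.0], iters=2000)
print(values[0], values[10], values[100], values[-1])
print(x_final)  # compare with the minimizer (1, 1)
```

Printing the objective at a few iteration counts makes the pattern visible: a quick initial drop into the valley, followed by long stretches of small improvements along it.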

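The zig‑zag trajectory in the original article's figures illustrates this behaviour: with exact line search, each search direction is orthogonal to the previous one, so the iterates bounce across the valley and the steps become shorter as they approach the minimum.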
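The orthogonality of successive directions can be verified on a small example. The sketch below assumes a diagonal quadratic f(x) = ½(a₁x₁² + a₂x₂²), for which the exact line‑search step has the closed form α = gᵀg / gᵀAg; the matrix entries and the starting point are illustrative choices.

```python
# Assumed test problem: f(x) = 0.5 * (a1 * x1^2 + a2 * x2^2), an ill-conditioned quadratic.
A_DIAG = [1.0, 25.0]

def grad(x):
    return [a * xi for a, xi in zip(A_DIAG, x)]

def exact_step(g):
    # For a quadratic, f(x - alpha * g) is minimized at alpha = (g.g) / (g.A g).
    num = sum(gi * gi for gi in g)
    den = sum(a * gi * gi for a, gi in zip(A_DIAG, g))
    return num / den

x = [25.0, 1.0]
dots = []
for _ in range(10):
    g = grad(x)
    alpha = exact_step(g)
    x = [xi - alpha * gi for xi, gi in zip(x, g)]
    g_next = grad(x)
    dots.append(sum(a * b for a, b in zip(g, g_next)))  # inner product of successive gradients

print(max(abs(d) for d in dots))  # zero up to rounding error
```

Because the search direction is the negative gradient, a vanishing inner product between successive gradients means every step turns by a right angle, which produces the zig‑zag path.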

Advantages: the method is simple to implement, has a low computational cost per iteration, places no special requirements on the initial point, and supplies the search direction used as a building block in many more advanced algorithms.

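Convergence: under standard smoothness assumptions, gradient descent with exact line search is globally convergent, meaning it converges to a stationary point from any starting point rather than necessarily to a global minimum. The convergence rate, however, is only linear, and the rate constant gets worse as the problem becomes more ill‑conditioned.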
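For a strongly convex quadratic with condition number κ, the classical bound for exact line search is f(x_{k+1}) - f* ≤ ((κ - 1)/(κ + 1))² (f(x_k) - f*). The sketch below measures the per‑iteration decrease ratio on an assumed diagonal quadratic with κ = 25 and f* = 0; the two starting points are illustrative, the first being one that attains the bound.

```python
A_DIAG = [1.0, 25.0]                  # eigenvalues; condition number kappa = 25
KAPPA = max(A_DIAG) / min(A_DIAG)
BOUND = ((KAPPA - 1.0) / (KAPPA + 1.0)) ** 2

def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))

def exact_line_search_step(x):
    g = [a * xi for a, xi in zip(A_DIAG, x)]
    alpha = sum(gi * gi for gi in g) / sum(a * gi * gi for a, gi in zip(A_DIAG, g))
    return [xi - alpha * gi for xi, gi in zip(x, g)]

def decrease_ratios(x0, iters=20):
    """Return f(x_{k+1}) / f(x_k) for each iteration (f* = 0 here)."""
    x, out = list(x0), []
    for _ in range(iters):
        x_next = exact_line_search_step(x)
        out.append(f(x_next) / f(x))
        x = x_next
    return out

worst = decrease_ratios([25.0, 1.0])  # start chosen to attain the bound
other = decrease_ratios([3.0, 1.0])   # an arbitrary second start
print(BOUND, max(worst), max(other))
```

The ratio stays at or below the bound in both runs, and the bound approaches 1 as κ grows, which is why ill‑conditioned problems converge so slowly.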

References: Wikipedia pages on Cauchy‑Schwarz inequality, gradient descent, and Rosenbrock function.
