How Does Gradient Descent Train a Neural Network? A Step‑by‑Step Guide

This article walks through the complete training cycle of a simple neural network: from random weight initialization and forward propagation on labeled data, through loss calculation and gradient-based weight updates, to iterative epochs, average loss over a batch, and practical issues such as exploding and vanishing gradients.


Training a Simple Neural Network

This section demonstrates how a tiny feed-forward network that classifies an input as either a leaf or a flower is trained from scratch on labeled RGB values plus a volume feature.

1. Random Weight Initialization

All parameters (weights and biases) are set to random numbers before any data is seen, so the initial connection weights carry no information about the task.
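As a minimal sketch, here is what random initialization could look like in NumPy; the 4-3-2 layer sizes are an assumption standing in for the article's network:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Assumed shape: 4 inputs (R, G, B, Vol), one hidden layer of 3 neurons,
# and 2 outputs (leaf, flower). Before training, every weight is noise.
W1 = rng.normal(0.0, 0.1, size=(4, 3))   # input -> hidden weights
b1 = np.zeros(3)                         # hidden biases
W2 = rng.normal(0.0, 0.1, size=(3, 2))   # hidden -> output weights
b2 = np.zeros(2)                         # output biases
```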

2. Forward Pass with a Labeled Example

Given a training sample R=181, G=216, B=210, Vol=12.0 (label: leaf), the network produces an output vector, e.g. (0.6, 0.4) where the first component corresponds to the leaf probability and the second to the flower probability. Because the target is (1, 0), the prediction is still inaccurate.
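Continuing the sketch above (the ReLU hidden layer, softmax output, and input scaling are assumptions, since the article does not name its activations), a forward pass simply pushes the inputs through each layer:

```python
def forward(x, W1, b1, W2, b2):
    """Push one input vector through the assumed 4-3-2 network."""
    h = np.maximum(0.0, x @ W1 + b1)       # hidden layer with ReLU
    logits = h @ W2 + b2
    exps = np.exp(logits - logits.max())   # numerically stable softmax
    return exps / exps.sum()               # two probabilities, e.g. (0.6, 0.4)

x = np.array([181, 216, 210, 12.0]) / 255.0  # crude input scaling (assumed)
target = np.array([1.0, 0.0])                # label: leaf
prediction = forward(x, W1, b1, W2, b2)
```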

3. Loss Computation

The loss quantifies the distance between prediction and target. The article uses a simple L1 loss (sum of absolute differences):

Loss = |1 - 0.6| + |0 - 0.4| = 0.8

A smaller loss indicates that the network output is closer to the desired values.
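The same computation in code, using the article's numbers:

```python
import numpy as np

def l1_loss(prediction, target):
    """Sum of absolute differences between output and target."""
    return np.abs(target - prediction).sum()

# |1 - 0.6| + |0 - 0.4| = 0.4 + 0.4 = 0.8
print(l1_loss(np.array([0.6, 0.4]), np.array([1.0, 0.0])))  # ≈ 0.8
```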

4. Weight Adjustment (Gradient Concept)

To reduce the loss we must change the weights. The direction and magnitude of each change are given by the gradient of the loss with respect to that weight. For a single weight, for instance, increasing it from 0.17 to 0.18 and recomputing the loss reveals whether the change is beneficial.

Increase the weight, run a forward pass, and observe the new loss. If the loss decreases, the adjustment direction is correct.
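That nudge-and-recheck procedure is a finite-difference estimate of the gradient. A toy sketch (the quadratic toy_loss below is purely illustrative):

```python
def numerical_gradient(loss_fn, w, eps=0.01):
    """Estimate d(loss)/d(w) by nudging w slightly and recomputing the loss."""
    return (loss_fn(w + eps) - loss_fn(w)) / eps

# Stand-in for "freeze everything else and vary one weight":
toy_loss = lambda w: (w - 1.0) ** 2

# About -1.65: a negative slope, so increasing this weight from 0.17
# decreases the loss.
print(numerical_gradient(toy_loss, 0.17))
```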

5. Gradient Descent Update Rule

The gradient tells how the loss changes when a weight is perturbed. Gradient‑descent updates follow:

new_weight = old_weight - learning_rate * gradient

Example: with old_weight = 0.17, gradient = 200, and learning_rate = 0.01, the update gives new_weight = 0.17 - 0.01 * 200 = -1.83. A step this large swings the weight far past zero, an early hint of the stability issues discussed in section 8.
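As a sketch, the update rule and the article's numbers in code; the second call, an added illustration rather than part of the article, shows how a smaller learning rate tames the step:

```python
def sgd_update(old_weight, gradient, learning_rate):
    """Plain gradient-descent update rule."""
    return old_weight - learning_rate * gradient

print(sgd_update(0.17, 200, 0.01))    # ≈ -1.83, the article's example
print(sgd_update(0.17, 200, 0.0001))  # ≈ 0.15, a far gentler step
```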

6. Iterative Optimization (Epoch)

Training repeats the following steps for many epochs (full passes over the training set); a runnable sketch of the loop follows the list.

Forward propagation: feed inputs through the network to obtain outputs.

Loss calculation: compute the loss between outputs and targets.

Backward propagation: compute gradients of the loss w.r.t. each parameter.

Parameter update: apply the gradient-descent rule to adjust weights.
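A minimal end-to-end sketch of this loop, reusing forward and l1_loss from the earlier snippets. Note that it approximates backward propagation with the finite-difference probe from step 4 purely to stay short; real frameworks compute these gradients analytically via backpropagation:

```python
def numerical_grads(loss_fn, params, eps=1e-4):
    """Finite-difference gradient of loss_fn w.r.t. every parameter array."""
    base = loss_fn()
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for i in np.ndindex(p.shape):
            old = p[i]
            p[i] = old + eps                 # nudge one weight...
            g[i] = (loss_fn() - base) / eps  # ...measure the loss change...
            p[i] = old                       # ...and restore it
        grads.append(g)
    return grads

params = [W1, b1, W2, b2]
data = [(x, target)]                # a one-sample "training set"
learning_rate = 0.05

for epoch in range(50):
    for inputs, label in data:
        loss_fn = lambda: l1_loss(forward(inputs, *params), label)  # forward + loss
        grads = numerical_grads(loss_fn, params)                    # "backward"
        for p, g in zip(params, grads):
            p -= learning_rate * g                                  # update
```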

7. Average Loss over a Batch

When training on many samples, the loss is averaged across the batch. For three samples with losses 0.2, 0.5, 0.1, the average loss is (0.2+0.5+0.1)/3 = 0.2667. Optimizing the average loss ensures the model improves globally rather than over‑fitting a single example.
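The arithmetic in NumPy, plus a small batched version of the L1 loss (an assumed helper, not from the article):

```python
import numpy as np

losses = np.array([0.2, 0.5, 0.1])
print(losses.mean())  # ≈ 0.2667

# Averaging the per-sample L1 losses over a whole batch at once:
def batch_l1_loss(predictions, targets):
    return np.abs(targets - predictions).sum(axis=1).mean()
```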

8. Gradient Explosion and Vanishing

In deep networks, gradients can become extremely small (vanishing) or extremely large (exploding), causing unstable training. Typical mitigations include using better loss functions, appropriate activation functions, and normalization techniques.
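A toy illustration of why depth causes this: backpropagation multiplies roughly one derivative factor per layer, so consistently small factors shrink the gradient toward zero while large ones blow it up. The 0.25 and 4.0 factors below are made-up stand-ins, and the clipping helper shows one common safeguard (gradient clipping, a standard technique the article does not name explicitly):

```python
import numpy as np

depth = 30
print(0.25 ** depth)  # ≈ 8.7e-19: the gradient all but vanishes
print(4.0 ** depth)   # ≈ 1.2e+18: the gradient explodes

def clip_gradient(grad, max_norm=1.0):
    """Rescale a gradient so its norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```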

9. Summary of One Training Iteration

In summary, a single epoch consists of the following key steps:

Forward propagation: compute network outputs for the current batch.

Loss calculation: evaluate the discrepancy with targets.

Backward propagation: obtain gradients for all parameters.

Parameter update: adjust weights using the gradient-descent formula.

Repeating these steps reduces the loss progressively, leading the network to learn appropriate weights for distinguishing leaves from flowers.

Tags: machine learning, AI, neural networks, model training, gradient descent
Written by

AI Large Model Application Practice

Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
