
Understanding Backpropagation: From Simple to Advanced Neural Network Implementations in Python

This article explains the back‑propagation algorithm in neural networks, starting with a simple single‑neuron example using ReLU, Sigmoid and MSE, then extending to multi‑layer matrix‑based networks, providing detailed Python code, gradient calculations, and comparisons with TensorFlow implementations.

Rare Earth Juejin Tech Community

The article begins with a brief introduction to the concept of back‑propagation, outlining the four main steps: forward pass to compute predictions, error calculation, backward error propagation, and parameter updates.
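These four steps can be seen in miniature on a one-parameter linear model (a schematic example, not from the article, using the full squared-error derivative):

```python
def train_step(w, x, true, lr):
    pred = w * x                    # 1. forward pass: compute the prediction
    loss = (pred - true) ** 2       # 2. error calculation (squared error)
    grad_w = 2 * (pred - true) * x  # 3. backward propagation via the chain rule
    w = w - lr * grad_w             # 4. parameter update (gradient descent)
    return w, loss

w = 0.0
for _ in range(50):
    w, loss = train_step(w, x=2.0, true=1.0, lr=0.1)
print(w, loss)  # w approaches 0.5, so w * x approaches the target 1.0
```

Every network in the rest of the article repeats this same loop; only the forward pass and the gradient computation grow more elaborate.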

Definition of Activation and Loss Functions

ReLU, Sigmoid, and Mean Squared Error (MSE) are defined along with their derivatives to facilitate gradient computation. The implementations are provided as Python classes with __call__ and diff methods.

import random
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt

# ReLU activation
class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)
    def diff(self, x):
        # derivative: 1 where the input is positive, 0 elsewhere
        x_temp = x.copy()
        x_temp[x_temp > 0] = 1
        x_temp[x_temp < 0] = 0
        return x_temp

# Sigmoid activation
class Sigmoid:
    def __call__(self, x):
        return 1/(1+np.exp(-x))
    def diff(self, x):
        # expects the sigmoid *output*: sigma'(z) = sigma(z) * (1 - sigma(z))
        return x*(1-x)

# MSE loss
class MSE:
    def __call__(self, true, pred):
        return np.mean(np.power(pred-true, 2), keepdims=True)
    def diff(self, true, pred):
        # derivative of the squared error; the constant factor 2 is absorbed into the learning rate
        return pred-true

relu = ReLU()
sigmoid = Sigmoid()
mse = MSE()

Simple Backpropagation Example

A single‑neuron network with a sigmoid activation is trained on a randomly generated scalar input x, weight w, bias b, and target true. The forward computation follows x → w·x + b → sigmoid(w·x + b) → MSE(true, sigmoid(w·x + b)). The backward pass uses the chain rule to compute the gradients of the loss with respect to w and b, then updates them with w -= lr * x * sigmoid.diff(pred) * mse.diff(true, pred) and b -= lr * sigmoid.diff(pred) * mse.diff(true, pred). Training for 520 epochs shows a decreasing loss curve and predictions converging toward the target.
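Written out, the chain rule behind these two updates is (with z = w·x + b; the sigmoid derivative is expressed through the sigmoid output, which is why the code calls sigmoid.diff(pred), and the constant factor from the squared error is absorbed into the learning rate):

```latex
\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial \mathrm{pred}}
    \cdot \frac{\partial \mathrm{pred}}{\partial z}
    \cdot \frac{\partial z}{\partial w}
  = (\mathrm{pred} - \mathrm{true}) \cdot \mathrm{pred}\,(1 - \mathrm{pred}) \cdot x,
\qquad
\frac{\partial L}{\partial b}
  = (\mathrm{pred} - \mathrm{true}) \cdot \mathrm{pred}\,(1 - \mathrm{pred})
```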

# random scalar input, parameters and target
x = random.random()
w = random.random()
b = random.random()
true = random.random()
print(f'x={x}   true={true}')

lr = 0.3
epochs = 520
loss_history = []
for epoch in range(epochs):
    pred = sigmoid(w * x + b)    # forward pass
    loss = mse(true, pred)
    # chain rule: dL/dw = dL/dpred * dpred/dz * dz/dw
    w -= lr * x * sigmoid.diff(pred) * mse.diff(true, pred)
    b -= lr * sigmoid.diff(pred) * mse.diff(true, pred)
    if epoch % 100 == 0:
        print(f'epoch {epoch}, loss={loss}, pred={pred}')
    loss_history.append(float(loss))
print(f'epoch {epoch+1}, loss={loss}, pred={pred}')
plt.plot(loss_history)
plt.show()
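One way to gain confidence in a hand-derived gradient is a finite-difference check (a standalone sketch, not from the article; note the full derivative of the squared error carries a factor of 2 that the article's `mse.diff` folds into the learning rate):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_at(w):
    pred = sigmoid(w * x + b)
    return (pred - true) ** 2

# fixed example values for input, weight, bias and target
x, w, b, true = 0.4, 0.7, 0.2, 0.9
pred = sigmoid(w * x + b)
analytic = 2 * (pred - true) * pred * (1 - pred) * x  # chain rule

eps = 1e-6
numeric = (loss_at(w + eps) - loss_at(w - eps)) / (2 * eps)  # central difference
print(analytic, numeric)  # the two values agree to many decimal places
```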

Advanced Backpropagation with Matrices

The article extends the concept to a three‑neuron hidden layer, using matrix multiplication (@) instead of scalar products. The forward pass becomes x → x@w1 + b1 → sigmoid → y@w2 + b2 → sigmoid → MSE. Gradient updates for both layers are derived with the chain rule and implemented with NumPy operations such as w2 -= lr * y.T @ (sigmoid.diff(pred) * mse.diff(true, pred)); note that the error propagated back to the hidden layer must use w2 as it was before the output-layer update.

x = np.random.rand(1, 1)
# weights and biases for the hidden and output layers
w1 = np.random.rand(1, 3)
b1 = np.random.rand(1, 3)
w2 = np.random.rand(3, 1)
b2 = np.random.rand(1, 1)
true = np.array([[0.1]])
lr = 0.1
epochs = 520
loss_history = []
for epoch in range(epochs):
    y = sigmoid(x @ w1 + b1)       # hidden layer
    pred = sigmoid(y @ w2 + b2)    # output layer
    loss = mse(true, pred)
    # error terms, computed before any parameter changes so that the
    # hidden-layer gradient uses the pre-update w2
    output_error = sigmoid.diff(pred) * mse.diff(true, pred)
    hidden_error = sigmoid.diff(y) * (output_error @ w2.T)
    # update output layer
    w2 -= lr * y.T @ output_error
    b2 -= lr * output_error
    # update hidden layer
    w1 -= lr * x.T @ hidden_error
    b1 -= lr * hidden_error
    if epoch % 100 == 0:
        print(f'epoch {epoch}, loss={loss}, pred={pred}')
    loss_history.append(loss.item())
print(f'epoch {epoch+1}, loss={mse(true, pred)}, pred={pred}')
plt.plot(loss_history)
plt.show()
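Keeping track of the array shapes makes the two update rules easier to follow (a standalone sketch using the same dimensions as the code above; the scalar activation factors are left out because they do not change any shapes):

```python
import numpy as np

x = np.random.rand(1, 1)                               # input
w1, b1 = np.random.rand(1, 3), np.random.rand(1, 3)    # hidden layer
w2, b2 = np.random.rand(3, 1), np.random.rand(1, 1)    # output layer

y = x @ w1 + b1                # (1, 1) @ (1, 3) -> (1, 3)
pred = y @ w2 + b2             # (1, 3) @ (3, 1) -> (1, 1)

grad_out = np.ones((1, 1))     # error at the output: same shape as pred
grad_w2 = y.T @ grad_out       # (3, 1), matching w2
grad_hidden = grad_out @ w2.T  # (1, 3), the error pushed back to the hidden layer
grad_w1 = x.T @ grad_hidden    # (1, 3), matching w1
print(grad_w2.shape, grad_hidden.shape, grad_w1.shape)
```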

Hand‑crafted Neural Network Framework

A minimal neural‑network library is built from scratch. A Linear layer stores weights, bias, and activation, records intermediate values for back‑propagation, and provides an update method that applies gradient descent. The NetWork class assembles three layers (two ReLU, one Sigmoid) and implements fit and backward methods to train on a small dataset.

class Linear:
    def __init__(self, inputs, outputs, activation):
        # small random weights and biases, normalised so each sums to 1
        self.weight = np.random.rand(inputs, outputs) / 10
        self.weight = self.weight / self.weight.sum()
        self.bias = np.random.rand(outputs) / 10
        self.bias = self.bias / self.bias.sum()
        self.activation = activation
        self.x_temp = None   # cached layer input, needed for the weight gradient
        self.t_temp = None   # cached activation output, needed for the activation derivative
    def __call__(self, x, parent):
        self.x_temp = x
        self.t_temp = self.activation(x @ self.weight + self.bias)
        if self not in parent.layers:
            parent.layers.append(self)   # register the layer with the network once
        return self.t_temp
    def update(self, grad):
        # gradient through the activation, then one gradient-descent step;
        # returns the gradient to propagate to the previous layer
        activation_diff_grad = self.activation.diff(self.t_temp) * grad
        new_grad = activation_diff_grad @ self.weight.T
        self.weight -= lr * self.x_temp.T @ activation_diff_grad
        self.bias -= lr * activation_diff_grad.mean(axis=0)
        return new_grad

class NetWork:
    def __init__(self):
        self.layers = []
        self.linear_1 = Linear(4, 16, activation=relu)
        self.linear_2 = Linear(16, 8, activation=relu)
        self.linear_3 = Linear(8, 3, activation=sigmoid)
    def __call__(self, x):
        x = self.linear_1(x, self)
        x = self.linear_2(x, self)
        x = self.linear_3(x, self)
        return x
    def fit(self, x, y, epochs, step=100):
        for epoch in range(epochs):
            pred = self(x)
            self.backward(y, pred)
            if epoch % step == 0:
                print(f'epoch {epoch}, loss={mse(y, pred)}, pred={pred}')
        print(f'epoch {epoch+1}, loss={mse(y, pred)}, pred={pred}')
    def backward(self, true, pred):
        grad = mse.diff(true, pred)
        for layer in reversed(self.layers):
            grad = layer.update(grad)
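The framework can then be trained on a small random dataset. Below is a self-contained usage sketch: the classes are condensed copies of the ones above (with a slightly simplified initialisation and the ReLU derivative written as a boolean mask), plus the global learning rate `lr` that `Linear.update` relies on:

```python
import numpy as np

np.random.seed(0)
lr = 0.1  # Linear.update reads a global learning rate, as in the article

# Condensed copies of the article's classes so this sketch runs on its own
class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)
    def diff(self, x):
        return (x > 0).astype(float)

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))
    def diff(self, x):
        return x * (1 - x)  # expects the sigmoid output

class MSE:
    def __call__(self, true, pred):
        return np.mean((pred - true) ** 2)
    def diff(self, true, pred):
        return pred - true

relu, sigmoid, mse = ReLU(), Sigmoid(), MSE()

class Linear:
    def __init__(self, inputs, outputs, activation):
        self.weight = np.random.rand(inputs, outputs) / 10
        self.bias = np.random.rand(outputs) / 10
        self.activation = activation
    def __call__(self, x, parent):
        self.x_temp = x
        self.t_temp = self.activation(x @ self.weight + self.bias)
        if self not in parent.layers:
            parent.layers.append(self)
        return self.t_temp
    def update(self, grad):
        g = self.activation.diff(self.t_temp) * grad
        new_grad = g @ self.weight.T       # error for the previous layer
        self.weight -= lr * self.x_temp.T @ g
        self.bias -= lr * g.mean(axis=0)
        return new_grad

class NetWork:
    def __init__(self):
        self.layers = []
        self.linear_1 = Linear(4, 16, activation=relu)
        self.linear_2 = Linear(16, 8, activation=relu)
        self.linear_3 = Linear(8, 3, activation=sigmoid)
    def __call__(self, x):
        x = self.linear_1(x, self)
        x = self.linear_2(x, self)
        x = self.linear_3(x, self)
        return x
    def backward(self, true, pred):
        grad = mse.diff(true, pred)
        for layer in reversed(self.layers):
            grad = layer.update(grad)

# Usage: fit four samples of 4 features to 3 sigmoid-range targets
x = np.random.rand(4, 4)
y = np.random.rand(4, 3) * 0.8 + 0.1
net = NetWork()
loss_before = mse(y, net(x))
for _ in range(2000):
    net.backward(y, net(x))
loss_after = mse(y, net(x))
print(loss_before, loss_after)  # the loss drops over training
```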

TensorFlow Verification

The custom implementation is validated against TensorFlow. Random inputs and parameters are used to compute a forward pass with tf.nn.relu and tf.nn.sigmoid , followed by tf.keras.losses.mse . Gradients of the loss with respect to the first‑layer weights are obtained via tf.GradientTape and shown to match the gradients produced by the hand‑crafted network.

# Random inputs and parameters with shapes matching the hand-crafted
# network (4 -> 16 -> 8 -> 3)
x = tf.constant(np.random.rand(1, 4))
true = tf.constant(np.random.rand(1, 3))
w1, b1 = tf.constant(np.random.rand(4, 16)), tf.constant(np.random.rand(1, 16))
w2, b2 = tf.constant(np.random.rand(16, 8)), tf.constant(np.random.rand(1, 8))
w3, b3 = tf.constant(np.random.rand(8, 3)), tf.constant(np.random.rand(1, 3))

with tf.GradientTape() as tape_1:
    tape_1.watch(w1)
    y = tf.nn.relu(x @ w1 + b1)
    y = tf.nn.relu(y @ w2 + b2)      # same activations as the custom network
    y = tf.nn.sigmoid(y @ w3 + b3)
    loss = tf.keras.losses.mse(true, y)

dLoss_dW1 = tape_1.gradient(loss, w1)
print('loss on w1 gradient:', dLoss_dW1.numpy())

Both TensorFlow and the custom network produce matching gradient values, confirming the correctness of the manual back‑propagation implementation.

machine learning · Python · neural networks · gradient descent · backpropagation
Written by Rare Earth Juejin Tech Community
Juejin, a tech community that helps developers grow.