Artificial Intelligence · 37 min read

Comprehensive Guide to Neural Network Algorithms: Definitions, Structure, Implementation, and Training

This article provides an in‑depth tutorial on neural network algorithms, covering their biological inspiration, significance, advantages and drawbacks, detailed architecture, data preparation, one‑hot encoding, weight initialization, forward and backward propagation, cost functions, regularization, gradient checking, and complete Python code examples.

Rare Earth Juejin Tech Community

This tutorial explains neural network algorithms, beginning with a definition: they mimic the connections between neurons in the human brain, stacking multiple layers with nonlinear activation functions to model and predict complex data.

The significance of neural networks is highlighted across AI applications such as image, speech, and natural language processing, as well as in industry, healthcare, and finance, where they improve efficiency and drive scientific research.

Advantages include strong fitting ability, automatic feature learning, adaptability to various data types, and parallel processing; disadvantages cover long training times, over‑fitting risk, black‑box nature, and the need for large labeled datasets.

The article then delves into the algorithm’s core concepts, describing each neuron as a combination of linear and logistic regression, and explains the overall workflow from data preparation to model evaluation.
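The description of a single neuron as "linear regression followed by logistic regression" can be sketched in a few lines. The names here are illustrative, not taken from the article's code:

```python
import numpy as np

def neuron(x, w, b):
    """A single neuron: linear combination z = w.x + b, then sigmoid activation."""
    z = np.dot(w, x) + b                 # linear part (as in linear regression)
    return 1.0 / (1.0 + np.exp(-z))      # logistic part (as in logistic regression)

# Example: 3 inputs with fixed weights; z = 0.5 - 0.5 + 0.3 = 0.3
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
b = 0.0
print(neuron(x, w, b))                   # sigmoid(0.3), roughly 0.574
```

A full network is nothing more than layers of such units, with each layer's outputs feeding the next layer's inputs.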

Data preparation code:

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from scipy.io import loadmat
import scipy.optimize as opt
from sklearn.metrics import classification_report

# Load data function
def load_data(path, transpose=True):
    data = loadmat(path)
    X = data['X']
    y = data['y']
    y = y.reshape(y.shape[0])  # flatten the (m, 1) label array to shape (m,)
    if transpose:
        # MATLAB stores each 20x20 image column-major; transpose to row-major
        X = np.array([im.reshape((20, 20)).T.reshape(400) for im in X])
    return X, y

X, y = load_data('ex4data1.mat', transpose=False)
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
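The `np.insert` call above prepends a column of ones (the bias, or intercept, term) so that the bias weight can sit inside the same matrix product as the feature weights. A tiny sketch of what it does:

```python
import numpy as np

# A toy design matrix with 2 samples and 3 features
X_demo = np.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]

# Insert a column of ones at index 0 along axis 1 (the bias column)
X_demo = np.insert(X_demo, 0, np.ones(X_demo.shape[0]), axis=1)
print(X_demo)                             # [[1, 0, 1, 2], [1, 3, 4, 5]]
```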

One‑hot label encoding code:

def expend_y(y):
    res = []
    for i in y:
        tmp = np.zeros(10)
        tmp[i-1] = 1
        res.append(tmp)
    return np.array(res)

y = expend_y(y)
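The labels run from 1 to 10, and label `i` sets position `i-1` of a 10-dimensional unit vector. This is equivalent to indexing rows of an identity matrix, which gives a quick standalone check (repeating the article's `expend_y` so the sketch runs on its own):

```python
import numpy as np

def expend_y(y):
    """One-hot encode labels 1..10 into rows of an (m, 10) matrix."""
    res = []
    for i in y:
        tmp = np.zeros(10)
        tmp[i - 1] = 1
        res.append(tmp)
    return np.array(res)

labels = np.array([1, 5, 10])
encoded = expend_y(labels)

# Same mapping via identity-matrix indexing
assert np.array_equal(encoded, np.eye(10)[labels - 1])
```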

Utility functions are provided for loading pretrained weights and for flattening/unflattening the parameter matrices (scipy's optimizers expect a single parameter vector):

def load_weight(path):
    data = loadmat(path)
    return data['Theta1'], data['Theta2']

def serialize(a, b):
    return np.concatenate((np.ravel(a), np.ravel(b)))

def deserialize(seq):
    return seq[:25*401].reshape(25,401), seq[25*401:].reshape(10,26)
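`serialize` flattens both weight matrices into one vector and `deserialize` must invert it exactly; a round-trip check with the article's fixed shapes, (25, 401) for input-to-hidden and (10, 26) for hidden-to-output, confirms this:

```python
import numpy as np

def serialize(a, b):
    return np.concatenate((np.ravel(a), np.ravel(b)))

def deserialize(seq):
    return seq[:25 * 401].reshape(25, 401), seq[25 * 401:].reshape(10, 26)

t1 = np.random.randn(25, 401)
t2 = np.random.randn(10, 26)
r1, r2 = deserialize(serialize(t1, t2))
assert np.array_equal(r1, t1) and np.array_equal(r2, t2)   # lossless round trip
```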

The sigmoid activation function is defined as:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Forward propagation:

def feed_forward(theta, X):
    t1, t2 = deserialize(theta)
    a1 = X
    z2 = a1 @ t1.T
    a2 = np.insert(sigmoid(z2), 0, np.ones(z2.shape[0]), axis=1)
    z3 = a2 @ t2.T
    h = sigmoid(z3)
    return a1, z2, a2, z3, h
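For this architecture (400 input pixels plus bias, 25 hidden units, 10 outputs), the shapes flowing through the network can be verified with random weights. For brevity this sketch takes the two weight matrices directly rather than a serialized vector:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feed_forward(t1, t2, X):
    a1 = X                                                         # (m, 401), bias included
    z2 = a1 @ t1.T                                                 # (m, 25)
    a2 = np.insert(sigmoid(z2), 0, np.ones(z2.shape[0]), axis=1)   # (m, 26), bias prepended
    z3 = a2 @ t2.T                                                 # (m, 10)
    return sigmoid(z3)                                             # (m, 10)

m = 4
X = np.insert(np.random.randn(m, 400), 0, np.ones(m), axis=1)
t1 = np.random.randn(25, 401)
t2 = np.random.randn(10, 26)
h = feed_forward(t1, t2, X)
assert h.shape == (m, 10)          # one probability-like score per class
```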

Cost functions (with and without regularization) compute the average cross‑entropy loss and add L2 penalty terms for the weight matrices.

def cost(theta, X, y):
    h = feed_forward(theta, X)[-1]
    tmp = -y * np.log(h) - (1 - y) * np.log(1 - h)
    return tmp.sum() / y.shape[0]

def regularized_cost(theta, X, y, l=1):
    t1, t2 = deserialize(theta)
    m = X.shape[0]
    # L2 penalty, scaled by the regularization strength l; bias columns excluded
    reg1 = l * np.power(t1[:, 1:], 2).sum() / (2 * m)
    reg2 = l * np.power(t2[:, 1:], 2).sum() / (2 * m)
    return cost(theta, X, y) + reg1 + reg2

The gradient of the sigmoid is implemented for back‑propagation:

def sigmoid_gradient(z):
    return sigmoid(z) * (1 - sigmoid(z))
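The derivative σ'(z) = σ(z)(1 − σ(z)) peaks at z = 0 with value 0.25 and decays toward 0 for large |z|, which is why deep sigmoid networks suffer from vanishing gradients. A quick numerical check:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    return sigmoid(z) * (1 - sigmoid(z))

assert abs(sigmoid_gradient(0.0) - 0.25) < 1e-12   # maximum at z = 0
assert sigmoid_gradient(10.0) < 1e-4               # nearly flat far from 0
```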

Back‑propagation (gradient computation):

def gradient(theta, X, y):
    t1, t2 = deserialize(theta)
    m = X.shape[0]
    delta1 = np.zeros(t1.shape)
    delta2 = np.zeros(t2.shape)
    a1, z2, a2, z3, h = feed_forward(theta, X)
    for i in range(m):
        a1i, z2i, a2i = a1[i], z2[i], a2[i]
        hi, yi = h[i], y[i]
        d3i = hi - yi                               # output-layer error
        z2i = np.insert(z2i, 0, 1)                  # prepend bias so shapes match t2
        d2i = t2.T @ d3i * sigmoid_gradient(z2i)    # hidden-layer error
        delta2 += np.outer(d3i, a2i)                # np.outer replaces the deprecated np.matrix
        delta1 += np.outer(d2i[1:], a1i)            # drop the bias component of d2i
    # Average over the m examples so the gradient matches the averaged cost
    return serialize(delta1 / m, delta2 / m)

The regularized gradient adds the L2 term (excluding the bias columns) to the averaged gradients:

def regularized_gradient(theta, X, y, l=1):
    m = X.shape[0]
    delta1, delta2 = deserialize(gradient(theta, X, y))
    t1, t2 = deserialize(theta)
    t1 = t1.copy()        # copy before zeroing, so theta itself is not modified
    t2 = t2.copy()
    t1[:, 0] = 0          # bias columns are not regularized
    t2[:, 0] = 0
    delta1 += (l / m) * t1
    delta2 += (l / m) * t2
    return serialize(delta1, delta2)
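The per-example loop in `gradient` is easy to follow but slow in Python; the same accumulation can be written as two matrix products. This is a common vectorization, not code from the article, shown here on random data with the same layer shapes:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5
X = np.insert(np.random.randn(m, 400), 0, np.ones(m), axis=1)   # a1: (m, 401)
t1 = np.random.randn(25, 401)
t2 = np.random.randn(10, 26)
y = np.eye(10)[np.random.randint(0, 10, m)]                     # one-hot targets

# Forward pass
z2 = X @ t1.T                                                   # (m, 25)
a2 = np.insert(sigmoid(z2), 0, np.ones(m), axis=1)              # (m, 26)
h = sigmoid(a2 @ t2.T)                                          # (m, 10)

# Vectorized backward pass: every per-example outer product becomes one matmul
d3 = h - y                                                      # (m, 10)
d2 = (d3 @ t2)[:, 1:] * sigmoid(z2) * (1 - sigmoid(z2))         # (m, 25), bias dropped
delta2 = d3.T @ a2 / m                                          # (10, 26)
delta1 = d2.T @ X / m                                           # (25, 401)
assert delta1.shape == (25, 401) and delta2.shape == (10, 26)
```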

Gradient checking validates the back‑propagation implementation by comparing analytical gradients with numerical approximations using a small epsilon perturbation.
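The central difference (f(θ+ε) − f(θ−ε)) / (2ε) approximates a derivative with O(ε²) error, which is why it is trusted as the reference. A scalar sketch of the idea:

```python
def f(x):
    return x ** 3

def numeric_grad(f, x, eps=1e-4):
    """Central-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Analytic derivative of x^3 is 3x^2; at x = 2 that is 12.
approx = numeric_grad(f, 2.0)
assert abs(approx - 12.0) < 1e-6
```

The network version below does exactly this, perturbing one component of the serialized `theta` vector at a time.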

def gradient_checking(theta, X, y, epsilon=1e-4, regularized=False):
    m = len(theta)

    def a_numeric_grad(plus, minus):
        if regularized:
            return (regularized_cost(plus, X, y) - regularized_cost(minus, X, y)) / (2 * epsilon)
        return (cost(plus, X, y) - cost(minus, X, y)) / (2 * epsilon)

    # Each row i of theta_matrix is a copy of theta; adding/subtracting the
    # scaled identity perturbs exactly one component per row.
    theta_matrix = np.tile(theta, (m, 1))
    epsilon_matrix = np.identity(m) * epsilon
    plus_matrix = theta_matrix + epsilon_matrix
    minus_matrix = theta_matrix - epsilon_matrix
    approx_grad = np.array([a_numeric_grad(plus_matrix[i], minus_matrix[i])
                            for i in range(m)])
    analytic_grad = regularized_gradient(theta, X, y) if regularized else gradient(theta, X, y)
    diff = np.linalg.norm(approx_grad - analytic_grad) / np.linalg.norm(approx_grad + analytic_grad)
    print('If your backpropagation implementation is correct,\n'
          'the relative difference will be smaller than 1e-9.\n'
          'Relative Difference: {}'.format(diff))
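The article stops short of the actual training call. With a cost function and its gradient in hand, `scipy.optimize.minimize` (imported above as `opt`) fits the serialized parameters. A self-contained miniature on a 2-feature logistic-regression problem, rather than the full network, keeps the sketch runnable; the data and names here are illustrative:

```python
import numpy as np
import scipy.optimize as opt

# Tiny synthetic binary problem: class is 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
X = np.insert(rng.standard_normal((100, 2)), 0, 1.0, axis=1)   # bias column
y = (X[:, 1] + X[:, 2] > 0).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    # Clip to avoid log(0) when the classes become separable
    h = np.clip(sigmoid(X @ theta), 1e-10, 1 - 1e-10)
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

def grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Same calling pattern the full network would use with its serialized theta
res = opt.minimize(fun=cost, x0=np.zeros(3), args=(X, y),
                   method='TNC', jac=grad)
assert res.fun < cost(np.zeros(3), X, y)    # loss decreased from the start
```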

The article concludes that the presented steps constitute a complete neural‑network pipeline, from data loading and preprocessing to model training and validation, and encourages readers to experiment further while being aware of over‑fitting and regularization techniques.

Tags: machine learning, Python, AI, neural networks, gradient descent, backpropagation, regularization
Written by Rare Earth Juejin Tech Community (Juejin, a tech community that helps developers grow).