Neural Network Construction Example with Python Implementation
This article is a hands‑on tutorial on building and training a multi‑layer neural network in Python. It covers data preprocessing, model architecture definition, parameter initialization, forward and backward propagation, cost computation, and parameter updates, with code examples for the activation functions and optimization steps.
Neural Network Construction Example
Notation: superscript [l] denotes layer l; superscript (i) denotes example i; subscript i denotes the i‑th element of a vector.
A single‑layer neuron first computes a linear function z = Wx + b and then applies an activation function g (e.g., sigmoid, tanh, ReLU), producing the output a = g(Wx + b).
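As a quick numeric illustration (all values here are hypothetical), a single neuron with three inputs can be computed directly in NumPy:

```python
import numpy as np

# Hypothetical weights, bias, and one 3-feature input column.
x = np.array([[0.5], [1.0], [-0.3]])
W = np.array([[0.2, -0.1, 0.4]])
b = np.array([[0.1]])

z = W @ x + b                # linear step: z = Wx + b
a = 1 / (1 + np.exp(-z))     # sigmoid activation: a = g(z)
# z = -0.02, so a = sigmoid(-0.02) ≈ 0.495
```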
Dataset: assume a large weather database containing temperature (x1), humidity (x2), and pressure (x3) as features, with a rainfall label (1 for rain, 0 otherwise). The data are split into a training set of m_train examples and a test set of m_test examples.
Preprocessing: each feature is centered and standardized by subtracting its mean and dividing by its standard deviation.
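A minimal sketch of this standardization, assuming features are stored as rows and examples as columns (matching the W·A convention used in the code below); the data here are randomly generated stand-ins:

```python
import numpy as np

# Hypothetical data: 3 feature rows (temperature, humidity, pressure),
# 100 example columns.
rng = np.random.default_rng(0)
X = rng.random((3, 100))

mu = X.mean(axis=1, keepdims=True)      # per-feature mean
sigma = X.std(axis=1, keepdims=True)    # per-feature standard deviation
X_norm = (X - mu) / sigma               # each feature: mean 0, std 1
```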
General Method (Partial Algorithm)
1. Define the model architecture (input features, layer sizes, activation functions).
2. Initialize parameters and hyper‑parameters (number of iterations, number of layers L, hidden‑layer sizes, learning rate α).
3. Iterate:
Forward propagation (compute Z and A for each layer).
Compute cost (cross‑entropy).
Backward propagation (compute gradients using the chain rule).
Update parameters (gradient descent).
4. Use the trained parameters to predict labels on new data.
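The code later in the article indexes the architecture as nn_architecture[l]["activation"], with index 0 standing for the input layer, so a layout like the following is assumed (the layer_size key name is hypothetical):

```python
# One dictionary per layer; index 0 describes the input layer
# (temperature, humidity, pressure).
nn_architecture = [
    {"layer_size": 3, "activation": "none"},     # input layer
    {"layer_size": 4, "activation": "relu"},     # hidden layer
    {"layer_size": 1, "activation": "sigmoid"},  # output layer (rain / no rain)
]
```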
Activation Functions
Activation functions introduce non‑linearity into the network. The example uses sigmoid and ReLU.
import numpy as np

def sigmoid(Z):
    S = 1 / (1 + np.exp(-Z))
    return S

def relu(Z):
    R = np.maximum(0, Z)
    return R

def sigmoid_backward(dA, Z):
    # derivative of the sigmoid: S * (1 - S)
    S = sigmoid(Z)
    dS = S * (1 - S)
    return dA * dS

def relu_backward(dA, Z):
    # the gradient passes through only where Z > 0
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

Forward Propagation Model
def L_model_forward(X, parameters, nn_architecture):
    forward_cache = {}
    A = X
    forward_cache['A0'] = X  # cache the input so backprop can reach layer 1
    number_of_layers = len(nn_architecture)
    for l in range(1, number_of_layers):
        A_prev = A
        W = parameters['W' + str(l)]
        b = parameters['b' + str(l)]
        activation = nn_architecture[l]["activation"]
        Z, A = linear_activation_forward(A_prev, W, b, activation)
        forward_cache['Z' + str(l)] = Z
        forward_cache['A' + str(l)] = A
    AL = A
    return AL, forward_cache
def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        Z = linear_forward(A_prev, W, b)
        A = sigmoid(Z)
    elif activation == "relu":
        Z = linear_forward(A_prev, W, b)
        A = relu(Z)
    return Z, A
def linear_forward(A, W, b):
    Z = np.dot(W, A) + b
    return Z

Cost Function (Cross‑Entropy)
def compute_cost(AL, Y):
    m = Y.shape[1]
    logprobs = np.multiply(np.log(AL), Y) + np.multiply(1 - Y, np.log(1 - AL))
    cost = -np.sum(logprobs) / m
    cost = np.squeeze(cost)
    return cost

Backward Propagation
Backward propagation computes the gradients of the cost with respect to the parameters using the chain rule.
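The starting gradient dAL used below follows from differentiating the cross‑entropy cost with respect to AL; a quick numerical check with hypothetical values confirms the formula:

```python
import numpy as np

def cost(AL, Y):
    # un-averaged cross-entropy for a single example
    return -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))

AL = np.array([[0.8]])
Y = np.array([[1.0]])

analytic = -(Y / AL - (1 - Y) / (1 - AL))   # -1/0.8 = -1.25
eps = 1e-6
numeric = (cost(AL + eps, Y) - cost(AL - eps, Y)) / (2 * eps)
# the central-difference estimate agrees closely with the analytic gradient
```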
def L_model_backward(AL, Y, parameters, forward_cache, nn_architecture):
    grads = {}
    number_of_layers = len(nn_architecture)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)
    # derivative of the cross-entropy cost with respect to AL
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    dA_prev = dAL
    for l in reversed(range(1, number_of_layers)):
        dA_curr = dA_prev
        activation = nn_architecture[l]["activation"]
        W_curr = parameters['W' + str(l)]
        Z_curr = forward_cache['Z' + str(l)]
        A_prev = forward_cache['A' + str(l - 1)]
        dA_prev, dW_curr, db_curr = linear_activation_backward(dA_curr, Z_curr, A_prev, W_curr, activation)
        grads["dW" + str(l)] = dW_curr
        grads["db" + str(l)] = db_curr
    return grads
def linear_activation_backward(dA, Z, A_prev, W, activation):
    if activation == "relu":
        dZ = relu_backward(dA, Z)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, Z)
    dA_prev, dW, db = linear_backward(dZ, A_prev, W)
    return dA_prev, dW, db
def linear_backward(dZ, A_prev, W):
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m      # averaged over the m examples
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)          # gradient passed to the previous layer
    return dA_prev, dW, db

Parameter Update
def update_parameters(parameters, grads, learning_rate):
    # parameters holds both W and b for every layer,
    # so the number of layers is half the dictionary length
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    return parameters

Full Model
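The full model below calls initialize_parameters, which the article does not define. A minimal sketch, assuming the nn_architecture entries expose their sizes under a hypothetical layer_size key:

```python
import numpy as np

def initialize_parameters(nn_architecture, seed=1):
    # Small random weights break the symmetry between units;
    # biases can safely start at zero.
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(nn_architecture)):
        n_curr = nn_architecture[l]["layer_size"]
        n_prev = nn_architecture[l - 1]["layer_size"]
        parameters['W' + str(l)] = np.random.randn(n_curr, n_prev) * 0.01
        parameters['b' + str(l)] = np.zeros((n_curr, 1))
    return parameters
```

With ReLU hidden layers, He initialization (scaling by sqrt(2 / n_prev) instead of the fixed 0.01) is a common refinement.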
import matplotlib.pyplot as plt

def L_layer_model(X, Y, nn_architecture, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    np.random.seed(1)
    costs = []
    parameters = initialize_parameters(nn_architecture)
    for i in range(num_iterations):
        AL, forward_cache = L_model_forward(X, parameters, nn_architecture)
        cost = compute_cost(AL, Y)
        grads = L_model_backward(AL, Y, parameters, forward_cache, nn_architecture)
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and i % 100 == 0:
            # record and report the cost every 100 iterations
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()
    return parameters

Further improvements: if the training set is small, overfitting may occur; regularization techniques such as L2 weight decay or dropout can mitigate this. Advanced optimizers such as mini‑batch gradient descent, momentum, and Adam can speed up convergence and reach lower final cost values.
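As one concrete illustration of the mini‑batch idea mentioned above, here is a sketch of a batch generator (the function name and default batch size are hypothetical):

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    # Shuffle the example columns, then slice them into batches;
    # the last batch may be smaller than batch_size.
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_s, Y_s = X[:, perm], Y[:, perm]
    return [(X_s[:, k:k + batch_size], Y_s[:, k:k + batch_size])
            for k in range(0, m, batch_size)]
```

Each (mini‑batch X, mini‑batch Y) pair would then drive one forward/backward pass inside the training loop instead of the full dataset.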
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.