Neural Network Construction Example with Python Implementation
This article is a hands‑on tutorial on building and training a multi‑layer neural network in Python. It covers data preprocessing, model architecture definition, parameter initialization, forward and backward propagation, cost computation, and parameter updates, with code examples for the activation functions and optimization steps.
Neural Network Construction Example
Notation: superscript [l] denotes layer l; superscript (i) denotes example i; subscript i denotes the i‑th element of a vector.
A single‑layer neuron first computes a linear function z = Wx + b and then applies an activation function g (e.g., sigmoid, tanh, ReLU), producing the output a = g(Wx + b).
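As a quick numeric illustration (all values here are hypothetical), a single neuron with three inputs can be computed directly in NumPy:

```python
import numpy as np

# Hypothetical weights, bias, and one 3-feature input column.
x = np.array([[0.5], [1.0], [-0.3]])
W = np.array([[0.2, -0.1, 0.4]])
b = np.array([[0.1]])

z = W @ x + b                # linear step: z = Wx + b
a = 1 / (1 + np.exp(-z))     # sigmoid activation: a = g(z)
# z = -0.02, so a = sigmoid(-0.02) ≈ 0.495
```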
Dataset: assume a large weather database containing temperature (x1), humidity (x2), and pressure (x3) as features, with a rainfall label (1 for rain, 0 otherwise). The data are split into a training set of m_train examples and a test set of m_test examples.
Preprocessing: each feature is centered and standardized by subtracting its mean and dividing by its standard deviation.
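A minimal sketch of this standardization, assuming features are stored as rows and examples as columns (matching the W·A convention used in the code below); the data here are randomly generated stand-ins:

```python
import numpy as np

# Hypothetical data: 3 feature rows (temperature, humidity, pressure),
# 100 example columns.
rng = np.random.default_rng(0)
X = rng.random((3, 100))

mu = X.mean(axis=1, keepdims=True)      # per-feature mean
sigma = X.std(axis=1, keepdims=True)    # per-feature standard deviation
X_norm = (X - mu) / sigma               # each feature: mean 0, std 1
```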
General Method (Partial Algorithm)
1. Define the model architecture (input features, layer sizes, activation functions).
2. Initialize parameters and hyper‑parameters (number of iterations, number of layers L, hidden‑layer sizes, learning rate α).
3. Iterate:
Forward propagation (compute Z and A for each layer).
Compute cost (cross‑entropy).
Backward propagation (compute gradients using the chain rule).
Update parameters (gradient descent).
4. Use the trained parameters to predict labels on new data.
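The code later in the article indexes the architecture as nn_architecture[l]["activation"], with index 0 standing for the input layer, so a layout like the following is assumed (the layer_size key name is hypothetical):

```python
# One dictionary per layer; index 0 describes the input layer
# (temperature, humidity, pressure).
nn_architecture = [
    {"layer_size": 3, "activation": "none"},     # input layer
    {"layer_size": 4, "activation": "relu"},     # hidden layer
    {"layer_size": 1, "activation": "sigmoid"},  # output layer (rain / no rain)
]
```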
Activation Functions
Activation functions introduce non‑linearity into the network. The example uses sigmoid and ReLU.
import numpy as np

def sigmoid(Z):
    S = 1 / (1 + np.exp(-Z))
    return S

def relu(Z):
    R = np.maximum(0, Z)
    return R

def sigmoid_backward(dA, Z):
    # derivative of the sigmoid: S * (1 - S)
    S = sigmoid(Z)
    dS = S * (1 - S)
    return dA * dS

def relu_backward(dA, Z):
    # the gradient passes through only where Z > 0
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

Forward Propagation Model
def L_model_forward(X, parameters, nn_architecture):
    forward_cache = {}
    A = X
    forward_cache['A0'] = X  # cache the input so backprop can reach layer 1
    number_of_layers = len(nn_architecture)
    for l in range(1, number_of_layers):
        A_prev = A
        W = parameters['W' + str(l)]
        b = parameters['b' + str(l)]
        activation = nn_architecture[l]["activation"]
        Z, A = linear_activation_forward(A_prev, W, b, activation)
        forward_cache['Z' + str(l)] = Z
        forward_cache['A' + str(l)] = A
    AL = A
    return AL, forward_cache
def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        Z = linear_forward(A_prev, W, b)
        A = sigmoid(Z)
    elif activation == "relu":
        Z = linear_forward(A_prev, W, b)
        A = relu(Z)
    return Z, A
def linear_forward(A, W, b):
    Z = np.dot(W, A) + b
    return Z

Cost Function (Cross‑Entropy)
def compute_cost(AL, Y):
    m = Y.shape[1]
    logprobs = np.multiply(np.log(AL), Y) + np.multiply(1 - Y, np.log(1 - AL))
    cost = -np.sum(logprobs) / m
    cost = np.squeeze(cost)
    return cost

Backward Propagation
Backward propagation computes the gradients of the cost with respect to the parameters using the chain rule.
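The starting gradient dAL used below follows from differentiating the cross‑entropy cost with respect to AL; a quick numerical check with hypothetical values confirms the formula:

```python
import numpy as np

def cost(AL, Y):
    # un-averaged cross-entropy for a single example
    return -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))

AL = np.array([[0.8]])
Y = np.array([[1.0]])

analytic = -(Y / AL - (1 - Y) / (1 - AL))   # -1/0.8 = -1.25
eps = 1e-6
numeric = (cost(AL + eps, Y) - cost(AL - eps, Y)) / (2 * eps)
# the central-difference estimate agrees closely with the analytic gradient
```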
def L_model_backward(AL, Y, parameters, forward_cache, nn_architecture):
    grads = {}
    number_of_layers = len(nn_architecture)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)
    # derivative of the cross-entropy cost with respect to AL
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    dA_prev = dAL
    for l in reversed(range(1, number_of_layers)):
        dA_curr = dA_prev
        activation = nn_architecture[l]["activation"]
        W_curr = parameters['W' + str(l)]
        Z_curr = forward_cache['Z' + str(l)]
        A_prev = forward_cache['A' + str(l - 1)]
        dA_prev, dW_curr, db_curr = linear_activation_backward(dA_curr, Z_curr, A_prev, W_curr, activation)
        grads["dW" + str(l)] = dW_curr
        grads["db" + str(l)] = db_curr
    return grads
def linear_activation_backward(dA, Z, A_prev, W, activation):
    if activation == "relu":
        dZ = relu_backward(dA, Z)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, Z)
    dA_prev, dW, db = linear_backward(dZ, A_prev, W)
    return dA_prev, dW, db
def linear_backward(dZ, A_prev, W):
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m      # averaged over the m examples
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)          # gradient passed to the previous layer
    return dA_prev, dW, db

Parameter Update
def update_parameters(parameters, grads, learning_rate):
    # parameters holds both W and b for every layer,
    # so the number of layers is half the dictionary length
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    return parameters

Full Model
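The full model below calls initialize_parameters, which the article does not define. A minimal sketch, assuming the nn_architecture entries expose their sizes under a hypothetical layer_size key:

```python
import numpy as np

def initialize_parameters(nn_architecture, seed=1):
    # Small random weights break the symmetry between units;
    # biases can safely start at zero.
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(nn_architecture)):
        n_curr = nn_architecture[l]["layer_size"]
        n_prev = nn_architecture[l - 1]["layer_size"]
        parameters['W' + str(l)] = np.random.randn(n_curr, n_prev) * 0.01
        parameters['b' + str(l)] = np.zeros((n_curr, 1))
    return parameters
```

With ReLU hidden layers, He initialization (scaling by sqrt(2 / n_prev) instead of the fixed 0.01) is a common refinement.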
import matplotlib.pyplot as plt

def L_layer_model(X, Y, nn_architecture, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    np.random.seed(1)
    costs = []
    parameters = initialize_parameters(nn_architecture)
    for i in range(num_iterations):
        AL, forward_cache = L_model_forward(X, parameters, nn_architecture)
        cost = compute_cost(AL, Y)
        grads = L_model_backward(AL, Y, parameters, forward_cache, nn_architecture)
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and i % 100 == 0:
            # record and report the cost every 100 iterations
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()
    return parameters

Further improvements: if the training set is small, overfitting may occur; regularization techniques such as L2 weight decay or dropout can mitigate this. Advanced optimizers such as mini‑batch gradient descent, momentum, and Adam can speed up convergence and reach lower final cost values.
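As one concrete illustration of the mini‑batch idea mentioned above, here is a sketch of a batch generator (the function name and default batch size are hypothetical):

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    # Shuffle the example columns, then slice them into batches;
    # the last batch may be smaller than batch_size.
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_s, Y_s = X[:, perm], Y[:, perm]
    return [(X_s[:, k:k + batch_size], Y_s[:, k:k + batch_size])
            for k in range(0, m, batch_size)]
```

Each (mini‑batch X, mini‑batch Y) pair would then drive one forward/backward pass inside the training loop instead of the full dataset.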
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.