From Biological Neurons to Deep Learning: How MP Models Evolve
This article explains the structure of biological neurons, introduces the McCulloch‑Pitts (MP) mathematical model, shows how manual weight adjustments work, and walks through the development from single‑layer perceptrons to two‑layer networks and modern deep learning techniques, covering activation functions, training algorithms, and practical examples.
Biological Neuron
Dendrites : multiple dendrites receive signals from other neurons and transmit them to the soma.
Nucleus (in the soma) : maintains the cell's metabolism; the soma integrates signals arriving from the dendrites.
Axon and Axon Terminals : conduct electrical signals to other neurons; each terminal connects to the next neuron.
Synaptic Gap : the connection region between neurons.
MP Neuron Mathematical Model
The MP model mimics the basic structure and operation of a biological neuron and is an early artificial neuron model still used today.
Mathematical Model
The MP neuron consists of inputs, weights, a threshold, a summation function, an activation (sgn) function, a bias term, and an output.
Input x: stimulus strength from other neurons.
Weight w: sensitivity to each input.
Threshold θ: difficulty of activation.
Summation ∑ w·x: total weighted input.
Activation sgn: returns 1 if the sum is positive and -1 if negative; conventions differ at exactly 0.
Bias b: constant term independent of inputs.
Output y: result after activation.
The computation first performs a linear weighted sum of x and w, then applies the sgn activation, making the MP model a linear classifier for binary classification.
Manual Weight Adjustment
In early MP models, weights and bias were set manually based on expert knowledge, limiting the model's adaptability. Later deep learning methods learn these parameters automatically.
Application Example
For a simple weather‑based umbrella decision (sunny=0, rainy=1), set weight w=1 and bias b=-0.5. The output is 1 if w·input + b > 0, otherwise 0, correctly predicting umbrella usage.
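The umbrella rule above can be sketched in a few lines; the function name is illustrative, and the step activation matches the "1 if w·input + b > 0, otherwise 0" convention from the text:

```python
def mp_neuron(x, w=1.0, b=-0.5):
    """MP-style neuron with a step activation: 1 if w*x + b > 0, else 0."""
    return 1 if w * x + b > 0 else 0

# Encoding from the example: sunny = 0, rainy = 1
assert mp_neuron(0) == 0  # sunny -> no umbrella
assert mp_neuron(1) == 1  # rainy -> take an umbrella
```

Note that the weight and bias here are fixed by hand, which is exactly the limitation the next section addresses.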
Single‑layer Neural Network (Perceptron)
Introduced by Rosenblatt in 1958, the perceptron is the first learnable artificial neural network and functions as a linear classifier similar to logistic regression.
Linear Regression Model
y = β0 + β1x1 + β2x2 + … + βnxn + ϵ

It predicts a continuous target by fitting a linear combination of inputs, where ϵ is the error term.
Logistic Regression Model
P(y=1) = 1 / (1 + e^{-(β0 + β1x1 + β2x2 + … + βnxn)})

It maps the linear combination to a probability for binary classification.
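The logistic formula above is easy to evaluate directly; the coefficient values below are illustrative, not fitted to any data:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(x, betas, beta0):
    """P(y=1 | x) for logistic regression with coefficients betas and intercept beta0."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

p = logistic_predict([1.0, 2.0], betas=[0.5, -0.25], beta0=0.1)
# z = 0.1 + 0.5 - 0.5 = 0.1, so p = sigmoid(0.1), a value just above 0.5
```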
Mathematical Model of Perceptron
Adding an input unit to the MP model creates a perceptron with an input layer (data transmission) and an output layer (computation). The perceptron learns weights and bias via supervised learning, adjusting them based on the error between actual and expected outputs.
Training Process
Initialize weights randomly.
For each training sample, compute the actual output.
Calculate the error with respect to the labeled output.
Update weights using the error correction method.
Repeat until error falls below a threshold or a maximum number of epochs is reached.
Perceptron Example: AND Gate
Training samples with inputs (x1, x2) and output y are used to learn the weights and bias; starting from, e.g., initial weights [w1, w2] = [0.1, 0.2] and bias b = 0, the error-correction rule adjusts them until the step activation (output 1 if w1·x1 + w2·x2 + b ≥ 0, else 0) reproduces the AND truth table.
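The five training steps above can be sketched on the AND-gate data; the learning rate and epoch count are illustrative choices, and the update is the classic perceptron rule w ← w + lr·(target − output)·x:

```python
def train_perceptron(samples, lr=0.1, epochs=20):
    """Perceptron learning: adjust weights by the error on each sample."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            # Step activation: 1 if the weighted sum is non-negative, else 0
            out = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            err = y - out
            # Error-correction update for weights and bias
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# AND truth table: output 1 only when both inputs are 1
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
for (x1, x2), y in and_data:
    assert (1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0) == y
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates with a correct classifier.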
Two‑layer Neural Network (Multilayer Perceptron)
Adding a hidden layer enables solving non‑linear problems such as XOR. Hidden neurons apply a non‑linear activation (e.g., sigmoid) to transform the input space, allowing the output layer to perform linear classification on the transformed data.
Non‑linear Activation Functions
Sigmoid : maps input to (0,1); suffers from gradient vanishing.
ReLU : outputs max(0, x); mitigates gradient vanishing but can cause dead neurons.
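Both activations are one-liners; the comments note the gradient behavior described above:

```python
import math

def sigmoid(x):
    """Maps any real input into (0, 1); gradients shrink toward 0 for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """max(0, x): gradient is 1 for positive inputs, 0 otherwise (dead neurons)."""
    return max(0.0, x)

assert 0.0 < sigmoid(-2.0) < sigmoid(2.0) < 1.0
assert relu(-3.0) == 0.0 and relu(3.0) == 3.0
```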
The universal approximation theorem states that a feed‑forward network with one hidden layer can approximate any continuous function given enough hidden units.
Mathematical Model
With input vector a, weight matrices W(1) and W(2), and bias vectors b(1) and b(2), forward propagation is expressed as z = g(W(2)·f(W(1)·a + b(1)) + b(2)), where f and g are the hidden-layer and output-layer activation functions.
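The formula can be written out directly with plain lists; the weight values and layer sizes below (2 inputs, 2 hidden units, 1 output) are purely illustrative, and both f and g are taken to be sigmoid here:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(W, b, a, act):
    """One layer: act(W·a + b), with W given as a list of rows."""
    return [act(sum(w * x for w, x in zip(row, a)) + bi)
            for row, bi in zip(W, b)]

def forward(a, W1, b1, W2, b2):
    hidden = dense(W1, b1, a, sigmoid)    # f(W(1)·a + b(1))
    return dense(W2, b2, hidden, sigmoid) # g(W(2)·hidden + b(2))

out = forward([1.0, 0.0],
              W1=[[0.5, -0.5], [0.3, 0.8]], b1=[0.0, 0.1],
              W2=[[1.0, -1.0]], b2=[0.0])
# out is a single sigmoid output, so it lies in (0, 1)
```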
Training Techniques
Training uses gradient descent to minimize a loss function (e.g., mean squared error) and backpropagation to compute gradients efficiently across layers. Enhancements such as momentum, dropout, and data augmentation improve convergence and generalization.
Deep Learning (Multilayer Neural Networks)
Deep networks with many hidden layers capture hierarchical features: lower layers detect edges, middle layers detect shapes, and higher layers detect objects. More layers increase representation depth and function approximation capacity.
Mathematical Model
Deep networks extend the two‑layer formulation by stacking additional weight matrices and activation functions, still using forward propagation and backpropagation for training.
Basic Working Principle
Forward propagation: compute predictions layer by layer.
Loss computation: compare predictions with targets.
Backpropagation: compute gradients using the chain rule.
Parameter update: adjust weights and biases via gradient descent.
Iterate until loss is acceptable or epochs are exhausted.
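The five steps above can be sketched end-to-end on the XOR problem with a tiny 2-2-1 sigmoid network; the learning rate, epoch count, and initialization are illustrative, and depending on the random start the network may or may not reach zero error, but the loss should decrease:

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR data: not linearly separable, so a hidden layer is required
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# 2 inputs -> 2 hidden -> 1 output, small random initialization
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

def forward(x):
    # Step 1: forward propagation, layer by layer
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def total_loss():
    # Step 2: squared-error loss summed over all samples
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

lr = 0.5
loss_before = total_loss()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        # Step 3: backpropagation via the chain rule
        dy = 2 * (y - t) * y * (1 - y)                    # output pre-activation grad
        dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Step 4: gradient-descent parameter update
        for j in range(2):
            W2[j] -= lr * dy * h[j]
            for i in range(2):
                W1[j][i] -= lr * dh[j] * x[i]
            b1[j] -= lr * dh[j]
        b2 -= lr * dy
loss_after = total_loss()  # Step 5: in practice, stop when this is acceptable
```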
Loss Function Optimization
A typical loss is mean squared error: loss = (yp - y)^2 per sample, where yp is the prediction and y the target, summed (or averaged) over all samples.
Gradient Descent
Iteratively moves parameters opposite to the gradient of the loss to find a local minimum.
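For a single parameter this is a two-line loop; the quadratic objective below is a stand-in chosen so the minimum is known:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly move opposite the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step scales the distance to the minimum by (1 - 2·lr), so with lr = 0.1 the iterate converges geometrically toward 3.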
Backpropagation Algorithm
Computes gradients layer by layer from output to input, enabling efficient training of deep networks.
Parameter Combination
Increasing hidden units or adding layers raises the total number of parameters, enhancing expressive power but also requiring careful regularization.
Training Techniques
Beyond ReLU, modern deep learning focuses on optimization and generalization, using momentum‑based gradient descent, dropout, and data augmentation to prevent overfitting.