From Biological Neurons to Deep Learning: How MP Models Evolve
This article explains the structure of biological neurons, introduces the McCulloch‑Pitts (MP) mathematical model, shows how manual weight adjustments work, and walks through the development from single‑layer perceptrons to two‑layer networks and modern deep learning techniques, covering activation functions, training algorithms, and practical examples.
Biological Neuron
Dendrites : multiple dendrites receive signals from other neurons and transmit them to the soma.
Nucleus (in the soma) : maintains the cell's metabolism; the soma integrates signals arriving from the dendrites.
Axon and Axon Terminals : conduct electrical signals to other neurons; each terminal connects to the next neuron.
Synaptic Gap : the connection region between neurons.
MP Neuron Mathematical Model
The MP model mimics the basic structure and operation of a biological neuron and is an early artificial neuron model still used today.
Mathematical Model
The MP neuron consists of inputs, weights, a threshold, a summation function, an activation (sgn) function, a bias term, and an output.
Input x: stimulus strength from other neurons.
Weight w: sensitivity to each input.
Threshold θ: difficulty of activation.
Summation ∑ w·x: total weighted input.
Activation sgn: returns 1 if the sum is positive and -1 if negative; conventions differ at exactly 0.
Bias b: constant term independent of inputs.
Output y: result after activation.
The computation first performs a linear weighted sum of x and w, then applies the sgn activation, making the MP model a linear classifier for binary classification.
Manual Weight Adjustment
In early MP models, weights and bias were set manually based on expert knowledge, limiting the model's adaptability. Later deep learning methods learn these parameters automatically.
Application Example
For a simple weather‑based umbrella decision (sunny=0, rainy=1), set weight w=1 and bias b=-0.5. The output is 1 if w·input + b > 0, otherwise 0, correctly predicting umbrella usage.
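The umbrella rule above can be sketched in a few lines; the function name is illustrative, and the step activation matches the "1 if w·input + b > 0, otherwise 0" convention from the text:

```python
def mp_neuron(x, w=1.0, b=-0.5):
    """MP-style neuron with a step activation: 1 if w*x + b > 0, else 0."""
    return 1 if w * x + b > 0 else 0

# Encoding from the example: sunny = 0, rainy = 1
assert mp_neuron(0) == 0  # sunny -> no umbrella
assert mp_neuron(1) == 1  # rainy -> take an umbrella
```

Note that the weight and bias here are fixed by hand, which is exactly the limitation the next section addresses.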
Single‑layer Neural Network (Perceptron)
Introduced by Rosenblatt in 1958, the perceptron is the first learnable artificial neural network and functions as a linear classifier similar to logistic regression.
Linear Regression Model
y = β0 + β1x1 + β2x2 + … + βnxn + ϵ

It predicts a continuous target by fitting a linear combination of inputs, where ϵ is the error term.
Logistic Regression Model
P(y=1) = 1 / (1 + e^{-(β0 + β1x1 + β2x2 + … + βnxn)})

It maps the linear combination to a probability for binary classification.
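The logistic formula above is easy to evaluate directly; the coefficient values below are illustrative, not fitted to any data:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(x, betas, beta0):
    """P(y=1 | x) for logistic regression with coefficients betas and intercept beta0."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

p = logistic_predict([1.0, 2.0], betas=[0.5, -0.25], beta0=0.1)
# z = 0.1 + 0.5 - 0.5 = 0.1, so p = sigmoid(0.1), a value just above 0.5
```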
Mathematical Model of Perceptron
Adding an input unit to the MP model creates a perceptron with an input layer (data transmission) and an output layer (computation). The perceptron learns weights and bias via supervised learning, adjusting them based on the error between actual and expected outputs.
Training Process
Initialize weights randomly.
For each training sample, compute the actual output.
Calculate the error with respect to the labeled output.
Update weights using the error correction method.
Repeat until error falls below a threshold or a maximum number of epochs is reached.
Perceptron Example: AND Gate
Training samples with inputs (x1, x2) and output y are used to learn the weights and bias; starting from, e.g., initial weights [w1, w2] = [0.1, 0.2] and bias b = 0, the error-correction rule adjusts them until the step activation (output 1 if w1·x1 + w2·x2 + b ≥ 0, else 0) reproduces the AND truth table.
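The five training steps above can be sketched on the AND-gate data; the learning rate and epoch count are illustrative choices, and the update is the classic perceptron rule w ← w + lr·(target − output)·x:

```python
def train_perceptron(samples, lr=0.1, epochs=20):
    """Perceptron learning: adjust weights by the error on each sample."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            # Step activation: 1 if the weighted sum is non-negative, else 0
            out = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            err = y - out
            # Error-correction update for weights and bias
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# AND truth table: output 1 only when both inputs are 1
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
for (x1, x2), y in and_data:
    assert (1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0) == y
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates with a correct classifier.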
Two‑layer Neural Network (Multilayer Perceptron)
Adding a hidden layer enables solving non‑linear problems such as XOR. Hidden neurons apply a non‑linear activation (e.g., sigmoid) to transform the input space, allowing the output layer to perform linear classification on the transformed data.
Non‑linear Activation Functions
Sigmoid : maps input to (0,1); suffers from gradient vanishing.
ReLU : outputs max(0, x); mitigates gradient vanishing but can cause dead neurons.
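Both activations are one-liners; the comments note the gradient behavior described above:

```python
import math

def sigmoid(x):
    """Maps any real input into (0, 1); gradients shrink toward 0 for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """max(0, x): gradient is 1 for positive inputs, 0 otherwise (dead neurons)."""
    return max(0.0, x)

assert 0.0 < sigmoid(-2.0) < sigmoid(2.0) < 1.0
assert relu(-3.0) == 0.0 and relu(3.0) == 3.0
```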
The universal approximation theorem states that a feed‑forward network with one hidden layer can approximate any continuous function given enough hidden units.
Mathematical Model
With input vector a, weight matrices W(1) and W(2), and bias vectors b(1) and b(2), forward propagation is expressed as z = g(W(2)·f(W(1)·a + b(1)) + b(2)), where f and g are the hidden-layer and output-layer activation functions.
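The formula can be written out directly with plain lists; the weight values and layer sizes below (2 inputs, 2 hidden units, 1 output) are purely illustrative, and both f and g are taken to be sigmoid here:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(W, b, a, act):
    """One layer: act(W·a + b), with W given as a list of rows."""
    return [act(sum(w * x for w, x in zip(row, a)) + bi)
            for row, bi in zip(W, b)]

def forward(a, W1, b1, W2, b2):
    hidden = dense(W1, b1, a, sigmoid)    # f(W(1)·a + b(1))
    return dense(W2, b2, hidden, sigmoid) # g(W(2)·hidden + b(2))

out = forward([1.0, 0.0],
              W1=[[0.5, -0.5], [0.3, 0.8]], b1=[0.0, 0.1],
              W2=[[1.0, -1.0]], b2=[0.0])
# out is a single sigmoid output, so it lies in (0, 1)
```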
Training Techniques
Training uses gradient descent to minimize a loss function (e.g., mean squared error) and backpropagation to compute gradients efficiently across layers. Enhancements such as momentum, dropout, and data augmentation improve convergence and generalization.
Deep Learning (Multilayer Neural Networks)
Deep networks with many hidden layers capture hierarchical features: lower layers detect edges, middle layers detect shapes, and higher layers detect objects. More layers increase representation depth and function approximation capacity.
Mathematical Model
Deep networks extend the two‑layer formulation by stacking additional weight matrices and activation functions, still using forward propagation and backpropagation for training.
Basic Working Principle
Forward propagation: compute predictions layer by layer.
Loss computation: compare predictions with targets.
Backpropagation: compute gradients using the chain rule.
Parameter update: adjust weights and biases via gradient descent.
Iterate until loss is acceptable or epochs are exhausted.
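The five steps above can be sketched end-to-end on the XOR problem with a tiny 2-2-1 sigmoid network; the learning rate, epoch count, and initialization are illustrative, and depending on the random start the network may or may not reach zero error, but the loss should decrease:

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR data: not linearly separable, so a hidden layer is required
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# 2 inputs -> 2 hidden -> 1 output, small random initialization
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

def forward(x):
    # Step 1: forward propagation, layer by layer
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def total_loss():
    # Step 2: squared-error loss summed over all samples
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

lr = 0.5
loss_before = total_loss()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        # Step 3: backpropagation via the chain rule
        dy = 2 * (y - t) * y * (1 - y)                    # output pre-activation grad
        dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Step 4: gradient-descent parameter update
        for j in range(2):
            W2[j] -= lr * dy * h[j]
            for i in range(2):
                W1[j][i] -= lr * dh[j] * x[i]
            b1[j] -= lr * dh[j]
        b2 -= lr * dy
loss_after = total_loss()  # Step 5: in practice, stop when this is acceptable
```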
Loss Function Optimization
A typical loss is mean squared error: loss = (yp - y)^2 per sample, where yp is the prediction and y the target, summed (or averaged) over all samples.
Gradient Descent
Iteratively moves parameters opposite to the gradient of the loss to find a local minimum.
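For a single parameter this is a two-line loop; the quadratic objective below is a stand-in chosen so the minimum is known:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly move opposite the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step scales the distance to the minimum by (1 - 2·lr), so with lr = 0.1 the iterate converges geometrically toward 3.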
Backpropagation Algorithm
Computes gradients layer by layer from output to input, enabling efficient training of deep networks.
Parameter Combination
Increasing hidden units or adding layers raises the total number of parameters, enhancing expressive power but also requiring careful regularization.
Training Techniques
Beyond ReLU, modern deep learning focuses on optimization and generalization, using momentum‑based gradient descent, dropout, and data augmentation to prevent overfitting.