Fundamentals of Functions, Sequences, and Their Role in Neural Networks
This article introduces basic mathematical functions—including linear, quadratic, exponential, and step functions—explains sequences and their formulas, and shows how these concepts underpin neural‑network computations such as weighted inputs, activation functions like sigmoid, and error‑backpropagation, providing clear examples and visual illustrations.
1 Functions
From this article onward, we will immerse ourselves in the sea of mathematics, striving to explain these concepts in an accessible way.
1.1 Linear Functions
The most fundamental and important mathematical function is the linear function, which is also crucial in the world of neural networks.
1.1.1 Univariate Linear Function
This function is written in terms of a slope a (which controls the line's direction) and an intercept b (which shifts the line relative to the origin), giving the equation y = ax + b (a, b are constants, a ≠ 0). When variables x and y satisfy this formula, they are said to have a linear relationship.
If for every x there is a uniquely determined y, then y is a function of x, denoted y = f(x); x is the independent variable and y the dependent variable.
The graph of a linear function is a straight line.
Example: the linear function y = 2x + 1 has a slope of 2 and an intercept of 1.
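The example above can be sketched in a few lines of Python; the helper name `linear` is an illustrative choice, not from the article:

```python
# Evaluate the linear function y = ax + b, here with a = 2 and b = 1.
def linear(x, a=2.0, b=1.0):
    """Return y = a*x + b."""
    return a * x + b

print(linear(0))  # the intercept: 1.0
print(linear(3))  # 2*3 + 1 = 7.0
```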
1.1.2 Multivariate Linear Functions
When there is more than one independent variable, the linear function generalizes to a multivariate linear function, e.g., y = a₁x₁ + a₂x₂ + … + aₙxₙ + b. Variables related in this way are also said to have a linear relationship.
In neural networks, the weighted input z of a neuron can be written as a linear function of its inputs:

z = w₁x₁ + w₂x₂ + … + wₙxₙ + b

where w₁, …, wₙ are the weights and b is the bias.
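The weighted input can be computed directly as a dot product; the weight, input, and bias values below are illustrative assumptions, not from the article:

```python
import numpy as np

# A neuron's weighted input: z = w1*x1 + ... + wn*xn + b = w . x + b
w = np.array([0.5, -0.2, 0.1])  # weights w1..w3 (illustrative)
x = np.array([1.0, 2.0, 3.0])   # inputs x1..x3 (illustrative)
b = 0.4                         # bias

z = np.dot(w, x) + b
print(z)  # 0.5*1 - 0.2*2 + 0.1*3 + 0.4 ≈ 0.8
```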
1.2 Quadratic Functions
1.2.1 Univariate Quadratic Function
Quadratic functions are important; for example, the mean‑squared error (cost function) used in machine learning is a quadratic function. The univariate form is y = ax² + bx + c (a ≠ 0). Its graph is a parabola whose opening direction depends on the sign of a.
When a > 0, the parabola opens upward and has a minimum.
When a < 0, the parabola opens downward and has a maximum.
Thus, for a > 0 the function attains a minimum value, which underlies the least‑squares method.
Example: the quadratic function y = x² + 2x + 1 (a = 1 > 0) reaches its minimum at x = -1, y = 0.
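This can be checked with the vertex formula x = -b / (2a), which locates the extremum of any univariate quadratic:

```python
# Verify that y = x^2 + 2x + 1 attains its minimum at x = -1.
def quadratic(x):
    return x**2 + 2*x + 1  # equals (x + 1)^2, so the minimum value is 0

# The vertex of ax^2 + bx + c lies at x = -b / (2a); here a = 1, b = 2.
x_min = -2 / (2 * 1)
print(x_min, quadratic(x_min))  # -1.0 0.0
```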
1.2.2 Multivariate Quadratic Functions
In neural networks we also encounter multivariate quadratic functions, such as the squared error cost C = (x₁ - t₁)² + (x₂ - t₂)² + … .
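A minimal sketch of this cost; the output and target values are illustrative assumptions:

```python
import numpy as np

# Squared-error cost: C = (x1 - t1)^2 + (x2 - t2)^2 + ...
x = np.array([0.9, 0.1])  # network outputs (illustrative)
t = np.array([1.0, 0.0])  # target values (illustrative)

C = np.sum((x - t) ** 2)
print(C)  # (0.9-1)^2 + (0.1-0)^2 ≈ 0.02
```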
1.3 Unit Step Function
The unit step function outputs 0 for negative inputs and 1 for non‑negative inputs (under one common convention). It is discontinuous at the origin and therefore not differentiable there, which makes it unsuitable as a primary activation function in neural networks.
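As a sketch, using the common convention that the step takes the value 1 at the origin (other conventions use 0 or 1/2 there):

```python
# Unit step function: 0 for x < 0, 1 for x >= 0 (one common convention).
def unit_step(x):
    return 1 if x >= 0 else 0

print(unit_step(-2.0))  # 0
print(unit_step(0.0))   # 1
print(unit_step(3.5))   # 1
```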
1.4 Exponential Functions
An exponential function has the form y = aˣ where a > 0 and a ≠ 1. The constant a is called the base. The natural base e (≈ 2.718) is especially important in mathematics and AI.
1.4.1 Sigmoid Function
The sigmoid function, defined as σ(x) = 1 / (1 + exp(-x)) , is a smooth, differentiable activation function whose output lies in (0, 1), allowing a probabilistic interpretation.
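The definition translates directly into code:

```python
import math

# Sigmoid activation: sigma(x) = 1 / (1 + exp(-x)), output in (0, 1).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5, the midpoint of the output range
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```

Its derivative has the convenient closed form σ(x)·(1 − σ(x)), one reason it pairs well with gradient-based training.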
1.5 Normal Distribution Probability Density Function
When initializing weights and biases in a neural network, it is common to draw them from a normal (Gaussian) distribution because this often leads to good training results.
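A minimal sketch of such initialization; the layer sizes, the seed, and the standard deviation 0.1 are illustrative choices, not from the article:

```python
import numpy as np

# Draw initial weights and biases from a normal (Gaussian) distribution.
rng = np.random.default_rng(seed=0)

n_inputs, n_neurons = 3, 2
W = rng.normal(loc=0.0, scale=0.1, size=(n_neurons, n_inputs))  # weights
b = rng.normal(loc=0.0, scale=0.1, size=n_neurons)              # biases

print(W.shape, b.shape)  # (2, 3) (2,)
```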
2 Sequences
2.1 Meaning of a Sequence
A sequence is an ordered list of numbers, e.g., 2, 4, 6, 8, … . Each number is called a "term"; the first one is the first term, the last one of a finite sequence is the last term, and the count of terms is the length of the sequence.
2.2 General Term Formula
The n‑th term of a sequence is denoted aₙ. For the even‑number sequence, the general term is aₙ = 2n.
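The general term formula generates the sequence directly:

```python
# General term of the even-number sequence: a_n = 2n (n starting from 1).
def a(n):
    return 2 * n

terms = [a(n) for n in range(1, 6)]
print(terms)  # [2, 4, 6, 8, 10]
```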
2.3 Recurrence Relations
Besides an explicit formula, a sequence can be defined by a recurrence relation, which expresses each term based on previous terms.
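The same even-number sequence can be defined by a recurrence instead of an explicit formula, computing each term from the one before it:

```python
# Recurrence definition: a_1 = 2, a_{n+1} = a_n + 2.
def sequence_by_recurrence(n_terms):
    terms = [2]  # a_1 = 2
    for _ in range(n_terms - 1):
        terms.append(terms[-1] + 2)  # a_{n+1} = a_n + 2
    return terms

print(sequence_by_recurrence(5))  # [2, 4, 6, 8, 10]
```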
2.4 Systems of Recurrence Relations
In neural networks, the inputs and outputs of neurons across layers can be modeled as coupled recurrence relations. This viewpoint underlies the error‑backpropagation algorithm.
By linking the weighted sums and activation functions of successive layers through such systems, back‑propagation computes gradients efficiently.
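The layer-to-layer view above can be sketched as a forward pass in which each layer's activations are computed recursively from the previous layer's; all layer sizes, weight values, and the use of sigmoid here are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A two-layer forward pass as a system of recurrences:
#   z^(l) = W^(l) a^(l-1) + b^(l),   a^(l) = sigmoid(z^(l))
a = np.array([1.0, 0.5])  # input activations a^(0) (illustrative)

layers = [
    (np.array([[0.2, -0.1], [0.4, 0.3]]), np.array([0.1, -0.2])),  # W^(1), b^(1)
    (np.array([[0.5, 0.5]]),              np.array([0.0])),        # W^(2), b^(2)
]

for W, b in layers:
    z = W @ a + b    # weighted input: the recurrence step
    a = sigmoid(z)   # activation fed to the next layer

print(a.shape)  # (1,): a single output neuron, value in (0, 1)
```

Backpropagation runs the same layer structure in reverse, propagating error terms from the output layer back through each recurrence step.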
- END -
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.