How Support Vector Machines Classify Data: Core Principles Explained
Support Vector Machines (SVM) are data‑mining methods derived from statistical learning theory, proposed in 1992 by Boser, Guyon, and Vapnik, and are effective for small‑sample, nonlinear, and high‑dimensional regression and classification problems.
SVM consists of Support Vector Classification (SVC) for categorical outputs and Support Vector Regression (SVR) for continuous outputs.
Fundamental Principles
In classification, a training sample set is used to model the relationship between input variables and class labels so that the class of a new sample can be predicted. The discussion below uses the binary (two-class) case.
Let the input space be 𝑋, where each point is represented by a vector of attributes. Assume class +1 contains m₁ training points and class –1 contains m₂ points.
The goal is to find a real‑valued function f(x) that defines a classification rule, typically sign(f(x)), to assign any pattern to a class.
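The decision rule sign(f(x)) can be sketched in a few lines. This is a toy illustration only: the weight vector w and bias b below are made-up values, not parameters learned from data.

```python
import numpy as np

# Hypothetical parameters of a linear classifier, chosen only for illustration.
w = np.array([2.0, -1.0])   # assumed weight vector
b = -0.5                    # assumed bias

def f(x):
    """Real-valued score f(x) = w·x + b."""
    return float(np.dot(w, x)) + b

def classify(x):
    """Decision rule sign(f(x)): assign class +1 or -1."""
    return 1 if f(x) >= 0 else -1

print(classify(np.array([1.0, 0.0])))   # f = 2.0 - 0.5 = 1.5, so class +1
print(classify(np.array([0.0, 2.0])))   # f = -2.0 - 0.5 = -2.5, so class -1
```

Any pattern x is thus mapped to a class through the sign of the real-valued function f.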
Linearly Separable SVM
If there exists a hyperplane that separates the two classes with all positive samples on one side and all negative samples on the other, the training set is linearly separable and the corresponding classification problem is linearly separable.
The two class sample sets are denoted 𝑋₊ and 𝑋₋; the training set is linearly separable exactly when the convex hulls of 𝑋₊ and 𝑋₋ do not intersect.
A hyperplane in the input space can be written as w·x + b = 0; scaling w and b by the same non-zero constant leaves the hyperplane unchanged, so the representation can be normalized. A hyperplane normalized so that min_i |w·x_i + b| = 1 is called a canonical hyperplane; the canonical separating hyperplane that maximizes the margin is the optimal hyperplane.
Theorem: For a linearly separable training set, the maximal-margin (optimal) hyperplane exists and is unique. The training points that satisfy the constraint with equality, y_i(w·x_i + b) = 1, lie exactly on the margin boundaries and are called support vectors.
Only the support vectors influence the construction of the optimal hyperplane, which explains the sparsity of SVM solutions.
For the canonical hyperplane, support vectors of class +1 satisfy w·x + b = 1 and support vectors of class −1 satisfy w·x + b = −1. Each margin boundary lies at distance 1/‖w‖ from the hyperplane, so the margin width, the distance between the two boundaries, is 2/‖w‖.
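As a quick numerical check of the margin formula, a minimal sketch with a hypothetical weight vector:

```python
import numpy as np

# Hypothetical canonical-hyperplane parameters, chosen only for illustration.
w = np.array([3.0, 4.0])   # ||w|| = 5

# Each margin boundary is at distance 1/||w|| from the hyperplane;
# the full margin width between the two boundaries is 2/||w||.
margin_width = 2.0 / np.linalg.norm(w)
print(margin_width)  # 2/5 = 0.4
```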
The optimal hyperplane is obtained by maximizing the margin 2/‖w‖, or equivalently minimizing ½‖w‖² subject to y_i(w·x_i + b) ≥ 1 for all i — a quadratic programming problem with a convex objective function and linear constraints.
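The primal problem, minimize ½‖w‖² subject to y_i(w·x_i + b) ≥ 1, can be handed to a generic constrained optimizer. The sketch below uses `scipy.optimize.minimize` with SLSQP on a tiny made-up separable data set; a real implementation would use a dedicated QP or SVM solver.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 2-D data set, invented for illustration.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Pack the variables as z = [w1, w2, b]; objective = 1/2 ||w||^2.
def objective(z):
    w = z[:2]
    return 0.5 * float(np.dot(w, w))

# One linear inequality constraint per sample: y_i (w·x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda z, xi=xi, yi=yi: yi * (np.dot(z[:2], xi) + z[2]) - 1.0}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w_opt, b_opt = res.x[:2], res.x[2]
print(w_opt, b_opt)                       # optimal weight vector and bias
print(2.0 / np.linalg.norm(w_opt))        # resulting margin width 2/||w||
```

For this data the optimizer recovers w ≈ (0.5, 0.5) and b ≈ −1, with the support vectors (2, 2) and (0, 0) sitting exactly on the margin boundaries.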
Introducing the Lagrange function L(w,b,α) = ½‖w‖² − ∑_i α_i[y_i(w·x_i + b) − 1], with Lagrange multipliers α_i ≥ 0, and applying the KKT conditions leads to the dual problem: maximize ∑_i α_i − ½∑_i∑_j α_iα_j y_iy_j (x_i·x_j) subject to ∑_i α_iy_i = 0 and α_i ≥ 0.
Solving the dual yields the optimal multipliers α_i; only those corresponding to support vectors are non-zero. The weight vector is recovered as w = ∑_i α_i y_i x_i, the bias b is obtained from any support vector x_j via b = y_j − ∑_i α_i y_i (x_i·x_j), and the decision function is f(x) = sign(∑_i α_i y_i (x_i·x) + b).
This decision function can be used to classify previously unseen samples.
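The dual quantities are exposed directly by scikit-learn's `SVC`, which can stand in for the derivation above (a sketch, not the article's own code; the data set is invented, and a large C approximates the hard-margin case):

```python
import numpy as np
from sklearn.svm import SVC

# Same style of toy separable data as above, invented for illustration.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin (linearly separable) SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ stores alpha_i * y_i for the support vectors only,
# so w = sum_i alpha_i y_i x_i is a single matrix product.
w = clf.dual_coef_ @ clf.support_vectors_
b = clf.intercept_[0]

print(clf.support_vectors_)   # only points on the margin appear here
print(w, b)                   # matches clf.coef_ and clf.intercept_
print(clf.predict([[4.0, 4.0]]))
```

Note the sparsity: of the four training points, only the margin points enter `support_vectors_`, so w depends on them alone, exactly as stated above.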
Reference: Si Shoukui and Sun Xijing, Python数学实验与建模 (Python Mathematical Experiments and Modeling).
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".