Are Projected Points Still Linearly Separable? SVM Insight & Proof
This article examines whether points from two linearly separable classes remain linearly separable after being projected onto the SVM decision hyperplane, providing geometric and convex-optimization proofs along with references for deeper study.
Introduction
Support Vector Machine (SVM) is a powerful supervised learning method that appears in virtually every classic machine‑learning textbook and is a frequent topic in technical interviews. The first interview question about SVM typically tests understanding of the model’s derivation.
Problem
Given two linearly separable classes of points in space, after projecting each point onto the SVM decision hyperplane, are the projected points still linearly separable? Provide a proof of your view.
Answer and Analysis
Linear separability means that a single hyperplane can completely separate the two classes. Picture the SVM decision hyperplane lying between the two point clouds; projecting each point orthogonally onto this hyperplane places every point on the hyperplane itself.
At first glance one might expect the projected points to remain linearly separable, but a simple counter-example with only two points, one from each class, rules this out. In that case the SVM hyperplane is the perpendicular bisector of the segment joining the two points; after projection both points collapse onto the same location on the hyperplane, making separation impossible.
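This collapse is easy to check numerically. The sketch below (plain Python, no libraries) hard-codes the fact that, for a single pair of points, the maximum-margin hyperplane is the perpendicular bisector of the segment joining them; the helper name project is illustrative, not from any library.

```python
def project(x, w, b):
    """Orthogonal projection of x onto the hyperplane {p : w . p + b = 0}."""
    t = (sum(wi * xi for wi, xi in zip(w, x)) + b) / sum(wi * wi for wi in w)
    return tuple(xi - t * wi for xi, wi in zip(x, w))

# One point per class; for a single pair, the SVM hyperplane is the
# perpendicular bisector: w is parallel to (x_pos - x_neg) and the
# hyperplane passes through the midpoint of the segment.
x_neg, x_pos = (-1.0, 0.0), (1.0, 0.0)
w = tuple(p - n for p, n in zip(x_pos, x_neg))          # (2.0, 0.0)
mid = tuple((p + n) / 2 for p, n in zip(x_pos, x_neg))  # (0.0, 0.0)
b = -sum(wi * mi for wi, mi in zip(w, mid))

print(project(x_neg, w, b))  # (0.0, 0.0)
print(project(x_pos, w, b))  # (0.0, 0.0) -- both collapse to the midpoint
```

Both projections land on the midpoint of the segment, so no hyperplane can tell them apart.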
In fact, for any linearly separable dataset, the projections onto the optimal SVM hyperplane are never linearly separable. A proof by contradiction first considers the case where every training point is a support vector. If the projected points were still separable, there would exist a hyperplane, tilted slightly relative to the SVM hyperplane, that separates the original points with a larger margin, contradicting the optimality of the SVM solution.
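The general claim can also be checked empirically: fit a linear SVM, project every point onto its hyperplane, and observe that the two projected classes overlap. A minimal sketch, assuming scikit-learn is available; the two-blob dataset and the constants are illustrative, not from the article.

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Two well-separated Gaussian blobs (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2.0, 0.0], 0.6, (25, 2)),
               rng.normal([+2.0, 0.0], 0.6, (25, 2))])
y = np.array([-1] * 25 + [1] * 25)

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates hard margin
w, b = clf.coef_[0], clf.intercept_[0]

# Orthogonal projection of every point onto the hyperplane w . x + b = 0.
P = X - ((X @ w + b) / (w @ w))[:, None] * w

# In 2D the projected points lie on a line inside the hyperplane;
# parametrize them by the coordinate along a direction d orthogonal to w.
d = np.array([-w[1], w[0]])
t = P @ d

# The two classes' 1D intervals overlap, so no threshold separates them.
neg, pos = t[y == -1], t[y == 1]
print(max(neg.min(), pos.min()) <= min(neg.max(), pos.max()) + 1e-9)
```

The printed value is True: both projected classes contain the projection of the midpoint of the shortest segment between the class hulls, so their intervals must intersect.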
The argument is completed using the KKT conditions of the SVM dual problem. By complementary slackness, any data point with a non-zero Lagrange multiplier must be a support vector, and every non-support vector has multiplier zero. Consequently, the optimal hyperplane is determined solely by the support vectors, so the contradiction derived for the support-vector-only case carries over to the full dataset; the same fact also explains SVM's computational efficiency at prediction time.
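That only the support vectors matter can be demonstrated directly: refitting the SVM on its support vectors alone reproduces the same hyperplane. A hedged sketch assuming scikit-learn; the data and constants are illustrative.

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (20, 2)),
               rng.normal([+2.0, 0.0], 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates hard margin

# Keep only the support vectors (the points with non-zero multipliers)
# and refit: the separating hyperplane does not change.
sv = clf.support_
clf_sv = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])

print(np.allclose(clf.coef_, clf_sv.coef_, atol=1e-3),
      np.allclose(clf.intercept_, clf_sv.intercept_, atol=1e-3))
```

Dropping the non-support vectors removes only points with zero Lagrange multipliers, so the dual solution, and hence the hyperplane, is unchanged up to numerical tolerance.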
Alternatively, the separating hyperplane theorem from convex optimization offers a concise argument. The SVM hyperplane is the perpendicular bisector of the shortest segment connecting the convex hulls of the two classes. Projecting each convex hull onto this hyperplane maps both endpoints of that shortest segment to the same point, so the projected hulls intersect and, by the separating hyperplane theorem, cannot be separated by any hyperplane.
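The collapse of the shortest segment is a one-line calculation. Let u and v be the closest points of the two convex hulls, take w = v - u, and let the hyperplane pass through the midpoint m = (u + v)/2, so that b = -w^T m. Then:

```latex
P(x) = x - \frac{w^\top x + b}{\lVert w \rVert^2}\, w,
\qquad
w^\top u + b = (v-u)^\top\Bigl(u - \tfrac{u+v}{2}\Bigr) = -\tfrac{\lVert v-u \rVert^2}{2},
\qquad
w^\top v + b = \tfrac{\lVert v-u \rVert^2}{2},
```

```latex
P(u) = u + \tfrac{1}{2}(v-u) = \tfrac{u+v}{2} = m,
\qquad
P(v) = v - \tfrac{1}{2}(v-u) = \tfrac{u+v}{2} = m,
```

so both endpoints project to the midpoint m, which therefore belongs to both projected hulls.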
For interviewees, presenting the geometric contradiction is sufficient; mentioning the convex‑optimization perspective demonstrates deeper understanding.
Further Reading
SVM derivation notes: http://cs229.stanford.edu/notes/cs229-notes3.pdf
Dual problem and KKT conditions: http://stanford.edu/class/ee364a/lectures/duality.pdf
Separating hyperplane theorem: http://www.princeton.edu/~amirali/Public/Teaching/ORF523/S16/ORF523_S16_Lec5_gh.pdf
Hulu Beijing