Essential Machine Learning Visuals: Test Error, Overfitting, and More

This article presents a curated collection of insightful machine‑learning diagrams that illustrate key concepts such as test versus training error, under‑ and over‑fitting, Occam’s razor, feature interactions, irrelevant features, basis functions, discriminative versus generative models, loss functions, least‑squares geometry, and sparsity.

MaGe Linux Operations

Apr 17, 2017

When explaining basic machine‑learning concepts, I often return to a few illustrative diagrams. Below is a list of the most insightful ones.

Test and training error

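Why low training error is not always desirable: the figure plots test and training error as functions of model complexity. Training error keeps falling as the model grows more complex, while test error falls at first and then rises again once the model starts to overfit.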
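The two curves can be reproduced on a toy problem. The sketch below is only an illustration under assumed settings (a noisy sin(2πx) target, 10 training points, polynomial degree as the measure of complexity); none of these choices come from the figure itself.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(2*pi*x) on [0, 1] (an assumed toy target)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(10)
x_test, y_test = make_data(200)

def rms_error(poly, x, y):
    """Root-mean-square error of a fitted polynomial on (x, y)."""
    return float(np.sqrt(np.mean((poly(x) - y) ** 2)))

train_err, test_err = {}, {}
for degree in (0, 1, 3, 9):
    # Least-squares fit; Polynomial.fit rescales x internally for stability.
    poly = Polynomial.fit(x_train, y_train, degree)
    train_err[degree] = rms_error(poly, x_train, y_train)
    test_err[degree] = rms_error(poly, x_test, y_test)
    print(f"degree={degree}: train={train_err[degree]:.3f}  test={test_err[degree]:.3f}")
```

Because each polynomial family contains the lower-degree ones, training error can only go down as the degree grows. Test error has no such guarantee and typically climbs again at the highest degree, where the polynomial passes through every noisy training point.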

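Under‑ and over‑fitting

Examples of under‑fitting and over‑fitting. Polynomials of various degrees M, shown in red, are fitted to a data set generated from the green curve. A low degree is too rigid to follow the green curve, while a high degree chases the noise in the individual points.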

Occam’s razor

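The diagram explains why Bayesian inference embodies Occam’s razor. A complex model can account for a wider range of possible data sets, so it has to spread its probability mass more thinly across the data‑set space. When the observed data fall in the region that a simpler model also predicts, the simpler model assigns them higher probability and, given equal priors, is the more probable hypothesis.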
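A coin-flip comparison makes the effect concrete. This is a minimal sketch with two hypothetical models that are not part of the diagram: H1 says the coin is fair (no free parameter), H2 allows any bias and puts a uniform prior on it. The evidence P(D|H) is computed for one specific sequence of 10 flips.

```python
from math import factorial

def evidence_simple(n_flips):
    """P(D | H1): a fair coin assigns 0.5**n to every sequence of n flips."""
    return 0.5 ** n_flips

def evidence_flexible(n_heads, n_flips):
    """P(D | H2): unknown bias theta with a uniform prior, integrated out.

    The integral of theta**k * (1 - theta)**(n - k) over [0, 1]
    equals k! (n - k)! / (n + 1)!.
    """
    n_tails = n_flips - n_heads
    return factorial(n_heads) * factorial(n_tails) / factorial(n_flips + 1)

for heads in (5, 10):
    e1 = evidence_simple(10)
    e2 = evidence_flexible(heads, 10)
    print(f"{heads}/10 heads: P(D|H1)={e1:.6f}  P(D|H2)={e2:.6f}")
```

For a balanced sequence (5 heads) the simple model has the higher evidence, roughly 0.00098 against 0.00036, because the flexible model has spent probability on lopsided sequences. For 10 heads out of 10 the ranking flips, and the flexible model wins with an evidence of 1/11.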

Feature combinations

Why features that look irrelevant individually can become relevant in combination, and why linear methods may fail. (From Isabelle Guyon’s feature‑extraction slides.)
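The classic XOR layout shows both effects at once. The four points below are an assumed toy data set, not taken from the slides: the label is 1 when the two features share a sign, so neither feature alone says anything about the class, and no straight line in the (x1, x2) plane separates the classes.

```python
# XOR-style toy data (an assumed example): label 1 when both features
# have the same sign, label 0 otherwise.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [1, 0, 0, 1]

def accuracy(predictions):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

def best_single_feature_accuracy(index):
    """Best accuracy of a rule that thresholds one feature at 0."""
    pred = [1 if p[index] > 0 else 0 for p in points]
    flipped = [1 - v for v in pred]
    return max(accuracy(pred), accuracy(flipped))

acc_x1 = best_single_feature_accuracy(0)
acc_x2 = best_single_feature_accuracy(1)
# The product x1 * x2 is a combined feature; its sign gives the label.
acc_product = accuracy([1 if a * b > 0 else 0 for a, b in points])

print(acc_x1, acc_x2, acc_product)
```

Each feature on its own is at chance level (0.5), while the product feature classifies every point correctly. A feature-selection step that scores features one at a time would discard both.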

Irrelevant features

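How irrelevant features hurt k‑NN, clustering, and other similarity‑based methods. In the left‑hand plot the two classes are well separated along the vertical axis. The right‑hand plot adds an irrelevant horizontal axis, which destroys the grouping and makes many points nearest neighbours of the opposite class.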
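A small leave-one-out experiment shows the mechanism. The six points below are hypothetical and deliberately constructed as an extreme case: one coordinate separates the classes perfectly, and the other has a much larger spread that is unrelated to the label.

```python
def loo_1nn_accuracy(points, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Squared Euclidean distance is enough to rank neighbours.
        nearest = min(
            others,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)

# Hypothetical data: `relevant` alone separates the classes;
# `irrelevant` carries no class information but has a larger scale.
irrelevant = [0, 10, 20, 5, 15, 25]
relevant = [0, 0, 0, 1, 1, 1]
labels = [0, 0, 0, 1, 1, 1]

acc_relevant_only = loo_1nn_accuracy([(r,) for r in relevant], labels)
acc_with_irrelevant = loo_1nn_accuracy(list(zip(irrelevant, relevant)), labels)

print(acc_relevant_only, acc_with_irrelevant)
```

With the relevant coordinate alone every nearest neighbour has the right class. Once the irrelevant coordinate is included it dominates the distance, and in this constructed case every point's nearest neighbour belongs to the other class. Real data are rarely this adversarial, but the direction of the effect is the same.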

Basis functions

Non‑linear basis functions turn a low‑dimensional classification problem that has no linear boundary into a higher‑dimensional one that does. For example, mapping a one‑dimensional input x to z = (x, x²) can make classes that no single threshold on x separates linearly separable.
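The mapping can be checked on a handful of points. The data below are an assumed example in which one class sits on both sides of the other along x, so no single threshold on x works; in the mapped space a threshold on the x² coordinate (a horizontal line) does.

```python
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, 0, 0, 0, 1, 1]  # class 1 lies on both sides of class 0

def accuracy(predictions):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

def best_threshold_accuracy(values):
    """Best accuracy of any single-threshold rule on a 1-D feature."""
    best = 0.0
    for cut in sorted(set(values)):
        pred = [1 if v > cut else 0 for v in values]
        flipped = [1 - p for p in pred]
        best = max(best, accuracy(pred), accuracy(flipped))
    return best

acc_raw = best_threshold_accuracy(xs)

# Second coordinate of the basis expansion z = (x, x**2).
squared = [x * x for x in xs]
acc_squared = best_threshold_accuracy(squared)

print(f"raw x: {acc_raw:.3f}   x squared: {acc_squared:.3f}")
```

On the raw input the best threshold gets 5 of the 7 points right. After the expansion, any cut between 1 and 4 on the x² coordinate separates the classes completely.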

Discriminative vs. Generative

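Why discriminative learning is often simpler: the left plot shows the class‑conditional densities for two classes over a single input x, and the right plot shows the corresponding posterior probabilities. The left‑hand mode of p(x|C₁) has no effect on the posteriors, so effort spent modelling it is wasted for classification. The vertical green line in the right plot marks the decision boundary that gives the minimum misclassification rate.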
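Bayes' rule shows why detail in one class-conditional density can be invisible in the posterior. The class priors p(C₁) and p(C₂) are not specified in the text above and are left symbolic here; the second form holds wherever p(x|C₁)p(C₁) is non-zero.

```latex
p(C_1 \mid x)
  = \frac{p(x \mid C_1)\,p(C_1)}{p(x \mid C_1)\,p(C_1) + p(x \mid C_2)\,p(C_2)}
  = \frac{1}{1 + \dfrac{p(x \mid C_2)\,p(C_2)}{p(x \mid C_1)\,p(C_1)}}
```

Wherever p(x|C₂) is negligible compared with p(x|C₁), the ratio in the denominator is close to zero and the posterior is close to 1, whatever the detailed shape of p(x|C₁) in that region. A discriminative model only has to capture the posterior, so it never needs to represent that structure.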

Loss functions

Learning algorithms can be viewed as optimizing different loss functions. The figure shows the hinge loss used in SVMs (blue), the logistic loss (red) rescaled by 1/ln 2 so that it passes through the point (0, 1), the misclassification loss (black), and the squared error (green).
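The four curves are easy to write down as functions of the margin z = y·f(x). The sketch below assumes labels y in {−1, +1}, which is what lets the squared error (y − f(x))² be rewritten as (1 − z)²; the function names are chosen here for illustration.

```python
import math

def hinge(z):
    """Hinge loss used by SVMs: zero once the margin reaches 1."""
    return max(0.0, 1.0 - z)

def logistic_scaled(z):
    """ln(1 + exp(-z)) / ln 2, so that the curve passes through (0, 1)."""
    return math.log(1.0 + math.exp(-z)) / math.log(2.0)

def misclassification(z):
    """0-1 loss: 1 for a wrong-side margin, 0 otherwise."""
    return 1.0 if z < 0 else 0.0

def squared(z):
    """(y - f(x))**2 rewritten as (1 - z)**2, valid for y in {-1, +1}."""
    return (1.0 - z) ** 2

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(
        f"z={z:+.1f}  hinge={hinge(z):.3f}  logistic={logistic_scaled(z):.3f}  "
        f"0-1={misclassification(z):.0f}  squared={squared(z):.3f}"
    )
```

Hinge, rescaled logistic, and squared loss all equal 1 at z = 0. They differ away from it: hinge is exactly zero for z ≥ 1, logistic decays smoothly towards zero, and squared error grows again for z > 1, so it penalizes predictions that are confidently correct.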

Geometry of least squares

The diagram shows the N‑dimensional geometry of a least‑squares regression with two predictors. The response vector y is orthogonally projected onto the plane spanned by the input vectors x₁ and x₂; the projection ŷ is the vector of least‑squares predictions.
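The orthogonality can be verified numerically. The sketch below uses made-up numbers (N = 4 observations, no intercept, so the fitted values lie exactly in the plane spanned by x1 and x2) and checks that the residual y − ŷ is perpendicular to both predictors.

```python
import numpy as np

# Hypothetical data: N = 4 observations, two predictors x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 1.0, 0.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

X = np.column_stack([x1, x2])

# Least-squares coefficients; y_hat is the projection of y onto span{x1, x2}.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
residual = y - y_hat

print("beta     =", beta)
print("x1 . res =", float(x1 @ residual))
print("x2 . res =", float(x2 @ residual))
```

Both dot products come out as zero up to floating-point rounding. Since ŷ and the residual are orthogonal, the squared lengths also satisfy ‖y‖² = ‖ŷ‖² + ‖y − ŷ‖².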

Sparsity

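Why the Lasso (L₁ regularization) yields sparse solutions with many zero coefficients, while ridge regression does not. The left plot shows the Lasso estimate and the right plot the ridge estimate; the blue region is the constraint |β₁|+|β₂|≤t (Lasso) or β₁²+β₂²≤t² (ridge), and the red ellipses are contours of the least‑squares error. The Lasso region is a diamond with corners on the axes, so the contours often first touch it at a corner, where one coefficient is exactly zero; the ridge disc has no corners, so its solution rarely lands on an axis.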
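The same contrast appears in closed form in a simplified setting. The sketch below assumes orthonormal predictors, so each coefficient can be solved for separately, and uses the penalized form of each problem rather than the constraint form drawn in the figure. The per-coordinate objectives are stated in the docstrings; the starting coefficients are hypothetical.

```python
def lasso_coordinate(b, lam):
    """Minimiser of 0.5*(b - beta)**2 + lam*abs(beta): soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # anything within [-lam, lam] is set exactly to zero

def ridge_coordinate(b, lam):
    """Minimiser of 0.5*(b - beta)**2 + 0.5*lam*beta**2: uniform shrinkage."""
    return b / (1.0 + lam)

ols = [2.0, 0.3, -0.1, -1.2]  # hypothetical least-squares coefficients
lam = 0.5

lasso = [lasso_coordinate(b, lam) for b in ols]
ridge = [ridge_coordinate(b, lam) for b in ols]

print("lasso:", lasso)
print("ridge:", [round(r, 3) for r in ridge])
```

The Lasso update zeroes the two small coefficients and shifts the large ones towards zero by lam. The ridge update scales every coefficient by the same factor, so small coefficients get smaller but none of them becomes exactly zero.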
