Dimensionality Reduction Algorithms: Why Too Many Features Hurt Machine Learning

The article explains how high‑dimensional data causes the curse of dimensionality, reduces model performance, and surveys feature‑selection, matrix‑decomposition, manifold‑learning, and auto‑encoder techniques while advising systematic experiments and proper data scaling.


This first article in the series introduces several dimensionality‑reduction algorithms and explains why an excess of input variables degrades machine‑learning performance.

When dealing with high dimensional data, it is often useful to reduce the dimensionality by projecting the data to a lower dimensional subspace which captures the “essence” of the data. This is called dimensionality reduction.

When data are represented as rows and columns, each column is a feature. A space with many dimensions has a huge volume, so each data point (row) represents a sparse, often meaningless sample. This phenomenon, known as the “curse of dimensionality,” leads to models with many free parameters that may over‑fit and fail to generalize.

Reducing the number of features therefore reduces the dimensionality of the feature space, simplifying the model and improving generalization.

The fundamental reason for the curse of dimensionality is that high‑dimensional functions have the potential to be much more complicated than low‑dimensional ones, and that those complications are harder to discern. The only way to beat the curse is to incorporate knowledge about the data that is correct.
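The sparsity problem described above can be demonstrated directly. The sketch below (an illustrative example, not from the article) shows "distance concentration": as dimensionality grows, the gap between a point's nearest and farthest neighbors shrinks, so distance-based reasoning becomes less informative.

```python
import numpy as np

# Illustrative sketch: as dimensionality grows, pairwise distances between
# random points concentrate, so "nearest" and "farthest" neighbors become
# nearly indistinguishable.
rng = np.random.default_rng(0)

def distance_contrast(n_dims, n_points=200):
    """Return (max - min) / min over distances from one query point."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

low = distance_contrast(2)      # large contrast: neighbors are meaningful
high = distance_contrast(1000)  # small contrast: everything is "far"
print(f"contrast in 2-D: {low:.2f}, in 1000-D: {high:.2f}")
```

The contrast ratio collapses in high dimensions, which is one concrete face of the curse of dimensionality.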

Dimensionality reduction is a data‑pre‑processing step performed after cleaning and scaling but before training a predictive model.

Two major categories of feature‑selection methods are described:

Wrapper methods – wrap a learning model, evaluate different subsets of features, and select the subset that yields the best performance (e.g., Recursive Feature Elimination).

Filter methods – use statistical scores such as Pearson correlation or chi‑square tests to pick the most predictive features.

… perform feature selection, to remove “irrelevant” features that do not help much with the classification problem.
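A filter method can be sketched in a few lines. The example below (hypothetical data and names, not the article's code) scores each feature by its absolute Pearson correlation with the target and keeps the top two; a wrapper method such as RFE would instead retrain a model on candidate subsets.

```python
import numpy as np

# Minimal sketch of a filter method: score features by absolute Pearson
# correlation with the target, then keep the top-k. Data is synthetic.
rng = np.random.default_rng(42)

n_samples = 500
X = rng.normal(size=(n_samples, 5))
# The target depends only on features 0 and 2; the rest are irrelevant noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=n_samples)

scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
top_k = np.argsort(scores)[::-1][:2]                 # indices of the 2 best features
print("selected features:", sorted(top_k.tolist()))  # the two informative ones
```

Because filter methods never train a model, they are cheap, but they can miss features that are only useful in combination.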

Matrix‑decomposition techniques from linear algebra can also reduce dimensionality. By decomposing the data matrix (e.g., eigen‑decomposition or Singular Value Decomposition), one can rank components and keep a subset that best captures the data structure. The most common approach is Principal Component Analysis (PCA).

The most common approach to dimensionality reduction is called principal components analysis or PCA.
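A minimal PCA sketch via SVD (an illustrative example with synthetic data, not the article's code): center the data, decompose it, and project onto the leading components.

```python
import numpy as np

# PCA via SVD: 200 points in 3-D that actually lie near a 2-D plane,
# so two principal components capture almost all of the variance.
rng = np.random.default_rng(0)

latent = rng.normal(size=(200, 2))                    # true 2-D structure
mixing = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.3]])
X = latent @ mixing.T + rng.normal(scale=0.01, size=(200, 3))

X_centered = X - X.mean(axis=0)                       # PCA requires centering
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

explained = S**2 / np.sum(S**2)                       # variance ratio per component
X_reduced = X_centered @ Vt[:2].T                     # project onto top 2 components

print("explained variance ratios:", np.round(explained, 3))
print("reduced shape:", X_reduced.shape)
```

Ranking components by explained variance is exactly the "rank and keep a subset" step described above.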

Manifold‑learning methods create low‑dimensional projections that preserve the intrinsic structure of high‑dimensional data, often for visualization. Examples include Kohonen Self‑Organizing Maps (SOM), Sammon mapping, Multidimensional Scaling (MDS), and t‑Distributed Stochastic Neighbor Embedding (t‑SNE).

In mathematics, a projection is a kind of function or mapping that transforms data in some way.
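One of the projections mentioned above, classical Multidimensional Scaling, can be sketched with plain linear algebra (an illustrative example, not the article's code): starting from pairwise distances alone, it recovers low-dimensional coordinates that preserve them.

```python
import numpy as np

# Classical MDS: recover 2-D coordinates from a pairwise distance matrix.
rng = np.random.default_rng(1)

points = rng.normal(size=(50, 2))                               # ground-truth layout
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)  # distance matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ (D**2) @ J                    # double-centered Gram matrix

eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
coords = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])

# Recovery is exact up to rotation/reflection, so distances should match.
D_recovered = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
print("max distance error:", np.abs(D - D_recovered).max())
```

Methods like t-SNE replace this global, linear objective with a nonlinear one that emphasizes preserving local neighborhoods instead.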

Auto‑encoders, a type of unsupervised deep neural network, learn to compress data into a bottleneck layer (the encoder) and then reconstruct the input (the decoder). After training, the decoder is discarded and the bottleneck output serves as a reduced‑dimensional representation that can be fed to any downstream model.

An auto‑encoder is a kind of unsupervised neural network that is used for dimensionality reduction and feature discovery. More precisely, an auto‑encoder is a feedforward neural network that is trained to predict the input itself.
Deep autoencoders are an effective framework for nonlinear dimensionality reduction. Once such a network has been built, the top‑most layer of the encoder, the code layer h_c, can be input to a supervised classification procedure.

The encoder’s output is a projection that, like other projections, has little direct correspondence to the original variables, resulting in low interpretability.
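The encoder/bottleneck/decoder structure can be sketched in NumPy (a toy, linear illustration; real auto-encoders use a deep-learning framework and nonlinear layers). The bottleneck compresses 8-D inputs to a 2-D code; after training, the encoder alone produces the reduced representation.

```python
import numpy as np

# A minimal single-bottleneck autoencoder trained by gradient descent.
rng = np.random.default_rng(0)

X = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 8)) * 0.5  # toy data
X = X - X.mean(axis=0)

d_in, d_code = 8, 2
W_enc = rng.normal(scale=0.1, size=(d_in, d_code))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(d_code, d_in))   # decoder weights
lr = 0.01

def loss(X):
    """Mean squared reconstruction error of the autoencoder."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

initial_loss = loss(X)
for _ in range(500):                       # plain full-batch gradient descent
    code = X @ W_enc                       # encoder forward pass (the bottleneck)
    recon = code @ W_dec                   # decoder forward pass
    err = recon - X                        # reconstruction error
    W_dec -= lr * 2 * code.T @ err / len(X)
    W_enc -= lr * 2 * X.T @ (err @ W_dec.T) / len(X)
final_loss = loss(X)

# Discard the decoder; the code layer is the reduced representation.
embedding = X @ W_enc
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}, embedding shape {embedding.shape}")
```

With linear layers this converges toward the same subspace PCA finds; the power of auto-encoders comes from adding nonlinear activations and depth.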

There is no universally best dimensionality‑reduction technique; the optimal choice depends on the specific model and dataset. Systematic, controlled experiments are recommended to discover which technique yields the best performance for a given problem. Because many linear‑algebra and manifold methods assume features share the same scale or distribution, normalizing or standardizing data before applying these techniques is good practice.
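The recommended pre-processing step is easy to sketch (illustrative data, not from the article): standardize each feature to zero mean and unit variance so that no single large-scale feature dominates a decomposition.

```python
import numpy as np

# Standardization before dimensionality reduction: two features on wildly
# different scales (e.g., meters vs. millimeters) are brought to a common one.
rng = np.random.default_rng(7)

X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance per column

print("means ~0:", np.round(X_std.mean(axis=0), 6))
print("stds  ~1:", np.round(X_std.std(axis=0), 6))
```

Without this step, PCA on the raw matrix would devote its first component almost entirely to the large-scale feature, regardless of its predictive value.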

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: PCA, feature selection, dimensionality reduction, autoencoders, manifold learning
Written by

Code DAO

We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
