
Mastering Numeric Feature Scaling: 4 Techniques with Scikit‑Learn

This article explains why numeric feature engineering is essential for machine learning, outlines the challenges of differing scales and outliers, and demonstrates four preprocessing methods—Standardization, Robust Scaler, Power Transformer, and Normalization—using the California housing dataset with detailed code examples and visual analysis.

Data Party THU

Numeric feature engineering is a crucial preprocessing step in machine learning, addressing two core problems: disparate feature magnitudes and outliers. Using the California housing dataset, we illustrate four common scaling techniques and discuss when each should be applied.

Standardization (StandardScaler)

Standardization transforms each feature to zero mean and unit variance via z = (x − μ) / σ, making features comparable for algorithms that assume roughly normal inputs (e.g., linear regression, SVM, PCA). Because both μ and σ are computed from all values, it is highly sensitive to outliers.

from sklearn.preprocessing import StandardScaler

standard_scaler = StandardScaler()
standardized_x = standard_scaler.fit_transform(X)

After scaling, MedInc lies in roughly [-2, 4] and Population in [-1, 4], but extreme values still stretch the tails, because the mean and standard deviation used for scaling are themselves inflated by those outliers.

Standardization result
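The zero-mean, unit-variance property above can be checked numerically. A minimal sketch, using a synthetic stand-in for two housing-style features (so it runs without downloading the California housing dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two features with very different scales
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.lognormal(1.5, 0.5, 500),   # income-like feature, roughly [1, 15]
    rng.lognormal(7.0, 0.8, 500),   # population-like feature, roughly [200, 10000]
])

scaled = StandardScaler().fit_transform(X)

# Each column now has (near-)zero mean and unit variance
print(scaled.mean(axis=0).round(6))
print(scaled.std(axis=0).round(6))
```

The two columns become directly comparable, even though their raw magnitudes differed by orders of magnitude.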

Robust Scaler

RobustScaler replaces mean and standard deviation with median and inter‑quartile range (IQR), reducing the influence of extreme outliers while keeping them in the data.

from sklearn.preprocessing import RobustScaler

robust_scaler = RobustScaler(quantile_range=(25.0, 75.0), with_centering=True, with_scaling=True, unit_variance=True)
robust_x = robust_scaler.fit_transform(X)

The main data for both features now falls into a tighter interval (e.g., MedInc ≈ [-2, 5], Population ≈ [-2, 6]).

Robust Scaler result
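The difference between the two scalers is easiest to see on a tiny made-up sample with one extreme outlier (a sketch, not the housing data):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One feature whose last value is an extreme outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

robust = RobustScaler().fit_transform(x)       # centers on median 3, scales by IQR 2
standard = StandardScaler().fit_transform(x)   # mean and std are dominated by 1000

# The bulk of the data keeps a usable spread under RobustScaler...
print(robust[:4].ravel())    # [-1.  -0.5  0.   0.5]
# ...while StandardScaler squashes it into a narrow band near -0.5
print(standard[:4].ravel())
```

Because the median and IQR ignore the outlier's magnitude, the first four points stay well separated; under StandardScaler they collapse together.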

Power Transformer

PowerTransformer (e.g., Yeo‑Johnson) compresses long tails, turning a right‑skewed distribution into a near‑normal shape while preserving the information of extreme values.

from sklearn.preprocessing import PowerTransformer

pt = PowerTransformer(method='yeo-johnson')
pt_transformed = pt.fit_transform(X[:, [1]])  # transform a single skewed column

Histograms before and after transformation show a clear shift from a skewed to a bell‑shaped distribution.

Power Transformer histogram
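The same shift can be quantified with a skewness statistic instead of a histogram. A sketch using a synthetic right-skewed (lognormal) feature in place of the real column:

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# A right-skewed, population-like feature
rng = np.random.default_rng(42)
x = rng.lognormal(7.0, 0.9, size=(2000, 1))

pt = PowerTransformer(method='yeo-johnson')
x_t = pt.fit_transform(x)

print(round(skew(x.ravel()), 2))    # strongly right-skewed (well above 1)
print(round(skew(x_t.ravel()), 2))  # close to 0 after the transform
```

Skewness near zero after the transform is exactly the "bell-shaped" result the histograms illustrate.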

Normalization (Min‑Max Scaler)

Normalization rescales features to the [0, 1] interval, which is essential for distance‑based algorithms (e.g., KNN) and helps neural networks avoid saturated activations.

from sklearn.preprocessing import MinMaxScaler

min_max_scaler = MinMaxScaler()
normalized_x = min_max_scaler.fit_transform(X)

While Population's maximum maps to 1.0, the majority of values are compressed into a narrow range (roughly 0 to 0.16), illustrating the method's sensitivity to extreme outliers.

Min‑Max scaling result
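This compression effect is easy to reproduce on a toy sample (a sketch with made-up numbers, not the housing data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Most values sit at or below 20, with one extreme outlier at 1000
x = np.array([[1.0], [5.0], [10.0], [20.0], [1000.0]])

scaled = MinMaxScaler().fit_transform(x)  # maps min -> 0, max -> 1
print(scaled.ravel())
# The outlier maps to 1.0, but the other four values are crushed into [0, 0.02]
```

A single outlier defines the denominator (max − min), so the useful resolution of the feature is spent on a region almost no data occupies.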

When to Use Each Scaler

.----------------------.---------------------------.------------------------------------------------------.
| Issue                | Best Tool                 | Why?                                                 |
:----------------------+---------------------------+------------------------------------------------------:
| Different scales     | StandardScaler            | Makes features comparable.                           |
| Heavy skew           | Power/QuantileTransformer | Normalizes the distribution shape.                   |
| Extreme outliers     | RobustScaler              | Uses median and IQR, largely unaffected by outliers. |
| Neural network input | Min-Max Scaler            | Matches the expected input range of activations.     |
'----------------------'---------------------------'------------------------------------------------------'

Remember to call .fit() (or .fit_transform()) only on the training data to avoid data leakage; then apply .transform() to the training, validation, and test sets using the statistics learned from training.
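A minimal leakage-safe sketch of that split, using synthetic data for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix and labels standing in for a real dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Fitting on the full dataset would let test-set statistics leak into preprocessing; a scikit-learn Pipeline enforces this split automatically inside cross-validation.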

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: feature scaling, normalization, scikit-learn, numeric preprocessing, power transformer, robust scaler
Written by Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.