Mastering Numeric Feature Scaling: 4 Techniques with Scikit‑Learn
This article explains why numeric feature engineering is essential for machine learning, outlines the challenges of differing scales and outliers, and demonstrates four preprocessing methods—Standardization, Robust Scaler, Power Transformer, and Normalization—using the California housing dataset with detailed code examples and visual analysis.
Numeric feature engineering is a crucial preprocessing step in machine learning, addressing two core problems: disparate feature magnitudes and outliers. Using the California housing dataset, we illustrate four common scaling techniques and discuss when each should be applied.
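The snippets below assume X is the California housing feature matrix; as a minimal setup sketch, it can be loaded with scikit-learn's fetch_california_housing loader:

from sklearn.datasets import fetch_california_housing

# Load the feature matrix as a plain NumPy array
housing = fetch_california_housing()
X = housing.data

# Feature order: MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude
print(housing.feature_names)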
Standardization (StandardScaler)
Standardization transforms features to zero mean and unit variance, making them comparable for scale-sensitive algorithms such as linear regression, SVM, and PCA. Because it relies on the mean and standard deviation, it is highly sensitive to outliers.
from sklearn.preprocessing import StandardScaler

# Subtract the per-feature mean and divide by the per-feature standard deviation
standard_scaler = StandardScaler()
standardized_x = standard_scaler.fit_transform(X)

After scaling, MedInc lies in roughly [-2, 4] and Population in [-1, 4], but extreme values still dominate the result, since they pull on both the mean and the standard deviation.
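A quick sanity check (a sketch reusing standardized_x from above) confirms the zero-mean, unit-variance property:

import numpy as np

# Every column should now have mean ~0 and standard deviation ~1
print(np.round(standardized_x.mean(axis=0), 2))
print(np.round(standardized_x.std(axis=0), 2))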
Robust Scaler
RobustScaler replaces mean and standard deviation with median and inter‑quartile range (IQR), reducing the influence of extreme outliers while keeping them in the data.
from sklearn.preprocessing import RobustScaler

# Center on the median and scale by the inter-quartile range (IQR)
robust_scaler = RobustScaler(quantile_range=(25.0, 75.0), with_centering=True, with_scaling=True, unit_variance=True)
robust_x = robust_scaler.fit_transform(X)

The bulk of both features now falls into a tighter interval (e.g., MedInc ≈ [-2, 5], Population ≈ [-2, 6]).
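As a quick check on what changed (a sketch reusing robust_x from above), the per-column median of the transformed data should sit at roughly zero, since RobustScaler centers on the median rather than the mean:

import numpy as np

# Median-centering puts each column's median at ~0; the IQR sets the scale
print(np.round(np.median(robust_x, axis=0), 2))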
Power Transformer
PowerTransformer (e.g., Yeo‑Johnson) compresses long tails, turning a right‑skewed distribution into a near‑normal shape while preserving the information of extreme values.
from sklearn.preprocessing import PowerTransformer
pt = PowerTransformer(method='yeo-johnson')
pt_transformed = pt.fit_transform(X[:, [1]])

Histograms before and after the transformation show a clear shift from a skewed to a bell-shaped distribution.
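Beyond eyeballing histograms, the change in shape can be quantified with a skewness statistic; a sketch assuming SciPy is installed and reusing pt_transformed from above:

from scipy.stats import skew

# Skewness near 0 indicates a roughly symmetric, bell-shaped distribution
print(skew(X[:, 1]))               # before the transformation
print(skew(pt_transformed[:, 0]))  # after: noticeably closer to 0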
Normalization (Min‑Max Scaler)
Normalization rescales features to the [0, 1] interval, which is essential for distance‑based algorithms (e.g., KNN) and helps neural networks avoid saturated activations.
from sklearn.preprocessing import MinMaxScaler

# Map each feature's observed minimum to 0 and maximum to 1
min_max_scaler = MinMaxScaler()
normalized_x = min_max_scaler.fit_transform(X)

While Population's maximum maps to 1.0, the majority of values are compressed into a narrow band (roughly 0 to 0.16), illustrating the method's sensitivity to extreme outliers.
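To see the compression effect directly, a sketch reusing normalized_x from above (Population is column index 4 in the California housing feature order):

import numpy as np

# Every column now spans exactly [0, 1]
print(normalized_x.min(axis=0), normalized_x.max(axis=0))

# Fraction of Population values squeezed into the narrow [0, 0.16) band
population = normalized_x[:, 4]
print(np.mean(population < 0.16))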
When to Use Each Scaler
Issue                | Best Tool                 | Why?
---------------------+---------------------------+-----------------------------------------------------
Different Scales     | StandardScaler            | Makes features directly comparable.
Heavy Skew           | Power/QuantileTransformer | Normalizes the distribution shape.
Extreme Outliers     | RobustScaler              | Median and IQR are barely affected by outliers.
Neural Network Input | MinMaxScaler              | Matches the expected [0, 1] input range of neurons.

Remember to call .fit() only on the training data to avoid data leakage; then use .transform() on the training, validation, and test sets.
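A minimal leakage-free pattern looks like this (a sketch assuming a simple train/test split of X):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn statistics from training data only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics; never refit on test data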
