Four Numeric Scaling Techniques: When to Use Standard, Robust, Power, and Min‑Max
This article explains why numeric feature engineering is essential for machine‑learning models, outlines the two main challenges of differing magnitudes and outliers, and demonstrates four scaling methods—StandardScaler, RobustScaler, PowerTransformer, and MinMaxScaler—using the California housing dataset, complete with code, visualizations, and guidance on when each method is appropriate.
Numeric feature engineering is a mandatory preprocessing step for training machine‑learning models. Two core issues arise when handling numeric data: differences in feature magnitude and the presence of outliers. For example, age and salary can differ by several orders of magnitude, causing models to overweight larger‑scale features.
Skewed distributions also pose problems; a feature like number of siblings may have most values between 0 and 2 but occasional extreme values (8 or 10) that skew the distribution into a long tail. Such extreme samples often carry valuable information, so they cannot simply be discarded.
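To make the pull concrete, here is a minimal sketch (the sibling counts are made up for illustration, not from any dataset) showing how a single extreme value drags the mean while the median stays put:

import numpy as np

siblings = np.array([0, 1, 1, 2, 0, 1, 2, 1, 10])  # one extreme household
print(np.mean(siblings))    # 2.0 - pulled upward by the single value of 10
print(np.median(siblings))  # 1.0 - unaffected by the extreme value

This asymmetry between mean and median is exactly what the robust scaling technique below exploits.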
The article introduces four common scaling techniques to address these issues: Standardization, Robust scaling, Power transformation, and Normalization. The implementations are demonstrated with scikit‑learn’s California housing dataset, focusing on the "Median Income" and "Population" features, which have markedly different scales.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the dataset and keep the two features with very different scales
dataset = fetch_california_housing()
X_full, y_full = dataset.data, dataset.target
feature_names = dataset.feature_names

df = pd.DataFrame({
    "MedInc": X_full[:, 0],      # median income, roughly 0.5-15
    "Population": X_full[:, 4],  # block-group population, up to ~35,000
})
df.describe()

First, the raw data and a version with outliers removed (values outside the 0-99th percentile band) are visualized with scatter plots.
X = X_full[:, [0, 4]]  # MedInc and Population

# Keep rows that fall strictly inside the 0th-99th percentile band of both features
outlier_range = (0, 99)
cutoffs_median_inc = np.percentile(X[:, 0], outlier_range)
cutoffs_population = np.percentile(X[:, 1], outlier_range)
non_outliers = np.all(X > [cutoffs_median_inc[0], cutoffs_population[0]], axis=1) & np.all(
    X < [cutoffs_median_inc[1], cutoffs_population[1]], axis=1)
non_outlier_X = X[non_outliers]
non_outlier_y = y_full[non_outliers]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
fig.suptitle('Original Data')
ax1.set_title('Full Data')
ax1.scatter(X[:, 0], X[:, 1], c=y_full)
ax1.set_xlabel('MedInc')
ax1.set_ylabel('Population')
ax2.set_title('Non-outlier Data')
ax2.scatter(non_outlier_X[:, 0], non_outlier_X[:, 1], c=non_outlier_y)
ax2.set_xlabel('MedInc')
ax2.set_ylabel('Population')
plt.show()

Standardization (StandardScaler)
Standardization rescales features to zero mean and unit variance using the formula z = (x − mean) / std. This makes features comparable and aligns well with algorithms that assume normally distributed inputs (e.g., linear regression, logistic regression, SVM, PCA).
from sklearn.preprocessing import StandardScaler

standard_scaler = StandardScaler()
standardized_x = standard_scaler.fit_transform(X)

After scaling, "Population" (original range roughly 3-35,000) and "MedInc" (roughly 0.5-15) are mapped to approximately [−1, 30] and [−2, 6] respectively, making them comparable. However, StandardScaler is highly sensitive to outliers: extreme values inflate the mean and standard deviation, so the bulk of the data ends up compressed into a narrow interval (about −1 to 4) while the shape of the distribution remains unchanged.
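To see this sensitivity in isolation, here is a minimal sketch (synthetic values, not from the housing data) of how a single extreme value changes the statistics StandardScaler learns:

import numpy as np
from sklearn.preprocessing import StandardScaler

clean = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
with_outlier = np.vstack([clean, [[100.0]]])  # one extreme value appended

z_clean = StandardScaler().fit_transform(clean)
z_dirty = StandardScaler().fit_transform(with_outlier)

print(z_clean.ravel().round(2))      # spread across roughly [-1.41, 1.41]
print(z_dirty[:5].ravel().round(2))  # the same five points squeezed into ~[-0.50, -0.39]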
Robust Scaling (RobustScaler)
RobustScaler replaces the mean and standard deviation with the median and inter-quartile range (IQR): x' = (x − median) / IQR. Because these statistics are computed from the middle 50% of the data, the outer 25% on each side has no influence on the centering or scaling. Outliers remain in the data but no longer dominate it.
from sklearn.preprocessing import RobustScaler

# quantile_range=(25.0, 75.0) defines the IQR; unit_variance=True additionally
# rescales so that normally distributed data would end up with unit variance
robust_scaler = RobustScaler(quantile_range=(25.0, 75.0), with_scaling=True, with_centering=True, unit_variance=True)
robust_x = robust_scaler.fit_transform(X)

The main bodies of both features now lie in similar intervals (MedInc ≈ [−2, 5], Population ≈ [−2, 6]). Like StandardScaler, RobustScaler does not eliminate skewness; a non-linear transformation is needed for that.
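As a sanity check, the scaler's output can be reproduced by hand from the median and IQR; this sketch uses a tiny synthetic array and scikit-learn's default settings (unit_variance=False):

import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # one extreme value

# Manual robust scaling: the median and IQR barely notice the outlier
median = np.median(x)                # 3.0
q1, q3 = np.percentile(x, [25, 75])  # 2.0, 4.0
manual = (x - median) / (q3 - q1)

scaler = RobustScaler()  # defaults: quantile_range=(25.0, 75.0), unit_variance=False
assert np.allclose(manual, scaler.fit_transform(x))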
Power Transformation (PowerTransformer)
PowerTransformer (or QuantileTransformer) compresses long tails, pulling extreme values toward the data core and reshaping a skewed distribution into a near‑normal shape. This retains outlier information without letting extreme values dominate the model.
from sklearn.preprocessing import PowerTransformer

# Yeo-Johnson also handles zero and negative values (Box-Cox requires strictly positive data)
pt = PowerTransformer(method='yeo-johnson')
pt_transformed = pt.fit_transform(X[:, [1]])  # transform "Population" only

A box plot of "Population" before the transformation shows its long right tail.
After applying PowerTransformer, the histogram becomes bell‑shaped and the box‑plot shows a centered box with symmetric whiskers, indicating that outliers have been pulled into the main data region.
import seaborn as sns

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
sns.histplot(standardized_x[:, 1], ax=ax1)
ax1.set_title("Before: Standardized Population (Skewed)")
sns.histplot(pt_transformed[:, 0], ax=ax2)
ax2.set_title("After: PowerTransformed Population (Normal-like)")
plt.tight_layout()
plt.show()

Comparing StandardScaler and PowerTransformer on "Population" shows that the latter shrinks the distance of extreme values from the bulk of the data, leading to a more uniform error distribution for linear models.
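One way to quantify the reshaping is to compare skewness before and after; this is a sketch using scipy.stats.skew (not part of the original article's code), where values near 0 indicate a roughly symmetric distribution:

from scipy.stats import skew

# The raw "Population" feature is strongly right-skewed;
# after the power transform the skewness drops close to 0
print("before:", skew(X[:, 1]))               # large positive value
print("after: ", skew(pt_transformed[:, 0]))  # close to 0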
Normalization (Min‑Max Scaling)
Normalization rescales all values to the [0, 1] range using x' = (x − min) / (max − min). It is crucial for distance-based algorithms such as K-Nearest Neighbors and helps neural networks avoid activation saturation.
from sklearn.preprocessing import MinMaxScaler

min_max_scaler = MinMaxScaler()  # default feature_range=(0, 1)
normalized_x = min_max_scaler.fit_transform(X)

Min-Max scaling maps the maximum "Population" value to 1.0, compressing the majority of the data (populations of roughly 1,000-2,000) into a tiny slice near zero (about 0-0.16 covers nearly all blocks), which destroys resolution. It works best when feature bounds are known and fixed, e.g., RGB pixel values (0-255).
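As a sketch of the known-bounds case (hypothetical pixel values, not from the housing data): when the minimum and maximum are fixed in advance, Min-Max scaling reduces to a simple division:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

pixels = np.array([[0.0], [64.0], [128.0], [255.0]])
print(MinMaxScaler().fit_transform(pixels).ravel())  # [0.    0.251 0.502 1.   ]
# Same result here only because the sample happens to contain min=0 and max=255
print((pixels / 255.0).ravel())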
The following table summarizes the preferred scaler for each issue:
| Issue                | Best Tool                 | Why?                                                  |
|----------------------|---------------------------|-------------------------------------------------------|
| Different Scales     | StandardScaler            | Makes features comparable.                            |
| Heavy Skew           | Power/QuantileTransformer | Normalizes the distribution shape.                    |
| Extreme Outliers     | RobustScaler              | Uses median and IQR, unaffected by marginal outliers. |
| Neural Network Input | Min-Max Scaler            | Matches the expected input range of neurons.          |

When using these scalers, remember the strict rule: .fit() computes the statistics (mean, std, etc.) and must be called only on training data; .transform() applies the learned statistics to training, validation, test, or production data. Fitting on test data causes data leakage.
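A minimal sketch of that rule, using a synthetic array and scikit-learn's train_test_split:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_demo = np.random.default_rng(0).normal(size=(100, 2))
X_train, X_test = train_test_split(X_demo, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# Wrong: scaler.fit_transform(X_test) would leak test-set statistics into preprocessing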
By following these guidelines, practitioners can choose the appropriate scaling technique to handle magnitude differences, skewness, and outliers, thereby improving model performance and stability.
