11 Powerful Feature Selection Techniques Every Data Scientist Should Master

This guide walks through a comprehensive set of feature‑selection strategies—from removing unused or missing columns to handling multicollinearity, low‑variance features, and using PCA—complete with Python code examples and visualizations to help you build leaner, more interpretable machine‑learning models.

MaGe Linux Operations

Oct 1, 2022

Too many features increase model complexity and over‑fitting, while too few lead to under‑fitting. Feature selection aims to keep the model just complex enough to generalize well while remaining easy to train, maintain, and interpret.

Feature selection means retaining some features and discarding others. This article outlines several feature‑selection strategies:

Delete unused columns

Delete columns with missing values

Remove irrelevant features

Drop low‑variance features

Handle multicollinearity

Use feature coefficients (beta values)

Apply p‑values for statistical significance

Calculate Variance Inflation Factor (VIF)

Select features based on importance (tree‑based models)

Automatic selection with scikit‑learn

Principal Component Analysis (PCA)

1. Delete Unused Columns

The simplest strategy is intuition: if you know a column (e.g., ID, FirstName) will never be used, drop it. In the demo dataset no such columns exist, so none are removed.
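As a minimal sketch of this step, the snippet below drops identifier-like columns from a small made-up frame. The frame `df_demo` and its values are illustrative assumptions, not the article's demo dataset.

```python
import pandas as pd

# Toy frame (hypothetical values); "ID" and "FirstName" stand in for
# identifier-like columns that carry no predictive signal.
df_demo = pd.DataFrame({
    "ID": [1, 2, 3],
    "FirstName": ["Ann", "Bo", "Cy"],
    "horsepower": [111, 154, 102],
    "price": [13495, 16500, 13950],
})

unused = ["ID", "FirstName"]
# errors="ignore" skips names that are absent instead of raising KeyError.
df_demo = df_demo.drop(columns=unused, errors="ignore")
print(df_demo.columns.tolist())
```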

2. Delete Columns with Missing Values

Most ML algorithms cannot handle missing values directly. If a column has a large share of missing entries, dropping it entirely is often simpler than trying to impute it.

# total null values per column
df.isnull().sum()
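The counts above can be turned into a drop rule. The sketch below removes every column whose share of missing values exceeds a cutoff. The 50% cutoff and the toy frame `df_demo` are assumptions chosen for illustration. The right cutoff depends on the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one mostly-empty column.
df_demo = pd.DataFrame({
    "normalized-losses": [np.nan, np.nan, 164.0, np.nan, 158.0],
    "horsepower": [111.0, 154.0, np.nan, 115.0, 110.0],
    "price": [13495, 16500, 13950, 17450, 15250],
})

max_missing = 0.5  # assumed cutoff: drop columns with more than 50% missing

# isnull().mean() gives the fraction of missing entries per column.
missing_share = df_demo.isnull().mean()
cols_to_drop = missing_share[missing_share > max_missing].index.tolist()
df_demo = df_demo.drop(columns=cols_to_drop)
print(cols_to_drop)
```

Columns that stay but still contain a few gaps (like `horsepower` here) would need imputation or row-wise removal before modeling.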

3. Irrelevant Features

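A feature that has no relationship with the target adds little to the model. For numeric features, the correlation with the target can be visualized with a bar plot.

# correlation between target and numeric features
# numeric_only=True (pandas >= 1.5) skips non-numeric columns
df.corr(numeric_only=True).loc['price'].plot(kind='barh', figsize=(4,10))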

Features such as peak-rpm, compression-ratio, stroke, bore, and symboling show almost no correlation with price and can be removed. A correlation threshold (e.g., 0.2) can be applied programmatically:

# drop numeric features whose absolute correlation with price is below 0.2
corr = df.corr(numeric_only=True).loc['price'].abs()
corr = corr[corr < 0.2]
cols_to_drop = corr.index.tolist()
df = df.drop(cols_to_drop, axis=1)

4. Low‑Variance Features

Check the variance of numeric features and drop those with extremely low variance. Variance depends on a feature's units, so compare features on similar scales before deciding.

# variance of numeric features
(df.select_dtypes(include='number').var().astype('str'))

The feature bore has very low variance but is kept for demonstration.

df['bore'].describe()
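scikit-learn's `VarianceThreshold` can apply the same idea automatically. The sketch below uses a made-up frame `X_demo` and an assumed threshold of 0.01. Both are illustrative and would need tuning for real data.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy features (hypothetical values): one constant, one nearly constant,
# and one that varies meaningfully.
X_demo = pd.DataFrame({
    "constant": [1.0, 1.0, 1.0, 1.0],
    "almost_constant": [0.0, 0.0, 0.0, 0.1],
    "varied": [1.0, 5.0, 9.0, 13.0],
})

# Features whose variance does not exceed the threshold are removed.
selector = VarianceThreshold(threshold=0.01)
selector.fit(X_demo)

# get_support() returns a boolean mask over the input columns.
kept = X_demo.columns[selector.get_support()].tolist()
print(kept)
```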

5. Multicollinearity

Multicollinearity arises when two or more features are highly correlated with each other. For example, engine size and horsepower are strongly related. A heatmap can reveal such relationships.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(16,10)})
sns.heatmap(df.corr(numeric_only=True), annot=True, linewidths=.5, center=0, cbar=False, cmap="PiYG")
plt.show()

For each pair of features with an absolute correlation above 0.80, one of the two can be removed, either manually or programmatically:

# drop correlated features
df = df.drop(['length', 'width', 'curb-weight', 'engine-size', 'city-mpg'], axis=1)
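The manual drop list above can also be derived from the correlation matrix. A minimal sketch, assuming synthetic data in `X_demo` with hypothetical column names: it scans the upper triangle of the absolute correlation matrix and drops the later column of every pair above 0.80.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Synthetic features (illustrative only).
X_demo = pd.DataFrame({
    "engine_size": base,
    "horsepower": base + rng.normal(scale=0.1, size=200),  # near-duplicate
    "height": rng.normal(size=200),                        # independent
})

corr = X_demo.corr().abs()

# Keep only the strict upper triangle so each pair is inspected once
# and a feature is never compared with itself.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# Drop a column if it correlates above 0.80 with any earlier column.
to_drop = [col for col in upper.columns if (upper[col] > 0.80).any()]
X_reduced = X_demo.drop(columns=to_drop)
print(to_drop)
```

Which member of a pair gets dropped depends on column order here. In practice it may be better to keep the feature that correlates more strongly with the target.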

Variance Inflation Factor (VIF) can also be used to detect multicollinearity.

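import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
num = df.select_dtypes(include='number').dropna()  # VIF needs numeric data without NaNs
vif = pd.Series([variance_inflation_factor(num.values, i) for i in range(num.shape[1])], index=num.columns)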

6. Feature Coefficients

For regression tasks, the magnitude of a coefficient (beta value) indicates the feature's contribution, provided the features are on comparable scales (for example, standardized). After fitting a linear model, the coefficients can be visualized and features with small magnitudes filtered out.

# feature coefficients
coeffs = model.coef_
index = X_train.columns.tolist()
(pd.DataFrame(coeffs, index=index, columns=['coeff']).sort_values(by='coeff')
 .plot(kind='barh', figsize=(4,10)))

# filter near‑zero coefficient features
temp = pd.DataFrame(coeffs, index=index, columns=['coeff']).sort_values(by='coeff')
temp = temp[(temp['coeff']>1) | (temp['coeff']<-1)]
cols_coeff = temp.index.tolist()
X_train = X_train[cols_coeff]
X_test = X_test[cols_coeff]
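Coefficient magnitudes are only comparable when the features share a scale. The sketch below illustrates this on synthetic data. All names and values are assumptions made for the example. The two informative features contribute equally to the target, yet the raw coefficient of the large-scale one looks negligible until the inputs are standardized.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic features on very different scales (illustrative only).
X_demo = pd.DataFrame({
    "small_scale": rng.normal(0, 1, n),
    "large_scale": rng.normal(0, 1000, n),
    "noise": rng.normal(0, 1, n),
})
# Both informative features move the target by about 3 per standard deviation.
y_demo = (3.0 * X_demo["small_scale"]
          + 0.003 * X_demo["large_scale"]
          + rng.normal(0, 0.1, n))

raw_model = LinearRegression().fit(X_demo, y_demo)

# Standardize to zero mean and unit variance before fitting again.
X_std = StandardScaler().fit_transform(X_demo)
std_model = LinearRegression().fit(X_std, y_demo)

coef_table = pd.DataFrame(
    {"raw": raw_model.coef_, "standardized": std_model.coef_},
    index=X_demo.columns,
)
print(coef_table)
```

A fixed cutoff such as |coeff| > 1 would discard `large_scale` on the raw coefficients but keep it on the standardized ones.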

7. p‑Values

In regression, p‑values assess whether a predictor is statistically significant. Using statsmodels:

import statsmodels.api as sm
X_const = sm.add_constant(X)  # sm.OLS does not add an intercept by default
ols = sm.OLS(y, X_const).fit()
print(ols.summary())

Features with non-significant p-values (commonly p > 0.05) can be removed one at a time, refitting the model after each removal and checking whether adjusted R² improves.

8. Variance Inflation Factor (VIF)

VIF quantifies how much of a feature is explained by the other features. Rough guidelines: VIF = 1 (no correlation), 1 to 5 (moderate), above 5 (high). In this walkthrough, features with VIF > 10 are dropped.

# calculate VIF
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])], index=X.columns)
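Dropping one feature changes the VIF of the others, so it is common to remove the worst offender and recompute. The sketch below does that with plain NumPy. It relies on the identity that, for regressions that include an intercept, each VIF equals the corresponding diagonal entry of the inverse correlation matrix. The helper names and synthetic data are assumptions. Values can differ from `variance_inflation_factor` when no constant column is passed to it.

```python
import numpy as np
import pandas as pd


def vif_table(X):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=X.columns)


def drop_high_vif(X, threshold=10.0):
    """Remove the highest-VIF column until all VIFs are at or below the threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vif = vif_table(X)
        if vif.max() <= threshold:
            break
        X = X.drop(columns=vif.idxmax())
    return X


rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)

# Synthetic features (illustrative only); the third is nearly a + b.
X_demo = pd.DataFrame({
    "a": a,
    "b": b,
    "a_plus_b": a + b + rng.normal(scale=0.05, size=300),
})

X_reduced = drop_high_vif(X_demo)
print(vif_table(X_reduced))
```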

9. Feature‑Importance‑Based Selection

Fitted tree-based models expose a feature_importances_ attribute. A random forest can be trained and its importances visualized.

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)
importances = model.feature_importances_
(pd.DataFrame(importances, X.columns, columns=['importance']).sort_values(by='importance', ascending=True).plot(kind='barh', figsize=(4,10)))

Standard deviation across trees can be added as error bars.

import numpy as np
std = np.std([i.feature_importances_ for i in model.estimators_], axis=0)
feat_with_importance = pd.Series(importances, X.columns)
fig, ax = plt.subplots(figsize=(12,5))
feat_with_importance.plot.bar(yerr=std, ax=ax)
ax.set_title("Feature importances")
ax.set_ylabel("Mean decrease in impurity")
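Once importances are available, selecting features comes down to ranking them. The sketch below keeps the top three on a synthetic classification problem. The value of `top_k`, the generated data, and the column names are illustrative assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (illustrative only): 8 features, 3 of them informative.
X_arr, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_demo, y_demo)

# Impurity-based importances are normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=X_demo.columns)

top_k = 3  # assumed number of features to keep
selected = importances.nlargest(top_k).index.tolist()
X_selected = X_demo[selected]
print(importances.sort_values(ascending=False))
```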

10. Automatic Feature Selection with Scikit‑Learn

Scikit-learn offers several selectors, covering filter, embedded, and wrapper approaches:

SelectKBest / chi‑square

SelectPercentile

SelectFromModel (e.g., L1‑regularized LinearSVC)

SequentialFeatureSelector (forward/backward)

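from sklearn.feature_selection import SelectKBest, SelectPercentile, chi2
# select K best features (chi2 requires non-negative feature values)
X_best = SelectKBest(chi2, k=10).fit_transform(X, y)
# keep top 75% of features
X_top = SelectPercentile(chi2, percentile=75).fit_transform(X, y)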

# L1-regularized LinearSVC + SelectFromModel
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC
model = LinearSVC(penalty='l1', C=0.002, dual=False)
model.fit(X, y)
selector = SelectFromModel(estimator=model, prefit=True)
X_new = selector.transform(X)
feature_names = np.array(X.columns)
selected = feature_names[selector.get_support()]

# backward sequential selection with RandomForest
from sklearn.feature_selection import SequentialFeatureSelector
model = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SequentialFeatureSelector(estimator=model, n_features_to_select=10, direction='backward', cv=2)
X_new = selector.fit_transform(X, y)
feature_names = np.array(X.columns)
selected = feature_names[selector.get_support()]
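Two practical details apply to the selectors above. `chi2` rejects negative inputs, and fitting a selector on the full dataset before cross-validation leaks information from the validation folds. A hedged sketch of one way to handle both is to place scaling and selection inside a `Pipeline`. The synthetic data, step names, and `k=5` are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (illustrative only); it contains negative values.
X_arr, y_demo = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(12)])

pipe = Pipeline([
    ("scale", MinMaxScaler()),            # maps training features into [0, 1] for chi2
    ("select", SelectKBest(chi2, k=5)),   # refit on each training fold
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection happens inside each fold, so validation data never influences it.
scores = cross_val_score(pipe, X_demo, y_demo, cv=5)

# Fit once on all data to inspect which columns were kept.
pipe.fit(X_demo, y_demo)
selected = X_demo.columns[pipe.named_steps["select"].get_support()].tolist()
print(scores.mean(), selected)
```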

11. Principal Component Analysis (PCA)

PCA reduces dimensionality by projecting data onto orthogonal components that capture most variance.

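from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()  # PCA is sensitive to feature scale
X_scaled = scaler.fit_transform(X)
pca = PCA()
pca.fit(X_scaled)
evr = pca.explained_variance_ratio_
plt.figure(figsize=(12,5))
# x-axis starts at 1 so it reads as a component count
plt.plot(range(1, len(evr) + 1), evr.cumsum(), marker='o', linestyle='--')
plt.xlabel('Number of components')
plt.ylabel('Cumulative explained variance')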

In the demo, 20 components explain over 80 % of variance, so the model can be trained on these 20 principal components.
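Instead of reading the component count off the plot, `PCA` accepts a fraction for `n_components` and then keeps just enough components to exceed that share of variance. A minimal sketch on synthetic data follows. The dataset and the 0.80 target are assumptions mirroring the figure quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): 30 features, 10 of them linear
# combinations of the informative ones.
X_demo, _ = make_classification(
    n_samples=300, n_features=30, n_informative=10, n_redundant=10, random_state=0
)

X_scaled = StandardScaler().fit_transform(X_demo)

# A float in (0, 1) asks for enough components to explain that variance share;
# this mode uses the full SVD solver.
pca = PCA(n_components=0.80, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)

print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

The resulting components are linear combinations of all original features, so this reduces dimensionality without identifying which input columns matter.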

Summary

This guide covered a range of feature-selection techniques. Before fitting a model, you can drop columns with many missing values, remove irrelevant or highly collinear features, and apply dimensionality reduction with PCA. After a baseline model is built, you can further prune features using coefficients, p-values, VIF, and importance scores. You won't use every strategy in a single project, but together these methods form a solid toolbox for building efficient, interpretable models.

Source: DeepHub IMBA
