Feature Selection and Feature Engineering with Python (Filter, Wrapper, and Embedded Methods)

This tutorial teaches how to perform feature selection using filter, wrapper, and embedded methods and how to construct new features such as interaction, non‑linear, binned, and binary features with Python's pandas and scikit‑learn libraries.

Test Development Learning Exchange

Goal: Learn feature selection and feature construction techniques.

Learning Content: Filter, wrapper, and embedded feature selection methods; feature construction strategies (interaction, non-linear, binned, and binary features).

Code Example:

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Create an example dataset: 5 informative and 5 redundant features
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=42)
feature_names = [f'feature_{i+1}' for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feature_names)
df['label'] = y
print(f"Example dataset:\n{df.head()}")

# Filter method: SelectKBest scores each feature independently with the ANOVA F-test (f_classif)
selector = SelectKBest(score_func=f_classif, k=5)
X_new = selector.fit_transform(X, y)
selected_features = [feature_names[i] for i in selector.get_support(indices=True)]
print(f"Features selected by SelectKBest: {selected_features}")

# Wrapper method: Recursive Feature Elimination (RFE) drops the weakest feature one step at a time
model = LogisticRegression(max_iter=1000)
selector = RFE(model, n_features_to_select=5, step=1)
selector.fit(X, y)
# support_ is a boolean mask, so use get_support(indices=True) to get column positions
selected_features = [feature_names[i] for i in selector.get_support(indices=True)]
print(f"Features selected by RFE: {selected_features}")

# Embedded method: Random Forest feature importance
model = RandomForestClassifier(random_state=42)
model.fit(X, y)
importances = model.feature_importances_
indices = np.argsort(importances)[::-1][:5]  # top 5 features, most important first
selected_features = [feature_names[i] for i in indices]
print(f"Features selected by Random Forest: {selected_features}")

# Feature construction examples
# Interaction feature: product of feature_1 and feature_2
df['feature_1_x_feature_2'] = df['feature_1'] * df['feature_2']
print(f"Dataset after adding the product feature:\n{df.head()}")

# Interaction feature: sum of feature_1 and feature_3
df['feature_1_plus_feature_3'] = df['feature_1'] + df['feature_3']

# Non-linear feature: square of feature_2
df['feature_2_squared'] = df['feature_2'] ** 2

# Binned feature: split feature_1 into 4 equal-width bins over its observed range
# (fixed edges such as [0, 0.5, 1, 1.5, 2] would turn values outside 0 to 2 into NaN)
labels = ['low', 'mid_low', 'mid_high', 'high']
df['feature_1_binned'] = pd.cut(df['feature_1'], bins=4, labels=labels)

# Binary feature: 1 if feature_3 is above its mean, else 0
df['feature_3_binary'] = (df['feature_3'] > df['feature_3'].mean()).astype(int)
print(f"Dataset after adding the binary feature:\n{df.head()}")
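
For the embedded method, scikit-learn's SelectFromModel is an alternative to sorting feature_importances_ by hand. The following is a minimal sketch, not part of the original example: it assumes the same synthetic dataset settings, and the English feature names and the choice of threshold=-np.inf with max_features=5 (keep exactly the five most important features) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data with the same settings as the tutorial's example dataset
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)
feature_names = [f"feature_{i + 1}" for i in range(X.shape[1])]  # hypothetical names

# threshold=-np.inf disables the importance cutoff, so max_features alone
# decides how many features are kept (here: the 5 most important ones)
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold=-np.inf,
    max_features=5,
)
X_selected = selector.fit_transform(X, y)

selected_features = [feature_names[i] for i in selector.get_support(indices=True)]
print(f"Features selected by SelectFromModel: {selected_features}")
print(f"Reduced shape: {X_selected.shape}")
```

Because SelectFromModel is a transformer, it can be placed in a scikit-learn Pipeline ahead of the final estimator, which the manual argsort approach cannot do directly.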

Practice: Apply the feature selection and construction steps above to a dataset of your choice.
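
One possible way to approach the practice task is sketched below. The choice of scikit-learn's built-in breast cancer dataset, k = 5, and the scaling step before RFE are assumptions made for illustration; any tabular classification dataset could be substituted.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumed practice dataset: 569 samples, 30 numeric features, binary target
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = [str(name) for name in data.feature_names]
k = 5  # number of features to keep with each method (illustrative choice)

# Filter: rank features by ANOVA F-score, independently of any model
kbest = SelectKBest(score_func=f_classif, k=k).fit(X, y)
filter_set = {feature_names[i] for i in kbest.get_support(indices=True)}

# Wrapper: RFE with logistic regression; scale first so coefficients are comparable
X_scaled = StandardScaler().fit_transform(X)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X_scaled, y)
wrapper_set = {feature_names[i] for i in rfe.get_support(indices=True)}

# Embedded: importances learned while training a random forest
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:k]
embedded_set = {feature_names[i] for i in top}

for method, chosen in [("Filter", filter_set), ("Wrapper", wrapper_set), ("Embedded", embedded_set)]:
    print(f"{method}: {sorted(chosen)}")
print(f"Chosen by all three: {sorted(filter_set & wrapper_set & embedded_set)}")
```

Comparing the three sets is a quick sanity check: features chosen by every method are strong candidates, while disagreements often point to correlated features that the methods rank differently.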

Summary: After completing the exercises, you should be able to select important features using filter, wrapper, and embedded methods and enrich your dataset with new interaction, non-linear, binned, and binary features.
