Feature Selection and Feature Engineering with Python (Filter, Wrapper, and Embedded Methods)
This tutorial teaches how to perform feature selection using filter, wrapper, and embedded methods and how to construct new features such as interaction, non‑linear, binned, and binary features with Python's pandas and scikit‑learn libraries.
Goal: Learn feature selection and feature construction techniques.
Learning Content: Filter, wrapper, and embedded feature selection methods; various feature construction strategies.
Code Example:
```python
import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
```
```python
# Create an example dataset: 5 informative and 5 redundant features
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)
feature_names = [f'feature{i+1}' for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feature_names)
df['label'] = y
print(f"Example dataset:\n{df.head()}")
```
```python
# Filter method: SelectKBest with the ANOVA F-statistic (f_classif)
selector = SelectKBest(score_func=f_classif, k=5)
X_new = selector.fit_transform(X, y)
selected_features = [feature_names[i] for i in selector.get_support(indices=True)]
print(f"Features selected by SelectKBest: {selected_features}")
```
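Before fixing `k=5`, it can help to look at the raw scores that `f_classif` assigns to each feature and rank them. A minimal sketch reusing the same synthetic dataset (the variable names `scores` and `p_values` are illustrative):

```python
from sklearn.feature_selection import f_classif
from sklearn.datasets import make_classification

# Same synthetic setup as in the tutorial
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)
feature_names = [f'feature{i+1}' for i in range(X.shape[1])]

# f_classif returns one ANOVA F-statistic and one p-value per feature
scores, p_values = f_classif(X, y)
ranking = sorted(zip(feature_names, scores), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: F={score:.2f}")
```

A sharp drop in the ranked scores is a reasonable cue for where to set `k`.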
```python
# Wrapper method: Recursive Feature Elimination (RFE)
model = LogisticRegression(max_iter=1000)
selector = RFE(model, n_features_to_select=5, step=1)
selector.fit(X, y)
# get_support(indices=True) returns the integer indices of the kept features
selected_features = [feature_names[i] for i in selector.get_support(indices=True)]
print(f"Features selected by RFE: {selected_features}")
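If you would rather not fix `n_features_to_select` in advance, scikit-learn's `RFECV` runs the same elimination loop inside cross-validation and keeps the feature count with the best mean CV score. A sketch under the same synthetic setup:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)

# RFECV eliminates features step by step and scores each subset with 5-fold CV
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
selector.fit(X, y)
print(f"Optimal number of features: {selector.n_features_}")
```

This is slower than plain RFE because it refits the model once per fold per subset size.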
```python
# Embedded method: Random Forest feature importance
model = RandomForestClassifier(random_state=42)
model.fit(X, y)
importances = model.feature_importances_
indices = np.argsort(importances)[-5:]  # indices of the top 5 features (ascending)
selected_features = [feature_names[i] for i in indices]
print(f"Features selected by the random forest: {selected_features}")
```
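`SelectFromModel`, which is imported at the top but not used above, wraps this importance-thresholding step into a reusable transformer. A sketch with a `'median'` threshold (which keeps roughly the top half of the features) on the same synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)
feature_names = [f'feature{i+1}' for i in range(X.shape[1])]

# Keep features whose importance is at least the median importance
selector = SelectFromModel(RandomForestClassifier(random_state=42),
                           threshold='median')
selector.fit(X, y)
selected = [feature_names[i] for i in selector.get_support(indices=True)]
print(f"Features selected by SelectFromModel: {selected}")
```

Unlike the manual `argsort` approach, `SelectFromModel` can be dropped into a `Pipeline` and refit on new data.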
```python
# Feature construction examples
# Interaction feature: product of feature1 and feature2
df['feature1_x_feature2'] = df['feature1'] * df['feature2']
# Interaction feature: sum of feature1 and feature3
df['feature1_plus_feature3'] = df['feature1'] + df['feature3']
# Non-linear feature: square of feature2
df['feature2_squared'] = df['feature2'] ** 2
# Binned feature: bin feature1 (values outside the bin edges become NaN)
bins = [0, 0.5, 1, 1.5, 2]
labels = ['low', 'mid-low', 'mid-high', 'high']
df['feature1_binned'] = pd.cut(df['feature1'], bins=bins, labels=labels)
# Binary feature: binarize feature3 against its mean
df['feature3_binarized'] = (df['feature3'] > df['feature3'].mean()).astype(int)
print(f"Dataset with constructed features:\n{df.head()}")
```
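One follow-up step the binned feature usually needs: `pd.cut` produces a categorical column, which most scikit-learn models cannot consume directly, so it is typically one-hot encoded. A sketch on a small hypothetical frame standing in for the dataset above:

```python
import pandas as pd

# Hypothetical values standing in for feature1, one per bin
df = pd.DataFrame({'feature1': [0.2, 0.7, 1.2, 1.8]})
bins = [0, 0.5, 1, 1.5, 2]
labels = ['low', 'mid-low', 'mid-high', 'high']
df['feature1_binned'] = pd.cut(df['feature1'], bins=bins, labels=labels)

# One-hot encode the categorical bins into indicator columns
dummies = pd.get_dummies(df['feature1_binned'], prefix='feature1')
df = pd.concat([df, dummies], axis=1)
print(df)
```

Each row now carries exactly one indicator column set to 1, e.g. `feature1_low` for values in (0, 0.5].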
Practice: Apply the above feature selection and construction steps to any dataset.
Summary: After completing the exercises, you should be able to select important features using filter, wrapper, and embedded methods and enrich your dataset by creating interaction, non-linear, binned, and binary features.
Test Development Learning Exchange