Feature Engineering for Structured Data: Normalization, Encoding & Interaction

This article explains the fundamentals of feature engineering for structured data, covering why and how to normalize numerical features, various categorical encoding techniques, methods for creating high‑dimensional interaction features, and decision‑tree based strategies for efficiently discovering valuable feature combinations.

Hulu Beijing

Scene Description

Feature engineering means finding effective features for a given problem and transforming them into an input format the model can use. The classic "garbage in, garbage out" principle is a reminder that model performance depends not only on the choice of algorithm but also on the quality of the input features.

Problem Description

Why normalize numerical features?

How to handle categorical features?

How to process high‑dimensional interaction features?

How to efficiently discover useful feature combinations?

Answer and Analysis

1. Why normalize numerical features?

Normalization rescales all numerical features to a comparable range. A common choice is z-score normalization, which subtracts the mean μ and divides by the standard deviation σ, so that each feature ends up with zero mean and unit variance. This prevents features with larger ranges from dominating gradient-based optimization such as stochastic gradient descent.
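A minimal sketch of z-score normalization in plain Python follows. The sample values and the helper names `z_score` and `min_max` are illustrative assumptions, and min-max scaling is included only as a common alternative that the text above does not describe.

```python
import math

def z_score(values):
    """Standardize values to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale values linearly onto [0, 1] (alternative, not from the article)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical samples: x1 spans [0, 10], x2 spans [0, 3].
x1 = [0.0, 2.5, 5.0, 7.5, 10.0]
x2 = [0.0, 0.75, 1.5, 2.25, 3.0]

z1 = z_score(x1)
z2 = z_score(x2)
# After standardization the two features live on the same scale,
# even though their raw ranges differ by more than a factor of three.
```

In practice the mean and standard deviation should be computed on the training set only and then reused for validation and test data.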

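For example, suppose feature x1 ranges over [0, 10] and feature x2 over [0, 3]. Under the same learning rate, the parameter tied to x1 is updated in larger steps than the one tied to x2, so the two move toward the optimum at different speeds. Once both features are scaled to the same interval, their updates become comparable, and gradient descent typically converges faster.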

Figure: Normalization illustration

Normalization is usually needed for models trained with gradient descent, such as linear regression, logistic regression, SVMs, and neural networks. It is not needed for decision-tree based models such as C4.5, whose split decisions depend on the ordering of feature values rather than on their absolute scale.
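The claim about trees can be checked with a small sketch. It assumes a single numeric feature, binary labels, and a Gini-based threshold search as a stand-in for C4.5's gain-ratio criterion; the function name and data are hypothetical.

```python
def best_split_partition(xs, ys):
    """Return the set of sample indices sent left by the best threshold split."""
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)

    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for cut in range(1, len(order)):
        # A threshold cannot separate two equal feature values.
        if xs[order[cut - 1]] == xs[order[cut]]:
            continue
        left, right = order[:cut], order[cut:]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if best is None or score < best[0]:
            best = (score, frozenset(left))
    return best[1]

x = [1.0, 3.0, 4.0, 8.0, 9.0, 10.0]
y = [0, 0, 0, 1, 1, 1]
scaled = [(v - 5.0) / 2.0 for v in x]  # any order-preserving rescaling

# The chosen split separates the same samples before and after rescaling.
same_partition = best_split_partition(x, y) == best_split_partition(scaled, y)
```

Only the sort order of the feature enters the search, so any strictly increasing transformation leaves the partition unchanged.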

2. How to handle categorical features?

Categorical features (e.g., gender, blood type) are typically represented as strings. Models such as logistic regression or linear SVM require numeric input, so conversion is necessary. Common encoding methods include:

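Ordinal Encoding: Assign each category an integer ID that preserves the inherent order of the categories.

One-hot Encoding: Represent each category as a sparse binary vector with a single 1 in the position of that category.

Binary Encoding: Assign each category an integer ID, then write that ID in binary and use the bits as features. This needs fewer dimensions than one-hot encoding.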

When a feature has many categories, one-hot vectors should be stored in sparse form to save space, and encoding can be combined with feature selection to lower the dimensionality, since very high-dimensional inputs increase the risk of over-fitting.
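The three encodings can be sketched in a few lines of plain Python. The function names, the 1-based ID convention, and the "low/medium/high" example are assumptions made for illustration; blood type is taken from the example above.

```python
def ordinal_encode(categories_in_order):
    """Map categories to 1-based integer IDs that follow the given order."""
    return {c: i for i, c in enumerate(categories_in_order, start=1)}

def one_hot_encode(categories):
    """Map each category to a vector with a single 1 at its own position."""
    n = len(categories)
    return {c: [1 if j == i else 0 for j in range(n)]
            for i, c in enumerate(categories)}

def binary_encode(categories):
    """Write each 1-based category ID in binary, padded to a fixed width."""
    width = max(1, len(categories).bit_length())
    return {c: [int(bit) for bit in format(i, f"0{width}b")]
            for i, c in enumerate(categories, start=1)}

grades = ordinal_encode(["low", "medium", "high"])   # low=1, medium=2, high=3

blood_types = ["A", "B", "AB", "O"]
one_hot = one_hot_encode(blood_types)   # 4 dimensions, e.g. AB -> [0, 0, 1, 0]
binary = binary_encode(blood_types)     # 3 dimensions, e.g. O  -> [1, 0, 0]
```

With four blood types, one-hot encoding uses four dimensions while binary encoding uses three; the gap widens as the number of categories grows.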

Figure: Binary encoding diagram

3. How to process high‑dimensional interaction features?

Interaction features are created by combining pairs of discrete features, which lets the model capture relationships that the individual features cannot express on their own. However, naively crossing high-cardinality ID features makes the parameter count explode: with m users and n items, the user ID × item ID cross alone requires m × n parameters.

Low-rank factorization addresses this by representing each user and each item with a k-dimensional vector (k ≪ m, k ≪ n). The weight of a user-item cross is then the inner product of the two vectors, which reduces the parameter count from m × n to m·k + n·k. This is equivalent to a matrix factorization of the full weight matrix.
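A rough sketch of the parameter saving, under assumed sizes (m = 1000 users, n = 500 items, k = 8) and randomly initialized vectors rather than trained ones. The names `user_vecs`, `item_vecs`, and `interaction_weight` are hypothetical.

```python
import random

m, n, k = 1000, 500, 8  # hypothetical numbers of users, items, latent dimensions

full_params = m * n              # one weight per (user, item) pair
low_rank_params = m * k + n * k  # one k-dimensional vector per user and per item

rng = random.Random(0)
# Untrained placeholder vectors; a real model would learn these from data.
user_vecs = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
item_vecs = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]

def interaction_weight(i, j):
    """Weight of the (user i, item j) cross, as a dot product of two vectors."""
    return sum(a * b for a, b in zip(user_vecs[i], item_vecs[j]))

# full_params is 500000, low_rank_params is 12000
```

Every one of the m × n pair weights remains available through `interaction_weight`, but only the two small tables of vectors need to be stored and learned.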

Figure: Low-rank factorization

For example, in ad click prediction the language and type features can be combined into a single interaction feature, as the following figures illustrate.

Figure: Feature interaction example 1
Figure: Feature interaction example 2
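Crossing language with type can be sketched as follows. The category values ("Chinese", "English", "movie", "series") are assumed for illustration and are not taken from the figures.

```python
from itertools import product

languages = ["Chinese", "English"]  # hypothetical values
types = ["movie", "series"]         # hypothetical values

# One slot per (language, type) pair: 2 x 2 = 4 crossed categories.
cross_index = {pair: i for i, pair in enumerate(product(languages, types))}

def cross_one_hot(language, content_type):
    """One-hot vector over all (language, type) combinations."""
    vec = [0] * len(cross_index)
    vec[cross_index[(language, content_type)]] = 1
    return vec

example = cross_one_hot("English", "movie")  # [0, 0, 1, 0]
```

The crossed feature has |language| × |type| dimensions, which is harmless here but is exactly what grows out of control for ID-like features.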

4. How to efficiently find useful feature combinations?

Decision trees offer one way to discover valuable feature combinations. After training gradient-boosted decision trees on the raw features, each root-to-leaf path corresponds to a conjunction of conditions (e.g., age ≤ 35 and gender = female). Each path can then serve as a new binary interaction feature: 1 if a sample satisfies every condition on the path, 0 otherwise.
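The path-to-feature step can be sketched with a single hand-written tree. The tree below is hypothetical and not trained; a real pipeline would take the trees from a gradient-boosting library and repeat this for every tree in the ensemble.

```python
# Hypothetical tree: internal nodes hold one condition, leaves are None.
tree = {
    "cond": ("age", "<=", 35),
    "yes": {"cond": ("gender", "==", "female"), "yes": None, "no": None},
    "no": None,
}

def holds(cond, sample):
    """Evaluate a (feature, operator, value) condition on one sample."""
    feature, op, value = cond
    return sample[feature] <= value if op == "<=" else sample[feature] == value

def leaf_paths(node, prefix=()):
    """List every root-to-leaf path as a tuple of (condition, expected outcome)."""
    if node is None:
        return [prefix]
    cond = node["cond"]
    return (leaf_paths(node["yes"], prefix + ((cond, True),))
            + leaf_paths(node["no"], prefix + ((cond, False),)))

def path_features(sample, paths):
    """One binary feature per path: 1 if the sample satisfies the whole path."""
    return [int(all(holds(c, sample) == want for c, want in path))
            for path in paths]

paths = leaf_paths(tree)  # three paths for this tree
features = path_features({"age": 28, "gender": "female"}, paths)  # [1, 0, 0]
```

Each sample falls into exactly one leaf per tree, so exactly one of that tree's path features is 1; concatenating the vectors from all trees gives the new interaction features.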

Figure: Decision tree feature combinations

References provide further reading on encoding methods, click‑prediction case studies, and gradient boosting machines.
