Feature Engineering Techniques for Various Business Scenarios with Python Code Examples
This article presents practical feature‑engineering methods for ten common business domains. For each domain it names the feature, describes the extraction technique, and gives a short Python snippet (pandas or scikit‑learn) that can be adapted to build more accurate predictive models.
Feature engineering is a crucial step in data analysis that transforms raw data into informative features to improve model performance.
1. E‑commerce scenario: Feature: order date; Technique: extract year, quarter, month, and weekday.
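The code for this scenario reads orders.csv. The sketch below performs the same extraction on a few made-up order dates, so it runs without that file. It parses the column once and reuses the result for every component. Note that `dt.weekday` numbers days from Monday=0 to Sunday=6.

```python
import pandas as pd

# Made-up sample orders standing in for orders.csv
orders = pd.DataFrame({"OrderDate": ["2023-01-15", "2023-05-02", "2024-11-29"]})

# Parse the dates once and reuse the datetime Series
order_dates = pd.to_datetime(orders["OrderDate"])

orders["Year"] = order_dates.dt.year
orders["Quarter"] = order_dates.dt.quarter
orders["Month"] = order_dates.dt.month
orders["Weekday"] = order_dates.dt.weekday  # Monday=0 ... Sunday=6

print(orders)
```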
import pandas as pd
# Read order data
orders = pd.read_csv('orders.csv')
# Parse the order date once and reuse it
order_dates = pd.to_datetime(orders['OrderDate'])
# Extract year, quarter, and month
orders['Year'] = order_dates.dt.year
orders['Quarter'] = order_dates.dt.quarter
orders['Month'] = order_dates.dt.month
# Extract weekday (Monday=0, Sunday=6)
orders['Weekday'] = order_dates.dt.weekday
2. Market research scenario: Feature: text data; Technique: convert text to numeric vectors using bag‑of‑words or TF‑IDF.
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
# Read survey data
survey_data = pd.read_csv('survey_data.csv')
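# Extract the text feature; missing responses become empty strings (NaN is rejected)
text_data = survey_data['Response'].fillna('')
# Bag-of-words: one column per vocabulary term, values are term counts
vectorizer = CountVectorizer()
text_features = vectorizer.fit_transform(text_data)
3. Healthcare scenario: Feature: patient age; Technique: calculate age from birth date.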
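Subtracting birth years, as the code for this scenario does, overstates the age by one year for anyone whose birthday has not yet occurred in the current year. The sketch below computes exact ages. It assumes made-up birth dates and a fixed reference date (2024‑06‑14, chosen only so the result is reproducible).

```python
import pandas as pd

# Made-up birth dates
patient_data = pd.DataFrame({"BirthDate": ["1990-06-15", "1985-12-31", "2000-01-01"]})
birth = pd.to_datetime(patient_data["BirthDate"])

# Assumed reference date; in practice this could be today or a visit date
reference = pd.Timestamp("2024-06-14")

# True if the birthday has already occurred in the reference year
had_birthday = (birth.dt.month < reference.month) | (
    (birth.dt.month == reference.month) & (birth.dt.day <= reference.day)
)

# Year difference, minus one where the birthday is still to come
patient_data["Age"] = reference.year - birth.dt.year - (~had_birthday).astype(int)
print(patient_data)
```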
import pandas as pd
import datetime
# Read patient data
patient_data = pd.read_csv('patient_data.csv')
# Approximate age as a difference of calendar years; this overstates age by
# one year for patients whose birthday has not yet occurred this year
current_year = datetime.datetime.now().year
patient_data['Age'] = current_year - pd.to_datetime(patient_data['BirthDate']).dt.year
4. Finance scenario: Feature: time‑series data; Technique: create lag features such as the previous close price and the 7‑day average volume.
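`shift()` and `rolling()` operate on row order, not on dates, so the rows must be sorted chronologically before lag features are created. The sketch below assumes a hypothetical `Date` column and ten made-up daily rows supplied newest first.

```python
import pandas as pd

# Ten made-up daily rows, deliberately supplied newest first
dates = pd.date_range("2024-01-01", periods=10, freq="D")
stock_data = pd.DataFrame({
    "Date": dates[::-1],
    "Close": list(range(110, 100, -1)),    # 110 ... 101
    "Volume": list(range(1000, 0, -100)),  # 1000 ... 100
})

# Sort chronologically so shift/rolling look backwards in time
stock_data = stock_data.sort_values("Date").reset_index(drop=True)

# Lag feature: previous row's close (NaN on the first row)
stock_data["PreviousClose"] = stock_data["Close"].shift(1)
# Mean of the current row and the 6 before it (NaN until 7 rows are available)
stock_data["AverageVolume"] = stock_data["Volume"].rolling(window=7).mean()
print(stock_data)
```

The window counts rows, not calendar days, so on trading data a 7‑row window spans 7 trading days.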
import pandas as pd
# Read stock data
stock_data = pd.read_csv('stock_data.csv')
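# shift() and rolling() follow row order, so the rows must already be in
# chronological order (sort by the date column first if they are not)
# Lag feature: previous row's closing price
stock_data['PreviousClose'] = stock_data['Close'].shift(1)
# Rolling feature: mean volume over the current and previous 6 rows (NaN for the first 6)
stock_data['AverageVolume'] = stock_data['Volume'].rolling(window=7).mean()
5. Social media scenario: Feature: user registration date; Technique: compute usage duration as days since registration.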
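Measuring against `pd.Timestamp.now()` makes the feature change every time the code runs, so training and scoring see different values for the same user. The sketch below measures against a fixed snapshot date instead. It assumes 2024‑03‑31 as a hypothetical data‑extraction date and uses made-up registration dates.

```python
import pandas as pd

# Made-up registration dates
user_data = pd.DataFrame({"RegistrationDate": ["2024-01-01", "2024-02-29", "2024-03-31"]})
user_data["RegistrationDate"] = pd.to_datetime(user_data["RegistrationDate"])

# Hypothetical snapshot date: when the data was extracted
snapshot = pd.Timestamp("2024-03-31")

# Whole days between registration and the snapshot
user_data["UsageDuration"] = (snapshot - user_data["RegistrationDate"]).dt.days
print(user_data)
```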
import pandas as pd
# Read user data
user_data = pd.read_csv('user_data.csv')
# Convert registration date
user_data['RegistrationDate'] = pd.to_datetime(user_data['RegistrationDate'])
# Calculate usage duration
user_data['UsageDuration'] = (pd.Timestamp.now() - user_data['RegistrationDate']).dt.days
6. Human resources scenario: Feature: employee hire date; Technique: extract month and quarter of hiring.
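The sketch below works on made-up hire dates. It parses the column once and uses prefixed names (HireMonth, HireQuarter, chosen here for illustration) so the new columns cannot clash with other date-derived columns. As an optional step, it one-hot encodes the quarter with `pd.get_dummies`. This assumes the model should treat quarters as categories rather than as ordered numbers.

```python
import pandas as pd

# Made-up hire dates
employee_data = pd.DataFrame({"HireDate": ["2021-02-10", "2022-07-01", "2023-11-20"]})
hire_dates = pd.to_datetime(employee_data["HireDate"])

# Prefixed names avoid collisions with other Month/Quarter columns
employee_data["HireMonth"] = hire_dates.dt.month
employee_data["HireQuarter"] = hire_dates.dt.quarter

# Optional: one indicator column per observed quarter (HireQ_1, HireQ_3, ...)
quarter_dummies = pd.get_dummies(employee_data["HireQuarter"], prefix="HireQ")
employee_data = employee_data.join(quarter_dummies)
print(employee_data)
```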
import pandas as pd
# Read employee data
employee_data = pd.read_csv('employee_data.csv')
# Parse the hire date once, then extract month and quarter of hiring
hire_dates = pd.to_datetime(employee_data['HireDate'])
employee_data['HireMonth'] = hire_dates.dt.month
employee_data['HireQuarter'] = hire_dates.dt.quarter
7. Education scenario: Feature: student exam scores; Technique: compute average score and standard deviation per student.
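Calling `mean(axis=1)` on the whole frame also averages any ID column; in pandas 2.x it raises a TypeError if that column holds text. In addition, computing the standard deviation after `AverageScore` has been added folds the average into the spread. The sketch below restricts both statistics to an explicit list of score columns. It assumes a wide table with one row per student and hypothetical Math, Physics and English columns.

```python
import pandas as pd

# Made-up scores, one row per student
exam_scores = pd.DataFrame({
    "StudentID": ["s1", "s2"],
    "Math": [80, 90],
    "Physics": [70, 90],
    "English": [90, 90],
})

# Only these columns enter the statistics (assumed names)
score_cols = ["Math", "Physics", "English"]

exam_scores["AverageScore"] = exam_scores[score_cols].mean(axis=1)
# Uses score_cols only, so AverageScore is not part of the spread;
# pandas std() is the sample standard deviation (ddof=1)
exam_scores["ScoreStd"] = exam_scores[score_cols].std(axis=1)
print(exam_scores)
```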
import pandas as pd
# Read exam scores
exam_scores = pd.read_csv('exam_scores.csv')
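# Score columns to aggregate (assumed to be every numeric column, one per exam);
# selecting them first keeps ID columns and the new AverageScore out of the statistics
score_cols = exam_scores.select_dtypes('number').columns
# Calculate each student's average and standard deviation across exams
exam_scores['AverageScore'] = exam_scores[score_cols].mean(axis=1)
exam_scores['ScoreStd'] = exam_scores[score_cols].std(axis=1)
8. Hotel scenario: Feature: booking and check‑in dates; Technique: calculate the booking lead time (days between booking and check‑in).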
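A booking dated after check-in gives a negative lead time, which usually signals a data-entry error. The sketch below keeps such rows but marks their DaysInAdvance as missing. It assumes made-up dates, and the third row is deliberately inconsistent.

```python
import pandas as pd

# Made-up bookings; the last row is booked after check-in
booking_data = pd.DataFrame({
    "BookingDate": ["2024-05-01", "2024-05-20", "2024-06-10"],
    "CheckInDate": ["2024-05-15", "2024-05-20", "2024-06-01"],
})
booking_data["BookingDate"] = pd.to_datetime(booking_data["BookingDate"])
booking_data["CheckInDate"] = pd.to_datetime(booking_data["CheckInDate"])

lead_time = (booking_data["CheckInDate"] - booking_data["BookingDate"]).dt.days
# Negative lead times are assumed to be errors and become NaN
booking_data["DaysInAdvance"] = lead_time.where(lead_time >= 0)
print(booking_data)
```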
import pandas as pd
# Read booking data
booking_data = pd.read_csv('booking_data.csv')
# Convert dates
booking_data['BookingDate'] = pd.to_datetime(booking_data['BookingDate'])
booking_data['CheckInDate'] = pd.to_datetime(booking_data['CheckInDate'])
# Calculate days in advance: days between booking and check-in
booking_data['DaysInAdvance'] = (booking_data['CheckInDate'] - booking_data['BookingDate']).dt.days
9. Marketing scenario: Feature: ad clicks and impressions; Technique: compute click‑through rate (clicks divided by impressions).
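pandas does not raise on division by zero. 0/0 gives NaN, and a positive click count divided by 0 gives inf, which can distort a model. The sketch below treats rows with zero impressions as missing. It assumes made-up counts.

```python
import pandas as pd

# Made-up ad statistics; two rows have no impressions
ad_data = pd.DataFrame({"Clicks": [5, 0, 3], "Impressions": [100, 0, 0]})

# Zero impressions become NaN, so the ratio is NaN instead of NaN/inf
impressions = ad_data["Impressions"].where(ad_data["Impressions"] > 0)
ad_data["ClickThroughRate"] = ad_data["Clicks"] / impressions
print(ad_data)
```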
import pandas as pd
# Read ad data
ad_data = pd.read_csv('ad_data.csv')
# Calculate CTR; rows with zero impressions yield NaN (0/0) or inf (clicks/0)
ad_data['ClickThroughRate'] = ad_data['Clicks'] / ad_data['Impressions']
10. Logistics scenario: Feature: weight and volume of shipments; Technique: calculate density as weight divided by volume.
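Density is only comparable across shipments if weight and volume use consistent units. The sketch below assumes, hypothetically, weight in kilograms and volume in cubic centimetres (a `VolumeCm3` column). It converts volume to cubic metres, so density comes out in kg/m³, and treats zero volume as missing.

```python
import pandas as pd

# Made-up shipments: weight in kg, volume in cm^3 (assumed units)
shipment_data = pd.DataFrame({
    "Weight": [12.0, 3.0, 2.0],
    "VolumeCm3": [24000.0, 0.0, 10000.0],
})

# 1 m^3 = 1,000,000 cm^3
volume_m3 = shipment_data["VolumeCm3"] / 1_000_000
# Zero volume becomes NaN instead of producing inf
volume_m3 = volume_m3.where(volume_m3 > 0)

shipment_data["Density"] = shipment_data["Weight"] / volume_m3  # kg/m^3
print(shipment_data)
```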
import pandas as pd
# Read shipment data
shipment_data = pd.read_csv('shipment_data.csv')
# Calculate density as weight per unit volume (units follow those of the source
# columns; a zero volume yields inf)
shipment_data['Density'] = shipment_data['Weight'] / shipment_data['Volume']
The goal of feature engineering is to derive useful attributes from raw data in order to build more accurate and effective predictive models; the appropriate technique depends on the specific business context and the characteristics of the data.
Test Development Learning Exchange