
Practical Data Sampling Techniques and Code Examples for Various Business Scenarios

This article presents ten real‑world business scenarios illustrating data sampling methods such as random, stratified, time‑window, sliding‑window, keyword, group, interval, click‑based, and weight‑based sampling, each accompanied by clear Python pandas code examples.

Test Development Learning Exchange

Data sampling is a common technique in data analysis: working on a well-chosen subset reduces processing time while still reflecting the original data, provided the sampling method matches the question being asked.

1. E‑commerce scenario: Randomly sample a subset of user purchase records for analysis.

import pandas as pd
# Read purchase records
purchase_data = pd.read_csv('purchase_data.csv')
# Randomly sample 1,000 records
sample_data = purchase_data.sample(n=1000)
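
The call above draws different rows on every run and fails if the file holds fewer than 1,000 records. A minimal sketch of a repeatable, size-safe variant follows; the in-memory DataFrame and its `OrderID` and `Amount` columns are hypothetical stand-ins for `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical stand-in for purchase_data.csv: 5,000 synthetic purchase rows
purchase_data = pd.DataFrame({
    "OrderID": range(1, 5001),
    "Amount": [10.0 + (i % 97) for i in range(5000)],
})

# sample() raises ValueError when n exceeds the row count (with the default
# replace=False), so cap n at the number of available rows
n = min(1000, len(purchase_data))

# A fixed random_state makes the draw repeatable across runs
sample_data = purchase_data.sample(n=n, random_state=42)
```

Fixing `random_state` is mainly useful when an analysis has to be rerun or reviewed; leaving it out gives a fresh sample each time.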

2. Market research scenario: Use stratified sampling based on respondent attributes such as gender.

# Read survey data
survey_data = pd.read_csv('survey_data.csv')
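# Stratified sampling by gender: 500 respondents per stratum
# (sample() raises ValueError if a stratum has fewer than 500 rows)
male_data = survey_data[survey_data['Gender'] == 'Male'].sample(n=500)
female_data = survey_data[survey_data['Gender'] == 'Female'].sample(n=500)
# Combine the strata into a single sample
sample_data = pd.concat([male_data, female_data])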
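
When a stratum may be smaller than the target size, one option is to loop over the strata and cap each draw at the stratum size. The sketch below assumes synthetic data with a hypothetical `RespondentID` column and an uneven 800/300 split; it is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical stand-in for survey_data.csv: 800 male and 300 female respondents
survey_data = pd.DataFrame({
    "RespondentID": range(1, 1101),
    "Gender": ["Male"] * 800 + ["Female"] * 300,
})

per_stratum = 500  # target sample size per gender

# Sample each stratum separately, capping at the stratum size so that a small
# stratum contributes all of its rows instead of raising ValueError
parts = [
    group.sample(n=min(per_stratum, len(group)), random_state=0)
    for _, group in survey_data.groupby("Gender")
]
sample_data = pd.concat(parts)

print(sample_data["Gender"].value_counts())
```

Equal-sized strata make group comparisons easier but do not mirror the population's gender split, so any overall estimate would need reweighting.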

3. Healthcare scenario: Apply a time‑window sample to select records within a specific year.

# Read medical records
medical_data = pd.read_csv('medical_data.csv')
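# Parse the Date column so comparisons are chronological rather than string-based
medical_data['Date'] = pd.to_datetime(medical_data['Date'])
# Time-window sampling for 2022; the exclusive upper bound keeps records timestamped during December 31
sample_data = medical_data[(medical_data['Date'] >= '2022-01-01') & (medical_data['Date'] < '2023-01-01')]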
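
Window boundaries are easy to get wrong when the dates carry a time of day. The following sketch, using a hypothetical `PatientID` column and made-up timestamps, shows a half-open window (start inclusive, end exclusive) on a parsed datetime column.

```python
import pandas as pd

# Hypothetical stand-in for medical_data.csv, with timestamps around the window edges
medical_data = pd.DataFrame({
    "PatientID": [1, 2, 3, 4, 5],
    "Date": [
        "2021-12-31 23:50:00",
        "2022-01-01 00:00:00",
        "2022-06-15 09:30:00",
        "2022-12-31 18:30:00",
        "2023-01-01 00:00:00",
    ],
})

# Convert text to datetime64 so comparisons are chronological
medical_data["Date"] = pd.to_datetime(medical_data["Date"])

# Half-open window: start is included, end is excluded
start = pd.Timestamp("2022-01-01")
end = pd.Timestamp("2023-01-01")
in_window = (medical_data["Date"] >= start) & (medical_data["Date"] < end)
sample_data = medical_data[in_window]
```

With an inclusive upper bound of `2022-12-31`, the record stamped 18:30 on December 31 would be dropped, because that bound is interpreted as midnight at the start of the day.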

4. Finance scenario: Use a sliding‑window sample to take the most recent 30 days of stock trading data.

# Read stock data
stock_data = pd.read_csv('stock_data.csv')
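# Sliding-window sample: last 30 rows. This equals the last 30 trading days
# only if the file is sorted by date (oldest first) with one row per day.
sample_data = stock_data.tail(30)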
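
If the window is meant to cover 30 calendar days rather than 30 rows, it can be computed from a date column instead. This is a sketch under the assumption that the data has a `Date` column; that column, the `Close` column, and the generated prices are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for stock_data.csv: 60 business days of closing prices
dates = pd.bdate_range("2024-01-01", periods=60)
stock_data = pd.DataFrame({
    "Date": dates,
    "Close": [100.0 + i * 0.5 for i in range(60)],
})

# Anchor the window on the latest date in the data rather than on today's date
latest = stock_data["Date"].max()
cutoff = latest - pd.Timedelta(days=30)

# Keep rows from the 30 calendar days ending at the latest date
sample_data = stock_data[stock_data["Date"] > cutoff]
```

Because markets are closed on weekends, a 30-calendar-day window contains fewer than 30 rows, which is the main difference from `tail(30)`.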

5. Social media scenario: Keyword sampling selects comments containing a specific keyword.

# Read comment data
comment_data = pd.read_csv('comment_data.csv')
# Keyword sampling for "好评" ("positive review"); na=False treats missing comments as non-matches
sample_data = comment_data[comment_data['Content'].str.contains('好评', na=False)]
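
The same filter extends to several keywords by building one pattern. In the sketch below the comment texts and the second keyword, "推荐" ("recommend"), are hypothetical additions for illustration.

```python
import re

import pandas as pd

# Hypothetical stand-in for comment_data.csv, including one missing comment
comment_data = pd.DataFrame({
    "Content": ["物流很快，好评", None, "质量一般", "强烈推荐！", "Good value (5/5)"],
})

# Hypothetical keyword list: "好评" (positive review) and "推荐" (recommend)
keywords = ["好评", "推荐"]

# Escape each keyword so regex metacharacters are matched literally,
# then join with "|" so that a comment matching any keyword is kept
pattern = "|".join(re.escape(k) for k in keywords)

# na=False treats missing comments as non-matches
mask = comment_data["Content"].str.contains(pattern, na=False)
sample_data = comment_data[mask]
```

Without `na=False`, a missing comment produces a missing value in the mask, and pandas refuses to filter on a mask that contains missing values.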

6. Human resources scenario: Group sampling picks a proportion of employees from each department.

# Read performance data
performance_data = pd.read_csv('performance_data.csv')
# Group sampling: 10% from each department (groupby().sample requires pandas 1.1 or later)
sample_data = performance_data.groupby('Department').sample(frac=0.1)
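
A fixed 10% can round down to zero rows for a small department. One possible workaround, sketched here with a hypothetical `EmployeeID` column and made-up department sizes, is to compute the per-group size explicitly and enforce a minimum of one.

```python
import pandas as pd

# Hypothetical stand-in for performance_data.csv: three departments of very different size
performance_data = pd.DataFrame({
    "EmployeeID": range(1, 75),
    "Department": ["Engineering"] * 50 + ["Sales"] * 20 + ["Legal"] * 4,
})

frac = 0.1  # target share per department

parts = []
for _, group in performance_data.groupby("Department"):
    # Round to the nearest whole employee, but never sample fewer than one
    k = max(1, round(len(group) * frac))
    parts.append(group.sample(n=k, random_state=0))

sample_data = pd.concat(parts)
```

The minimum of one slightly over-represents tiny departments, which is usually an acceptable trade for having every department present in the sample.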

7. Education scenario: Stratified random sampling selects a percentage of students from each grade.

# Read exam scores
exam_scores = pd.read_csv('exam_scores.csv')
# Stratified random sampling: 20% per grade (groupby().sample requires pandas 1.1 or later)
sample_data = exam_scores.groupby('Grade').sample(frac=0.2)
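
Sampling the same fraction from every grade should leave the grade mix unchanged, and that is easy to check. The sketch below assumes pandas 1.1 or later and uses synthetic data with a hypothetical `StudentID` column.

```python
import pandas as pd

# Hypothetical stand-in for exam_scores.csv: 350 students across three grades
exam_scores = pd.DataFrame({
    "StudentID": range(1, 351),
    "Grade": [1] * 100 + [2] * 200 + [3] * 50,
})

# Draw 20% from each grade; random_state makes the draw repeatable
sample_data = exam_scores.groupby("Grade").sample(frac=0.2, random_state=0)

# Compare each grade's share in the population with its share in the sample
population_share = exam_scores["Grade"].value_counts(normalize=True).sort_index()
sample_share = sample_data["Grade"].value_counts(normalize=True).sort_index()
comparison = pd.DataFrame({"population": population_share, "sample": sample_share})

print(comparison)
```

The shares match exactly here because every grade size is a multiple of five; with other sizes, rounding can introduce small differences.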

8. Hotel scenario: Interval sampling picks records at regular time intervals (e.g., every week).

# Read booking data
booking_data = pd.read_csv('booking_data.csv')
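# Interval (systematic) sampling: every 7th row. This corresponds to weekly
# sampling only if the file holds exactly one row per day in date order.
sample_data = booking_data.iloc[::7]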
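
When bookings arrive unevenly, a calendar-based interval can be taken by grouping on the week instead of counting rows. This is a sketch assuming a `Date` column; the column names and the bookings themselves are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for booking_data.csv: several bookings on some days, none on others
booking_data = pd.DataFrame({
    "BookingID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Date": pd.to_datetime([
        "2024-01-01", "2024-01-01", "2024-01-03", "2024-01-08",
        "2024-01-10", "2024-01-10", "2024-01-21", "2024-01-22",
    ]),
})

# A stable sort keeps same-day bookings in their original order
booking_data = booking_data.sort_values("Date", kind="stable")

# Label each booking with its calendar week (Monday to Sunday by default)
week = booking_data["Date"].dt.to_period("W")

# Keep the first booking of each week
sample_data = booking_data.groupby(week).head(1)
```

On this data, row-based slicing with a step of 7 would return only the first and last bookings, while the week-based version returns one booking for each of the four weeks.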

9. Marketing scenario: Click‑based sampling selects ads with the highest number of clicks.

# Read ad data
ad_data = pd.read_csv('ad_data.csv')
# Click‑based sampling: top 100 ads by clicks
sample_data = ad_data.nlargest(100, 'Clicks')

10. Logistics scenario: Weight‑based sampling chooses shipments with the largest weight, using a quantile threshold.

# Read shipment data
shipment_data = pd.read_csv('shipment_data.csv')
# Determine 90th percentile weight threshold
threshold = shipment_data['Weight'].quantile(0.9)
# Weight‑based sampling: shipments >= threshold
sample_data = shipment_data[shipment_data['Weight'] >= threshold]
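
The threshold filter above is deterministic: it always returns the heaviest shipments. If the goal is instead a random sample in which heavier shipments are merely more likely to be chosen, `sample()` accepts a `weights` argument. The sketch below contrasts the two on synthetic data; the `ShipmentID` column and the weights are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for shipment_data.csv
shipment_data = pd.DataFrame({
    "ShipmentID": range(1, 11),
    "Weight": [1, 2, 3, 4, 5, 10, 20, 40, 80, 160],
})

# Deterministic selection: shipments at or above the 90th percentile weight
threshold = shipment_data["Weight"].quantile(0.9)
heavy = shipment_data[shipment_data["Weight"] >= threshold]

# Probabilistic selection: draw 3 shipments without replacement, with
# selection probability proportional to the Weight column
weighted = shipment_data.sample(n=3, weights="Weight", random_state=0)
```

The first result is a targeted subset, the second is a random sample biased toward heavy shipments; which one fits depends on whether lighter shipments should still have a chance of being inspected.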

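These examples show how different sampling strategies can reduce data volume and speed up analysis across diverse business domains. Random, stratified, and group sampling aim to keep the sample representative of the original dataset, whereas keyword, time-window, top-N, and threshold selections deliberately focus on a targeted subset and should not be read as representative of the whole.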
