
What Drives Mother‑Baby E‑Commerce Sales? Data‑Driven Trends & Seasonality

Using a dataset of over 9 million baby records and 30 000 transaction entries, this analysis explores market trends, seasonal sales patterns, product popularity, and user demographics in China’s mother‑baby e‑commerce sector, revealing the impact of holidays, promotional events, and low repeat‑purchase rates on overall performance.

Python Crawling & Data Mining

Project Introduction

Project Background

Based on the PEST framework, the project briefly analyzes the market from four perspectives: Policy, Economy, Society, and Technology.

Policy: Government policies support e‑commerce development, especially in online mother‑baby goods.

Economy: Rising disposable income drives consumer demand, with the market size projected to reach 2 trillion RMB in 2015.

Society: Mobile devices enable convenient online shopping, and logistics improvements increase appeal across regions.

Technology: 4G, smartphones, and online payment systems fuel rapid growth.

Analysis Purpose

Help online merchants devise sales and operation strategies for different time periods and scenarios.

Predict product purchase preferences based on child information (age, gender) – still under development.

Problem Decomposition

Figure: Problem decomposition diagram

Data Overview

The Ali_Mum_Baby dataset contains over 9 million child records (birthday and gender). Two CSV files are provided: a baby information table and a transaction history table.

Baby information table:

Column      Description
user_id     User ID
birthday    Child's birthday
gender      0‑female, 1‑male, 2‑unknown

Transaction record table:

Column      Description
item_id     Item ID (named auction_id in the raw file, renamed during cleaning)
user_id     User ID
cat_id      Category ID
cat1        Root category ID
property    Item property (numeric strings)
buy_mount   Purchase quantity
day         Timestamp

Data Preparation

Import Data

import pandas as pd

baby = pd.read_csv("./sam_tianchi_mum_baby.csv")
trade = pd.read_csv("./sam_tianchi_mum_baby_trade_history.csv")

Data Overview

baby table: 3 columns, 953 rows, no missing values.

trade table: 7 columns, 29 971 rows, no missing values.

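The property column contains only numeric strings with no accompanying dictionary, so it is removed.

buy_mount has a mean of 2.5 and a standard deviation of 64; values beyond three standard deviations of the mean (2.5 + 3 × 64 ≈ 195, with the lower bound clipped to 0, giving [0, 195]) are filtered out.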
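
The three‑sigma rule described above can be sketched as follows. This is a minimal illustration on synthetic data, not the author's code; the helper name filter_three_sigma and the toy DataFrame are assumptions made for the example.

```python
import pandas as pd


def filter_three_sigma(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Keep rows whose `col` value lies within mean +/- 3 standard deviations.

    The lower bound is clipped at 0 because a purchase quantity cannot be negative.
    """
    mean, std = df[col].mean(), df[col].std()
    lower = max(mean - 3 * std, 0)
    upper = mean + 3 * std
    return df[df[col].between(lower, upper)]


# Synthetic example (not the real data): 49 ordinary orders and one bulk order.
toy = pd.DataFrame({"buy_mount": [1] * 49 + [1000]})
cleaned = filter_three_sigma(toy, "buy_mount")
print(len(toy), "->", len(cleaned))
```

Note that the rule only works on reasonably large samples: with very few rows, a single extreme value inflates the standard deviation enough to hide itself.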

Data Cleaning

Check and handle missing or outlier values.

Rename auction_id to item_id in the trade table.

Drop the property column (no dictionary available).

Convert day to datetime format.

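# Check for missing values (none are found in either table)
print(baby.isnull().sum())
print(trade.isnull().sum())
# Rename auction_id to item_id
trade = trade.rename(columns={"auction_id": "item_id"})
# Keep the property column aside for possible later analysis, then drop it.
# The variable is called item_property to avoid shadowing the built-in `property`.
item_property = trade["property"]
trade = trade.drop(columns="property")
# Convert dates
baby["birthday"] = pd.to_datetime(baby["birthday"].astype(str))
trade["day"] = pd.to_datetime(trade["day"].astype(str))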

After cleaning, 29 942 rows remain, covering 2012/07/02–2015/02/05. The dataset includes 6 major product categories, 662 sub‑categories, 28 394 items, and 29 915 users.
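
Figures like the ones above can be recomputed with a few nunique calls. The sketch below assumes the cleaned column names from the trade table; the function name summarize_trade and the toy rows are hypothetical.

```python
import pandas as pd


def summarize_trade(trade: pd.DataFrame) -> dict:
    """Return row count, date range, and distinct counts for the trade table."""
    return {
        "rows": len(trade),
        "first_day": trade["day"].min(),
        "last_day": trade["day"].max(),
        "major_categories": trade["cat1"].nunique(),
        "sub_categories": trade["cat_id"].nunique(),
        "items": trade["item_id"].nunique(),
        "users": trade["user_id"].nunique(),
    }


# Toy rows for illustration only.
toy_trade = pd.DataFrame({
    "item_id": [101, 102, 103, 101],
    "user_id": [1, 2, 3, 1],
    "cat_id": [5001, 5002, 5001, 5001],
    "cat1": [28, 38, 28, 28],
    "buy_mount": [1, 2, 1, 3],
    "day": pd.to_datetime(
        ["20130102", "20130215", "20141111", "20150205"], format="%Y%m%d"
    ),
})
summary = summarize_trade(toy_trade)
print(summary)
```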

Data Analysis

Overall Market Situation

Figure: Overall sales trend

Total sales from 2012/07 to 2015/02 amount to 49 973 items, showing an overall upward trend with significant fluctuations.

Data for 2015 Q1 stops at 02/05, so that quarter is only partially covered.

Each year’s first quarter experiences a sales dip, while the fourth quarter shows a sharp increase.
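
The article produces its charts in a visualization tool, so the aggregation itself is not shown. A rough pandas equivalent for quarterly totals might look like this; quarterly_sales and the toy rows are assumptions for illustration.

```python
import pandas as pd


def quarterly_sales(trade: pd.DataFrame) -> pd.Series:
    """Sum buy_mount per calendar quarter."""
    quarter = trade["day"].dt.to_period("Q")
    return trade.groupby(quarter)["buy_mount"].sum()


# Toy rows for illustration only.
toy = pd.DataFrame({
    "day": pd.to_datetime(["2013-02-10", "2013-11-11", "2013-12-12", "2014-02-01"]),
    "buy_mount": [1, 5, 4, 2],
})
by_quarter = quarterly_sales(toy)
print(by_quarter)
```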

Figure: Quarterly sales pattern

In both 2013 and 2014, the Q1 decline is concentrated in January and February.

May and November each year show modest sales spikes.
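
To compare the same month across years, one option is a month‑by‑year table. This is a sketch under the assumption that sales volume means the sum of buy_mount; monthly_sales_by_year is a hypothetical helper.

```python
import pandas as pd


def monthly_sales_by_year(trade: pd.DataFrame) -> pd.DataFrame:
    """Return a table with months as rows, years as columns, summed buy_mount as values."""
    keyed = trade.assign(year=trade["day"].dt.year, month=trade["day"].dt.month)
    return (
        keyed.groupby(["month", "year"])["buy_mount"]
        .sum()
        .unstack("year", fill_value=0)
    )


# Toy rows for illustration only.
toy = pd.DataFrame({
    "day": pd.to_datetime(
        ["2013-05-10", "2013-11-11", "2014-05-12", "2014-11-11", "2014-11-20"]
    ),
    "buy_mount": [3, 5, 4, 9, 1],
})
table = monthly_sales_by_year(toy)
print(table)
```

Reading across a row then shows whether a given month, such as May or November, stands out in every year.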

First‑Quarter Sales Decline Reason

Hypothesis: the dip is related to the Spring Festival holiday.

2013/02/01–02/15: sales trough coincides with holiday 2013/02/09–02/15.

2014/01/26–02/04: sales trough coincides with holiday 2014/01/31–02/06.

2015 Spring Festival (02/18–02/24) falls after the dataset end date (02/05), so Q1 2015 is not analyzed.

The troughs line up with the holiday in both years, so the first‑quarter dip is attributed to the Spring Festival.
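
One way to check this hypothesis numerically is to compare average daily sales inside the holiday window with the rest of the quarter. The holiday dates below are the ones quoted in the text; the function name and toy rows are assumptions, and this is only a sketch of the comparison, not the author's procedure.

```python
import pandas as pd

# Holiday windows as quoted in the text.
SPRING_FESTIVAL = {
    2013: ("2013-02-09", "2013-02-15"),
    2014: ("2014-01-31", "2014-02-06"),
}


def holiday_vs_rest(trade: pd.DataFrame, year: int) -> tuple:
    """Return (mean daily sales during the holiday, mean daily sales on other Q1 days).

    Days without any order are absent from the daily series, so both means
    are computed over days that have at least one sale.
    """
    start, end = (pd.Timestamp(d) for d in SPRING_FESTIVAL[year])
    in_q1 = (trade["day"].dt.year == year) & (trade["day"].dt.quarter == 1)
    daily = trade[in_q1].groupby("day")["buy_mount"].sum()
    in_holiday = (daily.index >= start) & (daily.index <= end)
    return daily[in_holiday].mean(), daily[~in_holiday].mean()


# Toy rows for illustration only.
toy = pd.DataFrame({
    "day": pd.to_datetime(
        ["2013-01-10", "2013-01-20", "2013-02-10", "2013-02-12", "2013-03-05"]
    ),
    "buy_mount": [10, 12, 2, 4, 11],
})
holiday_mean, rest_mean = holiday_vs_rest(toy, 2013)
print(holiday_mean, rest_mean)
```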

Fourth‑Quarter Sales Increase Reason

Likely driven by Double‑11 and Double‑12 promotional events.

Sales and revenue surge noticeably on Double‑11 and Double‑12 in 2013 and 2014.

User count during Double‑11 grows by 75%–80% year over year.
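
The year‑over‑year user growth on Double‑11 can be computed by counting distinct buyers on 11 November of each year. The sketch below uses made‑up rows, and double11_users is a hypothetical helper name.

```python
import pandas as pd


def double11_users(trade: pd.DataFrame) -> tuple:
    """Return (distinct buyers on 11 November per year, year-over-year growth)."""
    is_d11 = (trade["day"].dt.month == 11) & (trade["day"].dt.day == 11)
    d11 = trade[is_d11]
    users = d11.groupby(d11["day"].dt.year)["user_id"].nunique()
    return users, users.pct_change()


# Toy rows for illustration only: 4 distinct buyers in 2013, 7 in 2014.
toy = pd.DataFrame({
    "day": pd.to_datetime(
        ["2013-11-11"] * 5 + ["2013-11-12"] + ["2014-11-11"] * 7
    ),
    "user_id": [1, 2, 3, 4, 1, 9, 1, 2, 3, 4, 5, 6, 7],
})
users, growth = double11_users(toy)
print(users)
print(growth)
```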

Repeat Purchase Rate

Figure: Repeat purchase rate
Figure: Monthly repeat rate

Monthly and category‑level repeat rates are all below 1%; the highest is 0.17% for category 38, indicating very low repeat purchase behavior.
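
The text does not say how the repeat rate is defined. A common definition, assumed here, is the share of buyers who purchased on more than one distinct day; the author's definition may differ. The function names are hypothetical.

```python
import pandas as pd


def repeat_purchase_rate(trade: pd.DataFrame) -> float:
    """Share of users who bought on more than one distinct day (assumed definition)."""
    days_per_user = trade.groupby("user_id")["day"].nunique()
    return float((days_per_user > 1).mean())


def repeat_rate_by(trade: pd.DataFrame, key: str) -> pd.Series:
    """Same rate, computed separately within each value of `key` (e.g. cat1)."""
    days = trade.groupby([key, "user_id"])["day"].nunique()
    return (days > 1).groupby(level=key).mean()


# Toy rows for illustration only: user 1 is the only repeat buyer.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 4],
    "cat1": [28, 28, 28, 38, 38],
    "day": pd.to_datetime(
        ["2014-01-01", "2014-03-01", "2014-01-05", "2014-02-01", "2014-02-02"]
    ),
})
overall = repeat_purchase_rate(toy)
by_cat = repeat_rate_by(toy, "cat1")
print(overall)
print(by_cat)
```

A monthly rate could be obtained the same way by grouping on a month key derived from day instead of cat1.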

Product Sales Situation

Figure: Category sales

Categories 28 and 50008168 have the highest sales. Category 38, despite low total sales, shows the highest per‑user purchase amount, suggesting strong demand for a limited product range.
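
The contrast between total sales and per‑user purchase amount can be made explicit by computing both per major category. In this sketch, "purchase amount" is taken to mean quantity (buy_mount), since the dataset has no price column; category_summary and the toy rows are assumptions.

```python
import pandas as pd


def category_summary(trade: pd.DataFrame) -> pd.DataFrame:
    """Total quantity, distinct buyers, and quantity per buyer for each major category."""
    out = trade.groupby("cat1").agg(
        total_sales=("buy_mount", "sum"),
        users=("user_id", "nunique"),
    )
    out["sales_per_user"] = out["total_sales"] / out["users"]
    return out.sort_values("total_sales", ascending=False)


# Toy rows for illustration only: category 38 has fewer sales but more per buyer.
toy = pd.DataFrame({
    "cat1": [28, 28, 28, 28, 28, 38],
    "user_id": [1, 2, 3, 4, 5, 6],
    "buy_mount": [2, 2, 2, 2, 2, 6],
})
summary = category_summary(toy)
print(summary)
```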

Baby Demographics

Figure: Age distribution

After joining the two tables, records from 1984 are identified as outliers and removed. Assuming analysis up to March 2015, most purchases are for children aged 0–3 years.
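
The join and outlier removal can be sketched as below. Two points are assumptions rather than facts from the article: the tables are joined on user_id with an inner join, and age is measured at the time of purchase (the article may instead use a fixed reference date). The helper name join_with_age is hypothetical.

```python
import pandas as pd


def join_with_age(baby: pd.DataFrame, trade: pd.DataFrame) -> pd.DataFrame:
    """Join purchases to child records, drop 1984 birthdays, add age in years."""
    merged = trade.merge(baby, on="user_id", how="inner")
    # Birthdays in 1984 are treated as outliers, as stated in the text.
    merged = merged[merged["birthday"].dt.year != 1984].copy()
    # Assumption: age at the time of purchase; negative means not yet born.
    merged["age_years"] = (merged["day"] - merged["birthday"]).dt.days / 365.25
    return merged


# Toy rows for illustration only.
toy_baby = pd.DataFrame({
    "user_id": [1, 2, 3],
    "birthday": pd.to_datetime(["2013-01-01", "1984-06-16", "2014-12-01"]),
    "gender": [0, 1, 0],
})
toy_trade = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "day": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-06-01", "2014-06-01"]),
    "buy_mount": [1, 1, 2, 1],
})
joined = join_with_age(toy_baby, toy_trade)
print(joined[["user_id", "age_years"]])
```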

Figure: Gender ratio

Among buying families, 47.1% have male infants and 52.9% have female infants.

Age groups are categorized as: unborn, infant (0‑12 months), toddler (1‑3 years), preschool (3‑7 years), school‑age (7+).
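
These groups map naturally onto pd.cut. The sketch assumes half‑open intervals (for example, a child aged exactly 1 year counts as a toddler) and a negative age for unborn children; the article does not state how boundaries are handled.

```python
import numpy as np
import pandas as pd

# Bin edges in years; intervals are closed on the left, open on the right (assumption).
AGE_BINS = [-np.inf, 0, 1, 3, 7, np.inf]
AGE_LABELS = ["unborn", "infant", "toddler", "preschool", "school-age"]


def age_group(age_years: pd.Series) -> pd.Series:
    """Map an age in years to one of the five groups used in the analysis."""
    return pd.cut(age_years, bins=AGE_BINS, labels=AGE_LABELS, right=False)


ages = pd.Series([-0.3, 0.5, 1.0, 3.0, 6.9, 7.0])
groups = age_group(ages)
print(groups.tolist())
```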

Hot‑selling major categories per age group:

Unborn: 50014815, 50022520, 50008168, 28

Infant: 50014815, 50022520, 50008168, 28

Toddler: 50014815, 50008168, 28

Preschool: 50008168, 28

School‑age: 50008168

Demand for category 50008168 rises with age, while demand for 50014815 declines.
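
A ranking like the one above can be derived by summing quantities per age group and category. How "hot‑selling" is defined is not stated, so this sketch simply takes the top n categories by total buy_mount; the function name and toy rows are assumptions.

```python
import pandas as pd


def top_categories_by_age_group(df: pd.DataFrame, n: int = 3) -> dict:
    """Return, for each age group, its n major categories with the highest total quantity."""
    totals = df.groupby(["age_group", "cat1"], as_index=False)["buy_mount"].sum()
    totals = totals.sort_values("buy_mount", ascending=False)
    return {
        group: rows["cat1"].tolist()[:n]
        for group, rows in totals.groupby("age_group")
    }


# Toy rows for illustration only.
toy = pd.DataFrame({
    "age_group": ["infant", "infant", "infant", "infant", "school-age"],
    "cat1": [50014815, 50014815, 28, 50008168, 50008168],
    "buy_mount": [5, 4, 3, 1, 6],
})
top = top_categories_by_age_group(toy, n=2)
print(top)
```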

Female‑infant families purchase significantly more than male‑infant families. For example, 71.05% of sales of product 50018831 (under category 50014815) are from female‑infant households.
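
A share such as the 71.05% figure can be computed as the female‑household quantity divided by the quantity from households with a known gender. The sketch assumes that 50018831 is a cat_id value and that unknown genders (code 2) are excluded; the second category ID in the toy data is made up.

```python
import pandas as pd


def female_share(df: pd.DataFrame, key: str = "cat_id") -> pd.Series:
    """Share of quantity bought by female-infant households, per value of `key`.

    Gender codes follow the baby table: 0 = female, 1 = male, 2 = unknown (excluded).
    """
    known = df[df["gender"].isin([0, 1])]
    totals = (
        known.groupby([key, "gender"])["buy_mount"]
        .sum()
        .unstack("gender", fill_value=0)
        .reindex(columns=[0, 1], fill_value=0)
    )
    return totals[0] / (totals[0] + totals[1])


# Toy rows for illustration only; cat_id 999 is a made-up placeholder.
toy = pd.DataFrame({
    "cat_id": [50018831, 50018831, 50018831, 999],
    "gender": [0, 1, 2, 1],
    "buy_mount": [71, 29, 50, 4],
})
share = female_share(toy)
print(share)
```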

Summary

Product Sales Findings

Mother‑baby product sales show a year‑over‑year increase, but monthly fluctuations are large.

First‑quarter sales dip each year due to the Spring Festival; fourth‑quarter peaks align with Double‑11/Double‑12 promotions.

Repeat purchase rates are extremely low, suggesting a need to improve product quality, pricing, and shopping experience.

Top‑selling major categories are 50014815, 50008168, and 28.

Category 38, though low in total sales, has the highest per‑user purchase amount; expanding sub‑categories here could boost sales.

User Portrait

Toddlers (1‑3 years) generate the highest demand; demand declines as children age.

Gender distribution of infants is roughly balanced, but female‑infant families purchase more.

Certain products are predominantly bought by female‑infant families, indicating opportunities for targeted product adjustments.

Recommendations

Reduce promotional spend and inventory a week before the Spring Festival; increase promotion and stock during Double‑11/Double‑12 pre‑heat periods.

Enhance post‑purchase follow‑up to address low repeat purchase rates.

Develop more products aimed at male infants to balance gender‑based purchasing.

Expand sub‑categories, especially under major category 38, to provide more choices and increase sales.

Limit inventory for category 12265008 to avoid overstock.

References

[1] In‑depth analysis of mother‑baby e‑commerce sales: https://zhuanlan.zhihu.com/p/129072269
