E‑commerce User Behavior Analysis and KPI Modeling with Python and SQL
This study analyzes JD e‑commerce operational data from February to April 2018, employing Python and SQL to compute key metrics such as PV, UV, conversion rates, attrition, purchase frequency, time‑based behavior, funnel analysis, retention, product sales, and RFM segmentation, and provides actionable recommendations for improving user engagement and sales performance.
Project background: analysis of JD e‑commerce dataset (2018‑02‑01 to 2018‑04‑15) using MySQL and Python.
Data cleaning steps include loading CSV, filtering the date range, removing missing values, extracting date, hour and weekday, and mapping behavior types.
# import modules
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetimeKey metrics such as total page views (PV), unique visitors (UV), conversion rates, and attrition rate are computed, e.g.:
# total PV
pv = behavior[behavior['type'] == 'pv']['user_id'].count()
# total UV
uv = behavior['user_id'].nunique()
# conversion rate PV→pay
pv_pay_num = pv_pay_df['user_id'].nunique()
conversion_rate = pv_pay_num / uv * 100Time‑based analyses (daily active users, hourly traffic, weekday patterns) are visualized with seaborn bar and point plots.
Conversion funnels are built with pyecharts Funnel, showing pathways PV→pay, PV→cart→pay, PV→fav→pay, and PV→fav→cart→pay with corresponding conversion percentages.
Purchase interval analysis reveals most users complete payment within four hours for both cart and favorite pathways.
User retention is measured at 5‑, 10‑ and 15‑day intervals, yielding rates of 22.81 % and 17.44 % respectively.
Product sales analysis identifies 13 SKUs with over 1 000 purchases; a scatter plot of average pre‑purchase actions versus sales highlights fast‑moving and niche items.
RFM segmentation (using recent purchase days and frequency) classifies users into four groups: important value, important retain, important recover, and general development, with a pie chart distribution.
Based on the findings, recommendations include increasing channel promotion, optimizing search and checkout flows, targeted marketing for different RFM segments, and activities to improve retention and repeat purchase rates.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.