Unlock E‑Commerce Insights: Python‑Driven Taobao Data Analysis Walkthrough
This tutorial walks through a complete Python workflow for analyzing a month of Taobao mobile app data, covering data import, cleaning, feature engineering, traffic and behavior metrics, funnel conversion, RFM segmentation, and visualizations to reveal user patterns and business performance.
1. Project Background and Analysis
Project background: e‑commerce has become an essential retail channel. This project uses Taobao app data to analyze user behavior and identify patterns in it.
2. Data and Field Description
The dataset covers mobile Taobao activity from 2014‑11‑18 to 2014‑12‑18. It contains 12,256,906 records with 6 columns: user_id, item_id, behavior_type (1 = click, 2 = collect, 3 = add to cart, 4 = pay), user_geohash, item_category, and time.
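To make the field layout concrete, here is a minimal sketch with a few made-up rows in the same six-column shape. The values are hypothetical, and the `YYYY-MM-DD HH` time format is an assumption inferred from how the time column is sliced later in the walkthrough. The `BEHAVIOR_NAMES` mapping and the `behavior` column are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical rows in the six-column layout; none of these values are real.
sample = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u2"],
        "item_id": ["i9", "i9", "i7"],
        "behavior_type": ["1", "4", "3"],
        "user_geohash": [None, None, "96nn52n"],
        "item_category": ["c5", "c5", "c2"],
        "time": ["2014-11-18 09", "2014-11-18 21", "2014-12-01 13"],
    }
)

# Code-to-name mapping taken from the field description.
BEHAVIOR_NAMES = {"1": "click", "2": "collect", "3": "add to cart", "4": "pay"}
sample["behavior"] = sample["behavior_type"].map(BEHAVIOR_NAMES)

# Assumed time format: date plus a two-digit hour.
parsed = pd.to_datetime(sample["time"], format="%Y-%m-%d %H")
hours = parsed.dt.hour.tolist()
```

Passing an explicit `format` to `pd.to_datetime` avoids relying on format inference, which can be slow or ambiguous on a file with millions of rows.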
3. Analysis Dimensions
Traffic metrics analysis
User behavior analysis
Funnel loss analysis
User value RFM analysis
2. Import Libraries
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
sns.set(style="darkgrid", font_scale=1.5)
mpl.rcParams["font.family"] = "SimHei"
mpl.rcParams["axes.unicode_minus"] = False
warnings.filterwarnings('ignore')
3. Data Preview and Pre‑processing
df = pd.read_csv("taobao.csv", dtype=str)
df.shape
df.info()
df.sample(5)
Calculate the missing rate per column, drop user_geohash, split time into date and hour, convert types, sort by time, and reset the index.
# calculate missing rate (share of null values per column)
missing_rate = df.isnull().mean()
# drop geohash column
df.drop(["user_geohash"], axis=1, inplace=True)
# split time column
df["date"] = df.time.str[0:-3]
df["hour"] = df.time.str[-2:]
# convert types
df["date"] = pd.to_datetime(df["date"])
df["time"] = pd.to_datetime(df["time"])
df["hour"] = df["hour"].astype(int)
# sort and reset index
df.sort_values(by="time", ascending=True, inplace=True)
df.reset_index(drop=True, inplace=True)
4. Model Construction
Compute total page views (PV) and unique visitors (UV), daily and hourly metrics, behavior type counts, funnel analysis, and RFM segmentation.
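As a quick illustration of the two traffic metrics, the sketch below uses a toy event log with made-up values. It follows the convention used in this walkthrough: every recorded row counts toward PV, while UV counts distinct users. The `events` and `daily` names and the `pv_per_user` column are hypothetical and only serve the example.

```python
import pandas as pd

# Toy event log (made-up values): one row per recorded behavior.
events = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
        "date": ["2014-11-18"] * 3 + ["2014-11-19"] * 3,
    }
)

pv = len(events)                   # PV: every row counts
uv = events["user_id"].nunique()   # UV: distinct users only

# Per-day PV and UV in a single pass using named aggregation.
daily = events.groupby("date").agg(
    pv=("user_id", "count"),
    uv=("user_id", "nunique"),
)
# Average number of recorded behaviors per active user on each day.
daily["pv_per_user"] = daily["pv"] / daily["uv"]
```

Named aggregation keeps PV and UV in one DataFrame, which is convenient when both series are plotted against the same date axis.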
# total PV and UV
total_pv = df["user_id"].count()
total_uv = df["user_id"].nunique()
# daily PV/UV
pv_daily = df.groupby("date")["user_id"].count()
uv_daily = df.groupby("date")["user_id"].nunique()
# hourly PV/UV
pv_hour = df.groupby("hour")["user_id"].count()
uv_hour = df.groupby("hour")["user_id"].nunique()
# behavior type counts
type_1 = df[df["behavior_type"] == "1"]["user_id"].count()
type_2 = df[df["behavior_type"] == "2"]["user_id"].count()
type_3 = df[df["behavior_type"] == "3"]["user_id"].count()
type_4 = df[df["behavior_type"] == "4"]["user_id"].count()
Funnel analysis merges collect and cart into a single stage to show conversion rates.
# funnel data preparation
df_count = df.groupby("behavior_type").size().reset_index(name="count")
type_dict = {"1": "click", "2": "collect", "3": "add to cart", "4": "pay"}
df_count["stage"] = df_count["behavior_type"].map(type_dict)
# groupby sorts the keys, so the counts arrive in the order "1", "2", "3", "4"
a, b, c, d = df_count["count"]
# combine collect and cart
funnel = pd.DataFrame({"stage": ["click", "collect + add to cart", "pay"], "count": [a, b + c, d]})
funnel["overall_conversion"] = funnel["count"] / funnel["count"].iloc[0]
funnel["step_conversion"] = [1.0, (b + c) / a, d / (b + c)]
RFM analysis finds each buyer's most recent purchase date and purchase frequency, then assigns scores and labels customers.
# simplified RFM calculation
buyers = df[df["behavior_type"] == "4"]
# most recent purchase date and number of purchases per user
recent_buy = buyers.groupby("user_id")["date"].max().rename("recent")
freq = buyers.groupby("user_id")["date"].count().rename("freq")
# name each Series before concatenating; a dict with two "date" keys keeps only the last one
rfm = pd.concat([recent_buy, freq], axis=1)
# scoring and labeling steps are omitted for brevity
All visualizations are generated with matplotlib, seaborn, or plotly (code omitted for brevity).
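The scoring and labeling steps are not shown in the article, so the following is only one possible sketch, not the original method. It assumes a median split on each dimension, a caller-supplied reference date `today`, and four hypothetical segment names. Because the six dataset columns carry no purchase amount, only recency (R) and frequency (F) are scored. The `label_rf` function and the toy `purchases` table are illustrative.

```python
import pandas as pd


def label_rf(purchases: pd.DataFrame, today: pd.Timestamp) -> pd.DataFrame:
    """Score buyers on recency and frequency and attach a segment label.

    `purchases` needs `user_id` and `date` columns, one row per payment.
    """
    grouped = purchases.groupby("user_id")["date"]
    rf = pd.DataFrame(
        {
            "recency_days": (today - grouped.max()).dt.days,
            "freq": grouped.count(),
        }
    )
    # Assumed rule: score 1 when at least as good as the median buyer, else 0.
    rf["r_score"] = (rf["recency_days"] <= rf["recency_days"].median()).astype(int)
    rf["f_score"] = (rf["freq"] >= rf["freq"].median()).astype(int)
    # Hypothetical segment names keyed by (r_score, f_score).
    labels = {
        (1, 1): "high value",
        (0, 1): "win back",
        (1, 0): "develop",
        (0, 0): "low priority",
    }
    pairs = zip(rf["r_score"].tolist(), rf["f_score"].tolist())
    rf["segment"] = [labels[pair] for pair in pairs]
    return rf


# Toy payment records (made-up values) for four users.
purchases = pd.DataFrame(
    {
        "user_id": ["u1"] * 3 + ["u2"] + ["u3"] + ["u4"] * 4,
        "date": pd.to_datetime(
            ["2014-12-10", "2014-12-17", "2014-12-18",
             "2014-11-20",
             "2014-12-16",
             "2014-11-22", "2014-11-23", "2014-11-24", "2014-11-25"]
        ),
    }
)
result = label_rf(purchases, today=pd.Timestamp("2014-12-18"))
```

A median split is easy to explain but coarse. Quantile-based scores (for example with `pd.qcut`) are a common alternative when finer segments are needed.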