Unlock E‑Commerce Insights: Python‑Driven Taobao Data Analysis Walkthrough

This tutorial walks through a complete Python workflow for analyzing a month of Taobao mobile app data, covering data import, cleaning, feature engineering, traffic and behavior metrics, funnel conversion, RFM segmentation, and visualizations to reveal user patterns and business performance.

Python Crawling & Data Mining

1. Project Background and Analysis

E-commerce is now an essential part of daily life. This project uses Taobao mobile app data to analyze user behavior and uncover patterns in how users browse and buy.

1.1 Data and Field Description

The dataset covers mobile Taobao activity from 2014-11-18 to 2014-12-18 and contains 12,256,906 records with 6 columns: user_id, item_id, behavior_type (1 = click, 2 = collect, 3 = add to cart, 4 = pay), user_geohash, item_category, and time.
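As a small illustration of the behavior_type coding, the sketch below decodes the four codes into readable labels and fails loudly on anything unexpected. The English labels and the helper name `decode_behavior` are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Labels follow the field description; the English wording is illustrative.
BEHAVIOR_LABELS = {"1": "click", "2": "collect", "3": "cart", "4": "pay"}


def decode_behavior(codes: pd.Series) -> pd.Series:
    """Map behavior_type codes (read as strings) to readable labels."""
    labels = codes.map(BEHAVIOR_LABELS)
    # Codes outside the documented 1-4 range map to NaN; report them.
    unknown = codes[labels.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"unexpected behavior_type codes: {list(unknown)}")
    return labels


# Made-up sample values, not rows from the real file.
sample = pd.Series(["1", "1", "3", "4"], name="behavior_type")
decoded = decode_behavior(sample)
```

Because the tutorial later reads the CSV with dtype=str, the dictionary keys are strings rather than integers.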

1.2 Analysis Dimensions

Traffic metrics analysis

User behavior analysis

Funnel loss analysis

User value RFM analysis

2. Import Libraries

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
sns.set(style="darkgrid", font_scale=1.5)
mpl.rcParams["font.family"] = "SimHei"
mpl.rcParams["axes.unicode_minus"] = False
warnings.filterwarnings('ignore')

3. Data Preview and Pre‑processing

df = pd.read_csv("taobao.csv", dtype=str)
df.shape
df.info()
df.sample(5)

Calculate the missing rate of each column, drop user_geohash, split time into date and hour, convert types, sort by time, and reset the index.

# calculate the missing rate per column (fraction of null values)
missing_rate = df.isnull().mean()
# drop geohash column
df.drop(["user_geohash"], axis=1, inplace=True)
# split time column
df["date"] = df.time.str[0:-3]
df["hour"] = df.time.str[-2:]
# convert types
df["date"] = pd.to_datetime(df["date"])
df["time"] = pd.to_datetime(df["time"])
df["hour"] = df["hour"].astype(int)
# sort and reset index
df.sort_values(by="time", ascending=True, inplace=True)
df.reset_index(drop=True, inplace=True)
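The same split can also be done by parsing the timestamp once and reading the date and hour from the `.dt` accessor. This is a minimal sketch on made-up rows. It assumes the time strings look like "2014-12-01 21", which is what the slicing above implies. The helper name `split_time` is hypothetical.

```python
import pandas as pd

# Three made-up rows in the assumed "YYYY-MM-DD HH" layout.
raw = pd.DataFrame({
    "user_id": ["u1", "u2", "u1"],
    "behavior_type": ["1", "4", "3"],
    "time": ["2014-12-01 21", "2014-11-18 09", "2014-12-01 08"],
})


def split_time(frame: pd.DataFrame) -> pd.DataFrame:
    """Parse `time` once, derive `date` and `hour`, then sort chronologically."""
    out = frame.copy()
    parsed = pd.to_datetime(out["time"], format="%Y-%m-%d %H")
    out["time"] = parsed
    out["date"] = parsed.dt.normalize()  # midnight of the same day
    out["hour"] = parsed.dt.hour         # integer hour, 0-23
    return out.sort_values("time").reset_index(drop=True)


clean = split_time(raw)
```

Passing an explicit format makes a malformed timestamp raise an error instead of being guessed at, which can be useful on a file with millions of rows.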

4. Model Construction

Compute total page views (PV) and unique visitors (UV), daily and hourly metrics, behavior type counts, funnel analysis, and RFM segmentation.

# total PV and UV
total_pv = df["user_id"].count()
total_uv = df["user_id"].nunique()
# daily PV/UV
pv_daily = df.groupby("date")["user_id"].count()
uv_daily = df.groupby("date")["user_id"].nunique()
# hourly PV/UV
pv_hour = df.groupby("hour")["user_id"].count()
uv_hour = df.groupby("hour")["user_id"].nunique()
# behavior type counts
type_1 = df[df["behavior_type"] == "1"]["user_id"].count()
type_2 = df[df["behavior_type"] == "2"]["user_id"].count()
type_3 = df[df["behavior_type"] == "3"]["user_id"].count()
type_4 = df[df["behavior_type"] == "4"]["user_id"].count()
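To make the PV and UV definitions concrete, here is a toy example that computes both in one pass and adds an average PV per visitor. The rows are invented for illustration and the `pv_per_uv` column is an extra, not a metric from the original analysis.

```python
import pandas as pd

# Made-up events: four on the first day, two on the second.
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u1", "u2"],
    "date": pd.to_datetime(["2014-11-18"] * 4 + ["2014-11-19"] * 2),
})

# PV counts every record; UV counts distinct users.
daily = events.groupby("date")["user_id"].agg(pv="count", uv="nunique")
daily["pv_per_uv"] = daily["pv"] / daily["uv"]
```

On the first day, user u1 appears twice, so PV is 4 while UV is 3.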

Funnel analysis merges collect and add-to-cart into a single stage, giving a three-stage funnel (click, collect or cart, pay) with both overall and step-by-step conversion rates.

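# funnel data preparation: number of records per behavior type
# labels are kept in Chinese: 人数 (literally "number of people") holds record counts, 环节 = stage
df_count = df.groupby("behavior_type").size().reset_index(name="人数")
# 点击 = click, 收藏 = collect, 加入购物车 = add to cart, 支付 = pay
type_dict = {"1":"点击","2":"收藏","3":"加入购物车","4":"支付"}
df_count["环节"] = df_count["behavior_type"].map(type_dict)
# groupby sorts its keys, so the counts arrive in the order "1", "2", "3", "4"
a, b, c, d = df_count["人数"]
# combine collect and cart into one stage (收藏及加入购物车)
funnel = pd.DataFrame({"环节":["点击","收藏及加入购物车","支付"],"人数":[a, b + c, d]})
# 总体转化率 = overall conversion rate, relative to clicks
funnel["总体转化率"] = funnel["人数"] / funnel["人数"].iloc[0]
# 单一转化率 = step conversion rate, relative to the previous stage
funnel["单一转化率"] = [1.0, (b + c) / a, d / (b + c)]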
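The two conversion rates are easy to confuse, so here is the arithmetic worked through in plain Python. The counts are made up for illustration and are not results from the Taobao data. The function name `funnel_rates` is hypothetical.

```python
def funnel_rates(clicks: int, collects: int, carts: int, pays: int) -> list:
    """Return one dict per funnel stage with overall and step conversion."""
    stages = [
        ("click", clicks),
        ("collect + cart", collects + carts),  # merged middle stage
        ("pay", pays),
    ]
    first_count = stages[0][1]
    rows = []
    previous = None
    for name, count in stages:
        overall = count / first_count                        # relative to clicks
        step = 1.0 if previous is None else count / previous  # relative to prior stage
        rows.append({"stage": name, "count": count,
                     "overall": overall, "step": step})
        previous = count
    return rows


# Made-up counts: 1000 clicks, 30 collects, 50 cart adds, 20 payments.
rows = funnel_rates(clicks=1000, collects=30, carts=50, pays=20)
```

With these numbers the merged stage holds 80 records, so its overall rate is 80 / 1000 = 0.08, while the pay stage has an overall rate of 20 / 1000 = 0.02 and a step rate of 20 / 80 = 0.25.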

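RFM analysis finds each buyer's most recent purchase date (R) and number of purchases (F), then assigns scores and labels customers. None of the six columns records a payment amount, so M is not computed.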

# simplified RFM calculation (payment events only)
paid = df[df["behavior_type"] == "4"]
# R: date of each user's most recent purchase
recent_buy = paid.groupby("user_id")["date"].max().rename("recent")
# F: number of purchase records per user
freq = paid.groupby("user_id")["date"].count().rename("freq")
# name each Series before concat; a dict cannot map "date" to two new names
rfm = pd.concat([recent_buy, freq], axis=1)
# scoring and labeling steps are omitted for brevity
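One possible way to fill in the omitted scoring step is sketched below on made-up purchases. Everything specific here is an assumption rather than the original author's method: the reference date (the day after the data ends), the two-level split at the mean, and the segment names.

```python
import pandas as pd

# Made-up purchase log: one row per payment event.
purchases = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u3", "u3"],
    "date": pd.to_datetime([
        "2014-12-15", "2014-12-16", "2014-12-17",
        "2014-11-20", "2014-12-18", "2014-11-25",
    ]),
})

# Assumption: measure recency from the day after the observation window ends.
REFERENCE_DATE = pd.Timestamp("2014-12-19")

rfm = purchases.groupby("user_id")["date"].agg(last_buy="max", freq="count")
rfm["recent_days"] = (REFERENCE_DATE - rfm["last_buy"]).dt.days

# Assumption: a two-level score split at the mean; fewer days since purchase is better.
rfm["r_score"] = (rfm["recent_days"] <= rfm["recent_days"].mean()).astype(int)
rfm["f_score"] = (rfm["freq"] >= rfm["freq"].mean()).astype(int)

# Hypothetical segment names keyed by (r_score, f_score).
SEGMENTS = {
    (1, 1): "high value",
    (0, 1): "needs reactivation",
    (1, 0): "developing",
    (0, 0): "low value",
}
pairs = zip(rfm["r_score"].tolist(), rfm["f_score"].tolist())
rfm["segment"] = [SEGMENTS[pair] for pair in pairs]
```

A mean split is sensitive to outliers such as a handful of very frequent buyers. Quantile-based scores (for example with `pd.qcut`) are a common alternative when the distribution is skewed.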

All visualizations are generated with matplotlib, seaborn, or plotly (code omitted for brevity).

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
