What Drives Taobao App Users? Insights from AARRR and RFM Analyses
This article analyzes 2 million Taobao app user‑behavior records using AARRR funnel metrics and RFM segmentation, revealing daily and hourly usage patterns, conversion bottlenecks, product‑search mismatches, and offering data‑driven marketing recommendations to boost retention and sales.
1. Problem Definition and Modeling
The goal is to explain and improve Taobao app user behavior by answering five questions: (1) identify common e‑commerce metrics and stage‑wise churn rates, (2) use hypothesis testing to pinpoint the causes of churn, (3) study behavior patterns at different time scales, (4) discover product preferences and marketing tactics, and (5) segment users by value for targeted strategies.
1.1 Applied Models
Use the AARRR funnel (Acquisition, Activation, Retention, Referral, Revenue) to break down each stage of the user journey after users enter the app.
Apply the RFM model (Recency, Frequency, Monetary) to evaluate user value, noting that monetary data is unavailable.
2. Data Understanding
Data source: Alibaba Tianchi (tianchi.aliyun.com/data). The dataset covers 2017‑11‑25 00:00 to 2017‑12‑04 00:00 (9 days) and contains about 100 million raw rows; a representative 2‑million‑row subset containing 19 544 distinct users is analyzed here.
2.1 Field Description
Because the file has no header row, column names are assigned manually in file order: id (user ID), item (product ID), category (category ID), behavior (pv = page view, fav = favorite, cart = add to cart, buy = purchase), and times (Unix timestamp).
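import pandas as pd
# The raw CSV has no header row: pass header=None and name the columns in file order
data = pd.read_csv('UserBehavior.csv', header=None,
                   names=['id', 'item', 'category', 'behavior', 'times'])
print(data.head())
3. Data Cleaning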
Rename columns for clarity.
Convert the Unix timestamp into a datetime, then derive separate date and hour columns.
Remove records outside the 2017‑11‑25 to 2017‑12‑04 window.
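The three cleaning steps can also be sketched in Python with only the standard library. This is a minimal sketch, not the article's code: the `clean` function name and the tuple input shape are hypothetical, and reading the timestamps in China Standard Time (UTC+8) is an assumption, since the source does not state a time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are Unix seconds to be read in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))
WINDOW_START = datetime(2017, 11, 25, tzinfo=CST)  # inclusive lower bound
WINDOW_END = datetime(2017, 12, 4, tzinfo=CST)     # exclusive upper bound


def clean(rows):
    """rows: iterable of (id, item, category, behavior, times) tuples (hypothetical shape).

    Returns dicts with added 'date' ('YYYY-MM-DD') and 'hour' ('HH') fields,
    keeping only records inside [WINDOW_START, WINDOW_END).
    """
    cleaned = []
    for uid, item, category, behavior, times in rows:
        dt = datetime.fromtimestamp(int(times), tz=CST)
        if not (WINDOW_START <= dt < WINDOW_END):
            continue  # drop records outside the 9-day window
        cleaned.append({
            "id": uid,
            "item": item,
            "category": category,
            "behavior": behavior,
            "date": dt.strftime("%Y-%m-%d"),
            "hour": dt.strftime("%H"),
        })
    return cleaned
```

The half-open interval keeps a record stamped 2017-12-03 23:59:59 but drops anything stamped 2017-12-04 00:00:00 or later.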
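-- Disable MySQL safe-update mode so UPDATE/DELETE can run without a key in WHERE
set sql_safe_updates=0;
-- Convert the Unix timestamp (seconds) to a datetime; from_unixtime() uses the session time zone
alter table user add column datetime timestamp null;
update user set datetime = from_unixtime(times);
-- Date part: 'YYYY-MM-DD'
alter table user add column date char(10) null;
update user set date = substring(datetime from 1 for 10);
-- Hour part: 'HH' (characters 12-13 of 'YYYY-MM-DD HH:MM:SS')
alter table user add column time char(2) null;
update user set time = substring(datetime from 12 for 2);
-- Keep [2017-11-25 00:00:00, 2017-12-04 00:00:00); >= also drops rows stamped exactly at the end bound
delete from user where datetime < '2017-11-25 00:00:00' or datetime >= '2017-12-04 00:00:00';
4. AARRR Funnel Analysis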
4.1 Daily New Users
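Take each user's earliest activity date as their first day (for example, by ranking each user's activity dates), then count distinct user IDs per first day. The peak of 13 927 new users falls on 11‑25, which the analysis attributes to a promotion; because 11‑25 is also the first day in the dataset, every user active that day is counted as new, which inflates this figure as well.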
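A minimal sketch of the first-day count, using a dictionary of earliest dates in place of an SQL ranking step. The function name and the `(user_id, date_str)` input shape are hypothetical.

```python
from collections import Counter


def new_users_per_day(records):
    """records: iterable of (user_id, date_str) pairs, date_str as 'YYYY-MM-DD'.

    A user's first day is the earliest date on which they appear; the result
    maps each date to the number of users whose first day it is.
    """
    first_day = {}
    for uid, date in records:
        # ISO date strings sort chronologically, so string comparison is safe
        if uid not in first_day or date < first_day[uid]:
            first_day[uid] = date
    return dict(sorted(Counter(first_day.values()).items()))
```

Each user is counted once, on the earliest date they appear, no matter how many events they generate.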
4.2 Retention
N‑day retention = number of a cohort's users who are active again N days after their first day ÷ number of new users in that cohort. Retention stays above 75 % for the 8 days after 11‑25 and exceeds 98 % on 12‑02 and 12‑03, indicating a highly stable user base.
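The retention ratio can be sketched per first-day cohort as below. This reads the definition as "active on exactly day N"; that reading, the function name, and the toy data are assumptions, not the article's implementation.

```python
from datetime import date, timedelta


def retention(records, n):
    """N-day retention per first-day cohort.

    records: iterable of (user_id, datetime.date) activity pairs.
    Returns {cohort_date: share of that cohort active again exactly n days later}.
    """
    active = {}  # user -> set of dates on which the user was active
    for uid, d in records:
        active.setdefault(uid, set()).add(d)

    cohorts = {}  # first active date -> users whose first day it is
    for uid, days in active.items():
        cohorts.setdefault(min(days), []).append(uid)

    result = {}
    for first, users in sorted(cohorts.items()):
        target = first + timedelta(days=n)
        returned = sum(1 for u in users if target in active[u])
        result[first] = returned / len(users)
    return result


# Hypothetical toy data: users 1 and 2 start on 11-25; only user 1 returns on 11-26.
sample = [(1, date(2017, 11, 25)), (1, date(2017, 11, 26)), (2, date(2017, 11, 25))]
print(retention(sample, 1))  # {datetime.date(2017, 11, 25): 0.5}
```

Counting "active on or before day N" instead would give a cumulative retention curve; which reading the article used is not stated.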
4.3 Hourly Behavior
Few users are active between 01:00 and 06:00 (sleeping hours). Activity rises from 06:00 to 10:00, holds steady from 10:00 to 18:00, and peaks between 18:00 and 23:00, especially 22:00‑23:00, which is also the add‑to‑cart peak.
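An hourly profile per behavior, like the one described above, can be sketched by counting `(hour, behavior)` pairs. The function names and input shape are hypothetical; the hour is assumed to be an integer already extracted during cleaning.

```python
from collections import Counter


def hourly_profile(records):
    """records: iterable of (hour, behavior) pairs, hour as an int 0-23.

    Returns {behavior: [event count for hour 0, 1, ..., 23]}.
    """
    counts = Counter(records)
    behaviors = sorted({b for _, b in counts})
    return {b: [counts[(h, b)] for h in range(24)] for b in behaviors}


def peak_hour(profile, behavior):
    """Hour with the most events for the behavior (earliest hour wins ties)."""
    series = profile[behavior]
    return series.index(max(series))
```

Keeping one 24-slot series per behavior makes it easy to compare, for example, the add-to-cart peak with the page-view peak.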
4.4 Churn
Only 1 user out of 9 969 visited a single page, giving a bounce rate of 0.01 % – the app retains users effectively.
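A minimal bounce-rate sketch follows. It assumes a "bounce" is a user whose entire history is a single page view; the article does not spell out its exact definition, and the function name is hypothetical.

```python
from collections import defaultdict


def bounce_rate(records):
    """records: iterable of (user_id, behavior) events.

    Assumption: a bounced user has exactly one event, and it is a page view.
    """
    events = defaultdict(list)
    for uid, behavior in records:
        events[uid].append(behavior)
    bounced = sum(1 for evs in events.values() if evs == ["pv"])
    return bounced / len(events)
```

With the figures reported above, 1 / 9 969 ≈ 0.0001, that is, about 0.01 %.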
4.5 Conversion Rates
Overall purchase conversion (buy/pv) is 2.25 %, indicating high drop‑off. Conversions improve when users first favorite or add items to cart before buying.
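The buy/pv ratio can be sketched per day and overall as follows. It counts events, not users, following the buy/pv definition given here; the function name and the `(date, behavior)` input shape are hypothetical.

```python
from collections import Counter


def conversion_by_day(records):
    """records: iterable of (date_str, behavior) events.

    Returns ({date: buy/pv for that day}, overall buy/pv across all days).
    """
    pv, buy = Counter(), Counter()
    for d, b in records:
        if b == "pv":
            pv[d] += 1
        elif b == "buy":
            buy[d] += 1
    daily = {d: buy[d] / pv[d] for d in sorted(pv)}  # days with page views only
    total_pv = sum(pv.values())
    overall = sum(buy.values()) / total_pv if total_pv else 0.0
    return daily, overall
```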
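-- Daily conversion from a per-date summary: 浏览数 = page views, 付费数 = purchases, 转化率 = conversion rate
select date, 浏览数, 付费数, 付费数/浏览数 as '转化率' from user_behavior;
4.6 Hypothesis Testing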
Hypothesis 1: Skipping the favorite and add‑to‑cart steps raises the decision cost of a purchase and therefore lowers conversion. The data show higher conversion along the pv‑fav‑buy, pv‑cart‑buy, and pv‑fav‑cart‑buy paths than along the direct pv‑buy path, supporting the hypothesis.
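One way to compare these paths is to group each (user, item) pair by whether it was favorited and/or carted, then measure how many pairs in each group were bought. This is a sketch under stated assumptions, not the article's method: it ignores event order, and the function name and path labels are hypothetical.

```python
from collections import defaultdict


def path_conversion(records):
    """records: iterable of (user_id, item_id, behavior) events.

    Groups every (user, item) pair that has a page view by whether it was also
    favorited and/or carted, then returns the share of each group that was bought.
    Event order is ignored: only the set of behaviors per pair is used.
    """
    seen = defaultdict(set)
    for uid, item, behavior in records:
        seen[(uid, item)].add(behavior)

    totals, bought = defaultdict(int), defaultdict(int)
    for behaviors in seen.values():
        if "pv" not in behaviors:
            continue  # skip pairs with no page view
        path = "pv"
        if "fav" in behaviors:
            path += "-fav"
        if "cart" in behaviors:
            path += "-cart"
        totals[path] += 1
        if "buy" in behaviors:
            bought[path] += 1
    return {p: bought[p] / totals[p] for p in sorted(totals)}
```

A key such as "pv-cart" covers pairs that were viewed and carted; its value is the share of those pairs that also contain a purchase.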
Hypothesis 2: A mismatch between hot‑search items (presumably the most‑viewed items, since the dataset records no search events) and hot‑sale items lowers conversion. Only 5 of the top‑50 best‑selling items appear in the top‑50 hot‑search list (a 10 % overlap), supporting the hypothesis.
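The overlap check can be sketched with `Counter.most_common`. Approximating hot‑search items by page‑view counts is an assumption, and the function name is hypothetical. Ties at the top‑N boundary are broken by first occurrence, which can shift the count slightly.

```python
from collections import Counter


def top_n_overlap(records, n=50):
    """records: iterable of (item_id, behavior) events.

    Assumption: 'hot-search' items are approximated by the most-viewed (pv) items.
    Returns (number of items in both top-n lists, that number / n).
    """
    pv = Counter(item for item, b in records if b == "pv")
    buy = Counter(item for item, b in records if b == "buy")
    top_pv = {item for item, _ in pv.most_common(n)}
    top_buy = {item for item, _ in buy.most_common(n)}
    shared = len(top_pv & top_buy)
    return shared, shared / n
```

With n = 50 and 5 shared items, the function returns (5, 0.1), which matches the 10 % figure above.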
5. RFM Customer Segmentation
Recency (R) is scored 1‑5 from the days since the last purchase (0‑8), with more recent purchases scoring higher. Frequency (F) is scored 1‑5 from purchase‑count ranges (1‑15, 16‑30, 31‑45, 46‑57, 58‑72). Monetary (M) is omitted because the dataset contains no price or order‑amount data.
-- Days since each user's last purchase; the cleaned data end on 2017-12-03, so values range 0-8
CREATE VIEW pay_B AS SELECT id, DATEDIFF('2017-12-03', MAX(date)) AS B FROM user WHERE behavior='buy' GROUP BY id;
Users are classified into four segments:
Important value customers (R > avg, F > avg)
Important development customers (R > avg, F ≤ avg)
Important retention customers (R ≤ avg, F > avg)
Important churn customers (R ≤ avg, F ≤ avg)
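The scoring and four-way split can be sketched as below. The F bins come from the text; the R bins over 0‑8 days are hypothetical, because the article does not list its recency cut-offs. Comparing each user's score with the average score follows the segment definitions above.

```python
from statistics import mean

# Frequency bins from the text: (inclusive upper bound of purchase count, score)
F_BINS = [(15, 1), (30, 2), (45, 3), (57, 4), (72, 5)]
# Hypothetical recency bins over 0-8 days since last purchase: more recent scores higher
R_BINS = [(1, 5), (3, 4), (5, 3), (7, 2), (8, 1)]

LABELS = {
    (True, True): "important value",
    (True, False): "important development",
    (False, True): "important retention",
    (False, False): "important churn",
}


def score(value, bins):
    """Return the score of the first bin whose upper bound covers value."""
    for upper, s in bins:
        if value <= upper:
            return s
    return bins[-1][1]  # values beyond the last bound get the last score


def segment(customers):
    """customers: {user_id: (days_since_last_buy, purchase_count)}.

    Scores R and F, compares each with its average score, and returns {user_id: segment}.
    """
    scores = {u: (score(r, R_BINS), score(f, F_BINS)) for u, (r, f) in customers.items()}
    r_avg = mean(r for r, _ in scores.values())
    f_avg = mean(f for _, f in scores.values())
    return {u: LABELS[(r > r_avg, f > f_avg)] for u, (r, f) in scores.items()}
```

Because the comparison is against the average score, each user's segment depends on the whole population being scored, not only on that user's own numbers.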
6. Conclusions and Recommendations
Key findings:
Peak user activity occurs at 21:00‑23:00, especially 22:00‑23:00.
Retention stays above 75 % for the 8 days following 11‑25 and exceeds 98 % on 12‑02 and 12‑03.
Conversion improves when users favorite or add items to cart before purchase.
Hot‑search and hot‑sale items have low overlap (10 % match).
Most users fall into important development or churn segments.
Actionable suggestions:
Leverage the 21:00‑23:00 window with live streams, flash sales, and coupons.
Run activation incentives (e.g., first‑order discounts) to boost new‑user conversion.
Encourage favoriting and carting through coupons or gifts to raise purchase intent.
Improve recommendation algorithms to align hot‑search items with best‑sellers and increase discounts on mismatched products.
Tailor retention tactics: VIP services for value customers, targeted coupons for development customers, proactive outreach for churn customers, and re‑engagement reminders for retention customers.
Author: 一只废物
21CTO
