What Drives Taobao App Users? Insights from AARRR and RFM Analyses

This article analyzes 2 million Taobao app user‑behavior records using AARRR funnel metrics and RFM segmentation, revealing daily and hourly usage patterns, conversion bottlenecks, product‑search mismatches, and offering data‑driven marketing recommendations to boost retention and sales.

21CTO

1. Problem Definition and Modeling

The goal is to explain and improve Taobao app user behavior by answering five questions: (1) identify common e‑commerce metrics and stage‑wise churn rates, (2) use hypothesis testing to pinpoint loss reasons, (3) study behavior patterns at different time scales, (4) discover product preferences and marketing tactics, and (5) segment users by value for targeted strategies.

1.1 Applied Models

Use the AARRR funnel (Acquisition, Activation, Retention, Referral, Revenue) to dissect each user step after entering the app.

Apply the RFM model (Recency, Frequency, Monetary) to evaluate user value, noting that monetary data is unavailable.

AARRR funnel diagram

2. Data Understanding

Data source: Alibaba Tianchi (tianchi.aliyun.com/data). The dataset covers 2017‑11‑25 00:00 to 2017‑12‑04 00:00 (9 days). The raw file holds roughly 100 million rows; this analysis works on a 2‑million‑row sample containing 19 544 distinct users.

2.1 Field Description

Because the file lacks column names, they are created manually: id (user ID), item (product ID), behavior (pv, fav, cart, buy), category, and times (timestamp).

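import pandas as pd
# No header row in the file: pass the names explicitly (order assumed from the Tianchi layout).
cols = ['id', 'item', 'category', 'behavior', 'times']
data = pd.read_csv('UserBehavior.csv', header=None, names=cols)
print(data.head())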

3. Data Cleaning

Rename columns for clarity.

Split the combined timestamp into separate date and time columns.

Remove records outside the 2017‑11‑25 to 2017‑12‑04 window.

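-- Allow UPDATE/DELETE statements that have no key-based WHERE clause.
set sql_safe_updates=0;

-- Convert the Unix timestamp to a datetime (from_unixtime uses the session time zone).
alter table user add column datetime timestamp null;
update user set datetime = from_unixtime(times);

-- Date part, 'YYYY-MM-DD'.
alter table user add column date char(10) null;
update user set date = substring(datetime from 1 for 10);

-- Hour of day, 'HH' (characters 12-13 of the datetime string).
alter table user add column time char(10) null;
update user set time = substring(datetime from 12 for 2);

-- Drop rows outside the window; >= also excludes the 2017-12-04 00:00:00 boundary itself.
delete from user where datetime<'2017-11-25 00:00:00' or datetime>='2017-12-04 00:00:00';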
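
For readers who prefer pandas to MySQL, the same cleaning steps might look like the sketch below. The toy rows are made up for illustration, and the fixed UTC+8 offset is an assumption (the source does not say which time zone from_unixtime used).

```python
import pandas as pd

# Toy rows standing in for the real table (values are made up).
data = pd.DataFrame({
    "id": [1, 1, 2],
    "behavior": ["pv", "buy", "pv"],
    "times": [1511544070, 1511700000, 1512400000],  # Unix seconds
})

# Convert Unix seconds to wall-clock time, assuming a fixed UTC+8 offset.
data["datetime"] = pd.to_datetime(data["times"], unit="s") + pd.Timedelta(hours=8)

# Split into a date string and an hour string, mirroring the SQL substring calls.
data["date"] = data["datetime"].dt.strftime("%Y-%m-%d")
data["time"] = data["datetime"].dt.strftime("%H")

# Keep only rows inside the half-open window [2017-11-25, 2017-12-04).
start, end = pd.Timestamp("2017-11-25"), pd.Timestamp("2017-12-04")
clean = data[(data["datetime"] >= start) & (data["datetime"] < end)]
print(clean)
```

The third toy row falls on 2017‑12‑04 and is dropped by the filter.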

4. AARRR Funnel Analysis

4.1 Daily New Users

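Find each user's first active date in the window, then count distinct IDs per first date. New users peak at 13 927 on 11‑25. Since 11‑25 is the first day of the data window, every user active that day counts as new, so the peak is at least partly an artifact of where the window starts; a promotion may also have contributed.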

Daily new users chart
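
The daily new-user count described above can be sketched in pandas as follows. The log is a made-up toy frame; the column names follow the article's id and date fields.

```python
import pandas as pd

# Toy activity log: one row per user action (made-up values).
log = pd.DataFrame({
    "id":   [1, 1, 2, 3, 2, 4],
    "date": ["2017-11-25", "2017-11-26", "2017-11-25",
             "2017-11-26", "2017-11-27", "2017-11-27"],
})

# A user's first active date is the smallest date seen for that id.
first_seen = log.groupby("id")["date"].min()

# Count how many users have each date as their first active date.
daily_new = first_seen.value_counts().sort_index()
print(daily_new)
```

Every user is counted exactly once, so the daily counts sum to the number of distinct users.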

4.2 Retention

Retention = users who log in again N days after their first day ÷ first‑day new users. Retention stays above 75 % for the 8 days after 11‑25, reaching >98 % on 12‑02/03, indicating strong user stability.

Retention curve
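
As a rough illustration of the retention formula, the sketch below computes day‑N retention for the cohort whose first active day is 11‑25. The data is a made-up toy log, not the article's dataset.

```python
import pandas as pd

# Toy log: one row per (user, active date); values are made up.
log = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime(["2017-11-25", "2017-11-26", "2017-11-27",
                            "2017-11-25", "2017-11-27", "2017-11-25"]),
}).drop_duplicates()

# Days elapsed since each user's first active date.
first = log.groupby("id")["date"].transform("min")
log["day_n"] = (log["date"] - first).dt.days

# Restrict to the 11-25 cohort and count distinct active users per day offset.
cohort = log[first == pd.Timestamp("2017-11-25")]
active = cohort.groupby("day_n")["id"].nunique()

# Day-N retention = users active on day N / users in the cohort on day 0.
retention = active / active.loc[0]
print(retention)
```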

4.3 Hourly Behavior

Activity is lowest from 01:00 to 06:00, when most users are asleep. It rises from 06:00 to 10:00, holds steady from 10:00 to 18:00, then climbs through the evening (18:00‑23:00) to its highest level at 22:00‑23:00, which is also the add‑to‑cart peak.

Hourly activity chart
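
One way to produce the hourly breakdown is a cross-tabulation of hour against behavior type. The sketch below uses made-up rows and assumes the time column holds the two-character hour created during cleaning.

```python
import pandas as pd

# Toy events: hour of day and behavior type (made-up values).
log = pd.DataFrame({
    "time":     ["22", "22", "03", "22", "10", "10"],
    "behavior": ["pv", "cart", "pv", "cart", "pv", "buy"],
})

# Rows = hour of day, columns = behavior type, cells = number of events.
hourly = pd.crosstab(log["time"], log["behavior"])
print(hourly)

# Hour with the most add-to-cart events.
peak_cart_hour = hourly["cart"].idxmax()
print(peak_cart_hour)
```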

4.4 Churn

Only 1 of 9 969 users viewed a single page and then left, a bounce rate of about 0.01 % (1 ÷ 9 969), so the app holds on to visitors once they arrive.
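
A minimal sketch of the bounce-rate calculation follows. It assumes a "bounced" user is one whose only recorded event is a single page view; the source does not spell out its exact definition, and the rows are made up.

```python
import pandas as pd

# Toy events (made-up values).
log = pd.DataFrame({
    "id":       [1, 1, 2, 3, 3, 3, 4],
    "behavior": ["pv", "buy", "pv", "pv", "pv", "cart", "pv"],
})

# Count events per user and behavior type.
counts = pd.crosstab(log["id"], log["behavior"])

# Assumed definition: exactly one event in total, and that event is a page view.
bounced = (counts.sum(axis=1) == 1) & (counts["pv"] == 1)
bounce_rate = bounced.sum() / len(counts)
print(f"{bounce_rate:.2%}")
```

In the toy data, users 2 and 4 bounce, so the rate is 2 out of 4.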

4.5 Conversion Rates

Overall purchase conversion (buy/pv) is 2.25 %, indicating high drop‑off. Conversions improve when users first favorite or add items to cart before buying.

-- 浏览数 = page views, 付费数 = purchases, 转化率 = conversion rate
select date, 浏览数, 付费数, 付费数/浏览数 as '转化率' from user_behavior;
Daily conversion rate
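
The buy/pv ratio can also be computed directly from the event log. The sketch below uses made-up rows and computes both the daily and the overall conversion rate.

```python
import pandas as pd

# Toy events (made-up values).
log = pd.DataFrame({
    "date":     ["2017-11-25"] * 5 + ["2017-11-26"] * 4,
    "behavior": ["pv", "pv", "pv", "pv", "buy", "pv", "pv", "buy", "buy"],
})

# Daily event counts per behavior, then purchases divided by page views.
daily = pd.crosstab(log["date"], log["behavior"])
daily["conversion"] = daily["buy"] / daily["pv"]
print(daily)

# Overall conversion across the whole window.
overall = (log["behavior"] == "buy").sum() / (log["behavior"] == "pv").sum()
print(f"{overall:.2%}")
```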

4.6 Hypothesis Testing

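Hypothesis 1: Users who neither favorite an item nor add it to the cart face more friction when they come back to buy it, which lowers conversion. The data shows higher conversion on the pv‑fav‑buy, pv‑cart‑buy, and pv‑fav‑cart‑buy paths than on the direct pv‑buy path, which supports the hypothesis.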
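
A simplified way to check Hypothesis 1 is to compare the share of buyers among users who favorited or carted with the share among users who did neither. The sketch below works at user level on made-up rows and ignores event order and item identity, so it is coarser than the path analysis in the article.

```python
import pandas as pd

# Toy events (made-up values).
log = pd.DataFrame({
    "id":       [1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    "behavior": ["pv", "pv", "cart", "buy", "pv", "pv", "fav", "buy",
                 "pv", "cart", "pv", "buy"],
})

# One row per user, with a True/False flag for each behavior the user ever performed.
flags = pd.crosstab(log["id"], log["behavior"]).gt(0)

# Users who favorited or carted at least once vs. users who did neither.
flags["engaged"] = flags["fav"] | flags["cart"]

# Share of users in each group who bought at least once.
buy_rate = flags.groupby("engaged")["buy"].mean()
print(buy_rate)
```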

Hypothesis 2: A mismatch between hot‑search and hot‑sale items reduces conversion. Only 5 of the top‑50 best‑selling items appear in the top‑50 hot‑search list (a 10 % match), which is consistent with the hypothesis.
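
The overlap figure for Hypothesis 2 is a set intersection between two top‑N lists. The sketch below uses made-up rows, treats page views as the proxy for "hot search" (an assumption, since the dataset has no search events), and uses a hypothetical helper named top_items.

```python
import pandas as pd

def top_items(frame, behavior, n):
    """Return the set of the n most frequent items for one behavior type."""
    counts = frame.loc[frame["behavior"] == behavior, "item"].value_counts()
    return set(counts.head(n).index)

# Toy events: item "a" is viewed most, item "d" is bought most (made-up values).
views = ["a"] * 4 + ["b"] * 3 + ["c"] * 2 + ["d"]
buys = ["d"] * 3 + ["a"] * 2 + ["e"]
log = pd.DataFrame({
    "item": views + buys,
    "behavior": ["pv"] * len(views) + ["buy"] * len(buys),
})

n = 2  # the article uses n = 50
hot_view = top_items(log, "pv", n)
hot_sale = top_items(log, "buy", n)

# Fraction of the top-n best sellers that also appear in the top-n viewed list.
match_rate = len(hot_view & hot_sale) / n
print(match_rate)
```

With the article's numbers, the same ratio is 5 ÷ 50 = 10 %.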

5. RFM Customer Segmentation

Recency (R) is scored 1‑5 based on days since last purchase (0‑8). Frequency (F) is scored 1‑5 based on purchase count ranges (1‑15, 16‑30, 31‑45, 46‑57, 58‑72). Monetary (M) is omitted due to missing data.

CREATE VIEW pay_B AS SELECT id, DATEDIFF('2017-12-04',MAX(date)) AS B FROM user WHERE behavior='buy' GROUP BY id;
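
The R and F scores can be sketched with pandas binning. The F bins below are the ranges quoted in the text; the R bin edges and the reference date are assumptions, since the source does not list them, and the purchase rows are made up.

```python
import pandas as pd

# Toy purchase records (made-up values).
buys = pd.DataFrame({
    "id":   [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime(["2017-11-25", "2017-12-03", "2017-11-26",
                            "2017-11-30", "2017-12-01", "2017-12-02"]),
})
ref = pd.Timestamp("2017-12-03")  # reference date (assumption)

# Last purchase date and purchase count per user.
rfm = buys.groupby("id").agg(last=("date", "max"), freq=("date", "count"))
rfm["recency"] = (ref - rfm["last"]).dt.days

# R: fewer days since the last purchase earns a higher score (bin edges assumed).
rfm["R"] = pd.cut(rfm["recency"], bins=[-1, 1, 3, 5, 7, 8],
                  labels=[5, 4, 3, 2, 1]).astype(int)

# F: purchase-count ranges 1-15, 16-30, 31-45, 46-57, 58-72 from the text.
rfm["F"] = pd.cut(rfm["freq"], bins=[0, 15, 30, 45, 57, 72],
                  labels=[1, 2, 3, 4, 5]).astype(int)
print(rfm)
```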

Users are classified into four segments:

Important value customers (R > avg, F > avg)

Important development customers (R > avg, F ≤ avg)

Important retention customers (R ≤ avg, F > avg)

Important churn customers (R ≤ avg, F ≤ avg)
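
The four-way split above compares each user's R and F scores with the averages. A minimal sketch, using made-up scores and hypothetical user IDs:

```python
import pandas as pd

# Toy R and F scores for four hypothetical users.
scores = pd.DataFrame({"R": [5, 5, 2, 1], "F": [4, 1, 5, 1]},
                      index=[101, 102, 103, 104])

# Compare each score with the average across all users.
r_high = scores["R"] > scores["R"].mean()
f_high = scores["F"] > scores["F"].mean()

# Map the (R above avg, F above avg) pair to the segment names used in the text.
names = {
    (True, True): "important value",
    (True, False): "important development",
    (False, True): "important retention",
    (False, False): "important churn",
}
scores["segment"] = [names[(bool(r), bool(f))] for r, f in zip(r_high, f_high)]
print(scores)
```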

User segment distribution

6. Conclusions and Recommendations

Key findings:

Peak user activity occurs at 21:00‑23:00, especially 22:00‑23:00.

Retention stays above 75 % for the 8 days after 11‑25; 12‑02/03 exceeds 98 %.

Conversion improves when users favorite or add items to cart before purchase.

Hot‑search and hot‑sale items have low overlap (10 % match).

Most users fall into important development or churn segments.

Actionable suggestions:

Leverage the 21:00‑23:00 window with live streams, flash sales, and coupons.

Run activation incentives (e.g., first‑order discounts) to boost new‑user conversion.

Encourage favoriting and carting through coupons or gifts to raise purchase intent.

Improve recommendation algorithms to align hot‑search items with best‑sellers and increase discounts on mismatched products.

Tailor retention tactics: VIP services for value customers, targeted coupons for development customers, proactive outreach for churn customers, and re‑engagement reminders for retention customers.

Author: 一只废物