
User Profiling: Concepts, Practices, and Data‑Driven E‑Commerce Case Study

This article introduces the fundamentals of user profiling, explains tag types and their business value, and demonstrates a data‑driven e‑commerce case study that analyzes gender, age, region, marital status, education, profession, product preferences, purchase timing, and price sensitivity to guide targeted promotion strategies.


Author: Mu Xiaoxiong, Huazhong Agricultural University Source: Datawhale

User profiling abstracts concrete user information into tags. Taken together, the tags form a user image that supports personalized services.

1. User Profiling Basics

The core of profiling is labeling: each piece of concrete user information is converted into a tag, which makes targeted services possible.

Example: in a matchmaking scenario, a woman's ideal partner can be described with tags such as age, height, income, location, and education.

2. Tag Types

Statistical tags: basic attributes such as name, gender, age, city, and activity duration, derived directly from registration or transaction data.

Rule‑based tags: defined jointly by operations and data teams from business rules and user behavior.

Learning‑derived tags: generated by machine‑learning models, for example inferring gender from purchases of feminine products.
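The three tag types can be made concrete with a small sketch. The record fields (`orders_30d`, `feminine_item_share`), the activity threshold, and the fixed cutoff that stands in for a trained model are all assumptions for illustration and do not come from the original article.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    age: int                    # statistical tag: read from registration data
    orders_30d: int             # statistical tag: counted from transaction data
    feminine_item_share: float  # hypothetical feature: share of purchases in feminine categories


def rule_based_tags(user, active_threshold=3):
    """Rule-based tag: a business rule agreed on by operations and data teams."""
    return {"active_buyer": user.orders_30d >= active_threshold}


def learned_gender_tag(user, cutoff=0.5):
    """Stand-in for a learning-derived tag.

    A real system would call a trained classifier; the fixed cutoff here
    only illustrates where the model output would plug in.
    """
    return "female" if user.feminine_item_share > cutoff else "male"


def build_profile(user):
    """Combine all three tag types into one profile dictionary."""
    tags = {"age": user.age, "orders_30d": user.orders_30d}
    tags.update(rule_based_tags(user))
    tags["predicted_gender"] = learned_gender_tag(user)
    return tags


profile = build_profile(User("u1", age=29, orders_30d=4, feminine_item_share=0.7))
print(profile)
```

Keeping each tag type in its own function makes it possible to replace the placeholder cutoff with a real model without touching the statistical or rule‑based logic.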

3. Value of User Profiling

Large‑scale businesses invest heavily in profiling to collect and analyze data across business lines, enabling precise services and diversified operation strategies.

Applications

User acquisition: advertising through a DMP (data management platform) that targets users with tags similar to those of existing customers.

Cold‑start for new users by inferring attributes from regional tag distributions.

Personalized or precise services based on rich profile analysis.

Multi‑scenario identification (e.g., linking accounts across phone numbers).

Reactivating dormant users by analyzing sensitivity and designing activation strategies.
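The cold‑start application above can be sketched as follows: a new user with no history inherits the most common tag value among existing users in the same region. The record layout (`region`, `preferred_category`) and the function names are hypothetical, and a production system would likely use smoothed probabilities rather than a single most‑frequent value.

```python
from collections import Counter, defaultdict


def regional_tag_distribution(users, tag):
    """Count how often each value of `tag` occurs per region among existing users."""
    dist = defaultdict(Counter)
    for user in users:
        dist[user["region"]][user[tag]] += 1
    return dist


def cold_start_guess(dist, region, default=None):
    """Return the most common tag value in the region, or `default` if the region is unseen."""
    counts = dist.get(region)
    if not counts:
        return default
    return counts.most_common(1)[0][0]


existing_users = [
    {"region": "Beijing", "preferred_category": "air purifier"},
    {"region": "Beijing", "preferred_category": "air purifier"},
    {"region": "Beijing", "preferred_category": "electric fan"},
    {"region": "Guangdong", "preferred_category": "electric fan"},
]
dist = regional_tag_distribution(existing_users, "preferred_category")
print(cold_start_guess(dist, "Beijing"))
print(cold_start_guess(dist, "Tibet", default="unknown"))
```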

4. Practical Project: E‑Commerce Promotion Case

Scenario: A data analyst is asked to help an e‑commerce platform improve declining orders for a home‑appliance category by designing a coupon promotion.

The analysis proceeds in seven steps, extracting data from a masked order dataset (2020‑08‑12 to 2020‑08‑19) and visualizing several user dimensions.

Step 1 – Data Extraction

# Preview the first rows (assumes `data` is an already-loaded pandas DataFrame)
data.head()

Step 2 – Gender & Age Distribution

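import plotly.graph_objects as go

# Pie chart: gender split (male_user and female_user are user counts computed beforehand)
labels = ['Male', 'Female']
values = [male_user, female_user]
trace = [go.Pie(labels=labels, values=values)]
layout = go.Layout(title=dict(text='User gender distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

# Bar chart: users per age group (user_age_df holds one count per group, in this order)
x = ['Under 18', '18~25', '25~35', '35~45', '45~55', 'Over 55']
y = user_age_df['user_age_count']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User age distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()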

Findings: male users slightly outnumber female users; ages concentrate in the 25‑35 group; users under 18 and over 45 are comparatively inactive.
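The article does not show how the per‑group counts in `user_age_count` are produced. One possible way to derive them from raw ages is sketched below. The boundary handling is an assumption: the source labels overlap at 25, 35, 45, and 55, so this sketch treats each lower bound as inclusive.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of every group after the first (assumed inclusive)
AGE_EDGES = [18, 25, 35, 45, 55]
AGE_LABELS = ["Under 18", "18~25", "25~35", "35~45", "45~55", "Over 55"]


def age_bucket(age):
    """Map an age in years to its group label."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def age_counts(ages):
    """Return one count per group, in the same order as AGE_LABELS."""
    counts = Counter(age_bucket(a) for a in ages)
    return [counts[label] for label in AGE_LABELS]


print(age_counts([17, 18, 25, 30, 55, 60]))  # → [1, 1, 2, 0, 0, 2]
```

Returning the counts in label order means the result can be passed directly as the `y` values of a bar chart whose `x` values are the group labels.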

Step 3 – Regional Distribution

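# Horizontal bar chart: users per province
# (reversed so the largest value is drawn at the top, assuming the frame is sorted descending)
y = user_region_df['province_name'][::-1]
x = user_region_df['region_count'][::-1]
trace = go.Bar(x=x, y=y, text=x, textposition='outside', orientation='h')
layout = go.Layout(title=dict(text='User distribution by province', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

# Horizontal bar chart: users per city, with the same ordering
y = user_city_df['ulp_addr_city'][::-1]
x = user_city_df['city_count'][::-1]
trace = go.Bar(x=x, y=y, text=x, textposition='outside', orientation='h')
layout = go.Layout(title=dict(text='User distribution by city', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()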

Users are mainly in first‑tier and new‑first‑tier cities, aligning with the age distribution.
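The regional charts rely on a ranked list of locations that is reversed before plotting, because a horizontal bar chart draws its first category at the bottom. A minimal sketch of that preparation step follows; the function name and the input (one province name per user) are assumptions, since the article does not show how `user_region_df` is built.

```python
from collections import Counter


def top_regions(provinces, n=10):
    """Return (names, counts) for the n most common provinces.

    Both lists are reversed so that the largest value ends up at the top
    of a horizontal bar chart.
    """
    ranked = Counter(provinces).most_common(n)
    names = [name for name, _ in ranked][::-1]
    counts = [count for _, count in ranked][::-1]
    return names, counts


names, counts = top_regions(
    ["Guangdong", "Beijing", "Guangdong", "Jiangsu", "Guangdong", "Beijing"], n=2
)
print(names, counts)  # → ['Beijing', 'Guangdong'] [2, 3]
```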

Step 4 – Marital & Child Status

# Pie chart: marital status
labels = ['Married', 'Unmarried']
values = [married_user, unmarried_user]
trace = [go.Pie(labels=labels, values=values)]
layout = go.Layout(title=dict(text='User marital status distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

# Pie chart: likelihood that a user has children
labels = ['High', 'Fairly high', 'Fairly low', 'Low']
values = [very_high, high, low, very_low]
trace = [go.Pie(labels=labels, values=values)]
layout = go.Layout(title=dict(text='Likelihood of users having children', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

About 70% of users are married, and more than 60% are likely to have children.

Step 5 – Education & Occupation

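# Bar chart: users per education level
y = user_edu_df['edu']
x = ['Junior high or below', 'High school (vocational secondary)',
     'College (associate and bachelor)', 'Postgraduate (master and above)']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User education distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

# Bar chart: users per profession
x = ['Finance professional', 'Medical staff', 'Civil servant / public institution',
     'White-collar / general staff', 'Worker / service staff', 'Teacher',
     'Internet industry professional', 'Student']
y = user_profession_df['profession']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User profession distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()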

Typical user: a man aged 28‑30, married with children, living in a first‑tier city, holding a bachelor’s degree, and working in the internet industry with a stable income.

Step 6 – Purchase Behavior

# Horizontal bar chart: orders per third-level product category
y = user_order_cate_df['item_third_cate_name'][::-1]
x = user_order_cate_df['cate_count'][::-1]
trace = go.Bar(x=x, y=y, text=x, textposition='outside', orientation='h')
layout = go.Layout(title=dict(text='Purchased product distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

The top product is the electric fan, which is seasonal. Recommendation: promote water purifiers and humidifiers for early autumn.

# Line chart: orders per day of the week
x = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
y = user_order_week_df_2['week_count']
trace = go.Scatter(x=x, y=y, mode='lines', line=dict(width=2))
layout = go.Layout(title=dict(text='Orders by day of week', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

# Line chart: orders per hour of the day (0 to 23)
x = [str(i) for i in range(0, 24)]
y = user_order_hms_df['hms_count']
trace = go.Scatter(x=x, y=y, mode='lines', line=dict(width=2))
layout = go.Layout(title=dict(text='Orders by hour of day', x=0.5), xaxis=dict(tickmode='linear'))
fig = go.Figure(data=trace, layout=layout)
fig.show()

Peak order times: Tuesday & Saturday, 10‑11 am and 8‑10 pm.
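The weekday and hourly counts behind these charts could be derived from raw order timestamps roughly as follows. The timestamp format and the function name are assumptions, because the article does not show how `week_count` and `hms_count` are computed.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def order_time_counts(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (orders per weekday, orders per hour) from timestamp strings.

    The first list has 7 entries (Monday first), the second has 24 (hour 0 first).
    """
    by_weekday = Counter()
    by_hour = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, fmt)
        by_weekday[dt.weekday()] += 1  # Monday is 0
        by_hour[dt.hour] += 1
    return [by_weekday[d] for d in range(7)], [by_hour[h] for h in range(24)]


week, hours = order_time_counts(
    ["2020-08-12 10:30:00", "2020-08-15 21:05:00", "2020-08-18 10:59:59"]
)
print(dict(zip(WEEKDAYS, week)))
print(hours[10], hours[21])
```

If the dataset includes both 2020‑08‑12 and 2020‑08‑19, Wednesday appears twice in the window, so raw weekday counts may need to be normalized by the number of occurrences of each weekday before days are compared.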

Step 7 – Price Sensitivity

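# Bar chart: users per promotion (price) sensitivity level
x = ['Not sensitive', 'Slightly sensitive', 'Moderately sensitive', 'Highly sensitive', 'Extremely sensitive']
y = user_order_sens_promotion_df['sens_promotion_count']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User price sensitivity distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()

# Bar chart: users per review (comment) sensitivity level
x = ['Not sensitive', 'Slightly sensitive', 'Moderately sensitive', 'Highly sensitive', 'Extremely sensitive']
y = user_order_sens_comment_df['sens_comment_count']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User review sensitivity distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig.show()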

Users are price‑sensitive and highly sensitive to reviews; thus, promote well‑reviewed products.

Recommendations for Promotion

Use neutral copy, and highlight the quality and safety of home‑appliance products for family use.

Focus on end‑of‑summer and early‑autumn items such as water purifiers, humidifiers, and water dispensers.

Schedule ads on Tuesdays and Saturdays, especially around 10 am and 9‑10 pm.

Select products with strong positive reviews to match user sensitivity.
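The last recommendation can be turned into a simple filter. The sketch below keeps only products with enough reviews and a high positive‑review rate, then ranks them. The field names (`reviews`, `positive`) and both thresholds are hypothetical and would need tuning against the platform's real data.

```python
def positive_rate(product):
    """Share of reviews that are positive; 0.0 when there are no reviews."""
    if product["reviews"] <= 0:
        return 0.0
    return product["positive"] / product["reviews"]


def pick_promo_products(products, min_reviews=50, min_positive_rate=0.95, top_k=3):
    """Return up to top_k well-reviewed products, best positive rate first."""
    eligible = [
        p for p in products
        if p["reviews"] >= min_reviews and positive_rate(p) >= min_positive_rate
    ]
    return sorted(eligible, key=positive_rate, reverse=True)[:top_k]


catalog = [
    {"name": "water purifier", "reviews": 100, "positive": 99},
    {"name": "humidifier", "reviews": 10, "positive": 10},
    {"name": "electric fan", "reviews": 200, "positive": 180},
    {"name": "water dispenser", "reviews": 80, "positive": 78},
]
print([p["name"] for p in pick_promo_products(catalog)])
```

The minimum review count keeps a product with only a handful of perfect reviews from outranking one with a long, consistently positive record.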

