Zhuanzhuan User Profile Platform: Architecture, Tag Construction, Storage, and User Segmentation Practices
This article details Zhuanzhuan's user profile platform, covering its business-driven motivation, tag taxonomy, system architecture, data pipelines using Hive, ClickHouse and Spark, storage design, per‑user insight, segmentation techniques, ID‑mapping, and future plans for real‑time tagging.
Zhuanzhuan, a leading second‑hand e‑commerce platform, has experienced rapid user growth, prompting a shift from coarse‑grained to data‑driven refined operations; the user profile platform was built to support this transition.
A user profile is a tag‑based abstract model that captures user attributes, preferences, habits and behaviors, enabling concise description and quantitative analysis of user groups for business decisions.
Typical applications include user insight for commercial inspiration, enriching data dimensions for deeper analysis, and enabling fine‑grained operations such as targeted messaging via SMS, push, or email.
The platform consists of six major modules: tag management, crowd calculation, user‑portrait analysis, operation, insight, and permission management, forming the backbone for tag construction and usage.
Tag construction follows two principles: tags must originate from concrete business scenarios (e.g., acquisition, retention, experience improvement), and they must guide product design by providing clear, actionable user characteristics.
Tags are divided into generic tags (basic attributes like age, gender, city) and business tags (e.g., cumulative orders, category history). Each tag name follows a four‑level hierarchy joined by underscores, in the form level1_level2_level3_value, exemplified by "BusinessTag_B2C_ActiveTime_12‑20".
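The naming scheme can be illustrated with a minimal Python sketch that splits a tag name into its four parts. The `TagName` type and `parse_tag_name` helper are hypothetical names introduced here for illustration, and the assumption that only the first three underscores act as separators is ours, not the article's.

```python
from typing import NamedTuple


class TagName(NamedTuple):
    level1: str  # e.g. "BusinessTag"
    level2: str  # e.g. "B2C"
    level3: str  # e.g. "ActiveTime"
    value: str   # e.g. "12-20"


def parse_tag_name(name: str) -> TagName:
    """Split a tag name on its first three underscores (assumed separator rule)."""
    parts = name.split("_", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four non-empty parts, got: {name!r}")
    return TagName(*parts)


tag = parse_tag_name("BusinessTag_B2C_ActiveTime_12-20")
print(tag.level3, tag.value)
```

Limiting the split to three separators lets the value itself contain an underscore without breaking the hierarchy.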
Production rules are highly flexible, using configurable SQL templates that can be visualized and edited; the system supports billions of users, daily incremental updates, and automatic dependency detection before computation.
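The article does not describe how dependency detection is implemented. As a rough sketch of the idea, the Python below extracts upstream table names from a tag's SQL with a deliberately crude regular expression and reports which of them have no ready partition for the target date. The function names, the `ready` set, and the example tables `dw.orders` and `dw.item_exposures` are all hypothetical.

```python
import re
from typing import List, Set, Tuple

# Crude assumption: a dependency is any table name that directly follows
# FROM or JOIN. Subqueries ("FROM (") are skipped; CTE names and SQL
# comments are not handled.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def detect_dependencies(sql: str) -> List[str]:
    """Return the sorted, de-duplicated table names referenced by the SQL."""
    return sorted(set(TABLE_REF.findall(sql)))


def missing_partitions(sql: str, dt: str, ready: Set[Tuple[str, str]]) -> List[str]:
    """Return upstream tables whose partition for `dt` is not yet ready."""
    return [t for t in detect_dependencies(sql) if (t, dt) not in ready]


tag_sql = (
    "SELECT o.xid FROM dw.orders o "
    "JOIN dw.item_exposures e ON o.xid = e.xid "
    "WHERE o.dt = '2022-12-01'"
)
ready_partitions = {("dw.orders", "2022-12-01")}
print(missing_partitions(tag_sql, "2022-12-01", ready_partitions))
```

A scheduler could postpone the tag job until the returned list is empty; a production system would more likely rely on a real SQL parser or on metadata lineage than on a regular expression.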
Data sources comprise business data (orders) and behavioral data (item exposures). Prior to tag creation, raw data are cleaned and prepared.
Tag templates include attribute‑filter templates for selecting user subsets (e.g., males with ≥5 product views) and file‑upload templates that ingest pre‑defined user lists into Hive partitions.
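To make the attribute-filter template concrete, here is a minimal Python sketch that renders one filter condition into SQL. The column names `xid`, `label_name`, and `label_value` follow the snippets in this article; the table name `user_tag_model`, the `dt` partition column in Hive, the label names, and the `render_filter_sql` helper are assumptions made for illustration.

```python
import re

ALLOWED_OPS = {"=", "!=", ">", ">=", "<", "<="}
# Only plain identifiers, dates, and numbers are accepted, so no quoting
# or escaping logic is needed in this sketch.
SAFE_TOKEN = re.compile(r"[\w.\-]+")


def render_filter_sql(table, dt, label_name, op, value):
    """Render a single attribute-filter condition as a SELECT statement."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    for token in (table, dt, label_name, str(value)):
        if not SAFE_TOKEN.fullmatch(token):
            raise ValueError(f"unsafe token: {token!r}")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Numeric comparison: label values are assumed to be stored as strings.
        predicate = f"CAST(label_value AS DOUBLE) {op} {value}"
    else:
        predicate = f"label_value {op} '{value}'"
    return (
        f"SELECT xid FROM {table} "
        f"WHERE dt = '{dt}' AND label_name = '{label_name}' AND {predicate}"
    )


print(render_filter_sql("user_tag_model", "2022-12-01", "gender", "=", "male"))
print(render_filter_sql("user_tag_model", "2022-12-01", "product_views", ">=", 5))
```

Each rendered statement selects the users matching one condition; combining conditions such as "male" and "at least 5 product views" is then a set operation over the per-condition results.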
User attribute set operations support intersection, union, and difference with nested expressions; a custom UDF parses logical trees, performs short‑circuit evaluation, and determines membership efficiently.
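The production UDF is not shown in the article. The Python sketch below illustrates the same idea under stated assumptions: condition identifiers are digit strings, `&`, `|`, and `-` denote intersection, union, and difference, all operators share one precedence level and associate left to right, and parentheses group sub-expressions. The functions `tokenize`, `parse`, and `matches` are hypothetical names.

```python
import re

TOKEN = re.compile(r"\s*([()&|\-]|\d+)")
OPERATORS = {"&", "|", "-"}


def tokenize(expr):
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(expr):
    """Build a tree: a leaf is an identifier string, a node is (op, left, right)."""
    tokens, pos = tokenize(expr), 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok.isdigit():
            return tok
        raise ValueError(f"unexpected token: {tok}")

    def expression():
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            pos += 1
            node = (op, node, operand())
        return node

    tree = expression()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def matches(node, user_tags):
    """Decide membership; Python's `and`/`or` skip the right side when possible."""
    if isinstance(node, str):
        return node in user_tags
    op, left, right = node
    if op == "&":
        return matches(left, user_tags) and matches(right, user_tags)
    if op == "|":
        return matches(left, user_tags) or matches(right, user_tags)
    return matches(left, user_tags) and not matches(right, user_tags)  # difference


tree = parse("((1670502093000)&(1670502131570))")
print(matches(tree, {"1670502093000", "1670502131570"}))
```

Evaluating membership per user, rather than materializing each set and combining them, means the right-hand branch is never visited once the left-hand branch has already decided the result.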
Tag creation supports four methods: SQL‑template‑based enumeration, grouping, direct ID upload, and custom SQL for advanced scenarios.
All tag data are stored in Hive as daily ORC‑file partitions; a tag model table links user IDs to tag names and values, while per‑user aggregated tags are cached in HBase KV for millisecond‑level retrieval, and behavior paths are synchronized to ClickHouse for OLAP queries.
For user insight, per‑user tag portraits are materialized in HBase using a reversed‑hash rowkey to avoid hotspotting, and behavior sequences are queried in ClickHouse, keeping response times within seconds.
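The exact rowkey scheme is not specified in the article. Two common ways to spread sequential user IDs across HBase regions are sketched below in Python: reversing the ID string, and prefixing the ID with a short hash. Both helper names, the MD5 choice, and the four-character prefix length are assumptions for illustration.

```python
import hashlib


def reversed_rowkey(user_id: str) -> str:
    """Reverse the ID so the fast-changing low-order digits lead the key."""
    return user_id[::-1]


def hash_prefixed_rowkey(user_id: str, prefix_len: int = 4) -> str:
    """Prefix the ID with the first hex characters of its MD5 digest."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{user_id}"


for uid in ("10000001", "10000002", "10000003"):
    print(reversed_rowkey(uid), hash_prefixed_rowkey(uid))
```

Either transformation is deterministic, so a point lookup only needs to recompute the key from the user ID; the cost is that range scans over consecutive IDs are no longer contiguous.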
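User segmentation combines multiple tag and behavior sets. Each branch of the union marks the users matching one condition with an identifier (tag_ex), and group_c then evaluates a logical expression over the identifiers collected for each user. An example Hive‑SQL snippet is shown below:

select xid
from (
    -- users matching the first condition
    select xid, '1670502093000' as tag_ex
    from table
    where label_name = xxx and label_value = xxx
    union all
    -- users matching the second condition
    select xid, '1670502131570' as tag_ex
    from table
    where label_name = xxx and label_value = xxx
) t
group by xid
-- keep users whose collected identifiers satisfy the expression (here: both conditions)
having group_c(collect_set(tag_ex), '((1670502093000)&(1670502131570))');

ClickHouse bitmap functions accelerate segmentation; a simplified table definition and query are provided: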

CREATE TABLE user_labels (
    label_name String,
    label_value String,
    userIds AggregateFunction(groupBitmap, UInt64)
) ENGINE = MergeTree
PARTITION BY (label_name, label_value)
ORDER BY (label_name, label_value);

SELECT bitmapAnd(
    -- merge bitmap states that were already aggregated per label
    (select groupBitmapMergeState(mapping_id) from table where dt='2022-12-01' and label_name='XXX' and label_value='XXX' group by label_name, label_value),
    -- build a bitmap on the fly from raw integer IDs
    (select groupBitmapState(abs(mapping_id)) from table where dt='2022-12-01' and label_name='XXX' and label_value='XXX')
);

ID‑mapping unifies disparate identifiers across the Zhuanzhuan app, Find‑Nice‑Phone app, and mini‑programs into a single OneID model, enabling comprehensive cross‑device user analysis.
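The article does not say how OneID is computed. One common way to merge identifiers that have been observed together is a union-find (disjoint-set) structure, sketched below in Python. The `OneIdResolver` class, the identifier types, and the sample values are hypothetical.

```python
class OneIdResolver:
    """Group identifiers that were observed together into one identity."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        """Return the representative identifier of the group containing `ident`."""
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[ident] != root:
            self.parent[ident], ident = root, self.parent[ident]
        return root

    def link(self, a, b):
        """Record that identifiers `a` and `b` belong to the same user."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller identifier as root so the result is deterministic.
            self.parent[max(ra, rb)] = min(ra, rb)


resolver = OneIdResolver()
resolver.link(("app_uid", "u1"), ("device", "d9"))              # login seen on a device
resolver.link(("device", "d9"), ("miniprogram_openid", "o7"))   # same device, mini-program
print(resolver.find(("miniprogram_openid", "o7")) == resolver.find(("app_uid", "u1")))
```

The representative returned by `find` can serve as the unified ID for every identifier in the group. At the scale described in the article, the same connected-components idea would more plausibly run as a distributed job than in a single process.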
Future work includes supporting real‑time tags to meet stricter latency requirements and tighter integration with the intelligent operation planning platform.
In summary, the article shares practical experiences and lessons learned from building a scalable, flexible user tagging and profiling system that supports billions of users, daily updates, and advanced segmentation for refined business operations.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.