How ID‑Mapping Connects Data Silos Across Industries
This article explains the fundamentals of ID‑Mapping and why it matters for unifying fragmented user and device data, surveys industry solutions from Alibaba, NetEase, 58.com and Meituan, and outlines two technical approaches: priority‑based rules and graph‑based computation.
ID‑Mapping Overview
ID‑Mapping is a foundational step in big‑data analysis: it links identifiers from multiple data sources to the same entity (such as a device, user, or enterprise), turning fragmented records into a complete user profile and eliminating data islands.
Typical challenges include several accounts being used on the same device, one user holding different accounts across channels (e.g., WeChat mini‑program vs. app), and one user logging in from devices made by different manufacturers.
Industry Solutions
Alibaba OneID
Alibaba aggregates IDs like phone, PC cookie, IMEI/IDFA, Taobao account, Alipay account, and email. Using the OneData framework (OneModel, OneID, OneService), it unifies these identifiers into a single UID through business rules, machine learning, and graph algorithms.
NetEase ID‑Mapping
NetEase combines various account and device identifiers (e.g., musicid, oaid, phone, email, idfa, imei) and applies rule‑based and data‑mining algorithms (connected‑graph partitioning + community detection) to determine whether accounts belong to the same person.
58.com ID‑Mapping
58.com integrates data from multiple products (58 Tongcheng, Ganji, Anjuke, etc.) across logs, resumes, posts, and merchant databases. Different business lines use distinct ID tags (e.g., wuser, guser, kimei) which are linked via fields such as telep, bidua, appua, imei, and idfa to build a unified mapping.
Meituan ID‑Mapping
After merging with Dianping, Meituan aligns user identities across apps by using common login methods (phone, WeChat, Weibo) and selects the phone number as the unique identifier.
Technical Approaches
Method 1: Priority‑Based ID Mapping
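Assign each record a unique identifier by selecting the highest‑priority ID that is present (e.g., phone, UID, device ID). This method is simple, but it breaks down when a user has multiple devices or channels, or when the identifiers captured in logs (cookies, unionid, MAC, IMEI, IMSI, AndroidID, OpenUUID, IDFA, or custom device IDs) differ from one log source to another, because records of the same user then end up under different keys.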
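A minimal sketch of the priority rule, assuming log records arrive as dictionaries keyed by identifier type. The field names, the priority order, and the sample values are hypothetical and only illustrate the idea; they are not taken from any of the systems described above.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

# Hypothetical priority order, highest first; a real system defines its own.
ID_PRIORITY: Sequence[str] = ("phone", "uid", "device_id")


def pick_primary_id(
    record: Mapping[str, Optional[str]],
    priority: Sequence[str] = ID_PRIORITY,
) -> Optional[Tuple[str, str]]:
    """Return (id_type, value) for the highest-priority identifier present."""
    for id_type in priority:
        value = record.get(id_type)
        if value:  # skip missing or empty identifiers
            return id_type, value
    return None  # the record carries no usable identifier


# Made-up log records: the first two come from the same device.
logs: List[Mapping[str, Optional[str]]] = [
    {"phone": "13800000000", "device_id": "dev-A"},
    {"device_id": "dev-A"},
    {"uid": "u42", "device_id": "dev-B"},
]

primary_ids = [pick_primary_id(r) for r in logs]
for record, primary in zip(logs, primary_ids):
    print(record, "->", primary)
```

The first two records share `dev-A`, yet the rule keys one by phone and the other by device ID, so they are never joined. This is the weakness the graph‑based method is meant to address.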
Method 2: Graph‑Based Computation
Represent identifiers as nodes and the relationships observed between them (for example, two IDs appearing in the same log record) as edges, then apply a graph algorithm such as connected components (maximal connected subgraphs) to discover clusters of IDs that belong to the same entity. The workflow is: generate the daily node and edge sets, merge them with the previous mapping, run the connectivity algorithm, and assign each cluster a persistent UID.
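The workflow can be sketched with a union‑find (disjoint‑set) structure, which is one common way to compute connected components. This is an illustration under stated assumptions, not any vendor's implementation: nodes are `(id_type, value)` tuples, a merged cluster keeps the smallest UID it had before, and a new cluster derives its UID from its smallest node. All names and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Node = Tuple[str, str]  # (id_type, value), e.g. ("imei", "imei-1")


class UnionFind:
    """Disjoint-set structure used to find connected components."""

    def __init__(self) -> None:
        self.parent: Dict[Node, Node] = {}

    def find(self, x: Node) -> Node:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Node, b: Node) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_id_mapping(
    edges: Iterable[Tuple[Node, Node]],
    previous: Optional[Dict[Node, str]] = None,
) -> Dict[Node, str]:
    """Map every node to a UID, reusing UIDs from the previous mapping."""
    previous = previous or {}
    uf = UnionFind()
    for a, b in edges:  # today's edge set
        uf.union(a, b)
    # Merge with the previous mapping: nodes that shared a UID stay connected.
    first_seen: Dict[str, Node] = {}
    for node, uid in previous.items():
        uf.union(first_seen.setdefault(uid, node), node)
    components: Dict[Node, List[Node]] = defaultdict(list)
    for node in list(uf.parent):
        components[uf.find(node)].append(node)
    mapping: Dict[Node, str] = {}
    for members in components.values():
        old_uids = sorted({previous[n] for n in members if n in previous})
        # Assumption: keep the smallest old UID, else derive one from the smallest node.
        uid = old_uids[0] if old_uids else "uid:" + ":".join(min(members))
        for n in members:
            mapping[n] = uid
    return mapping


day1_edges = [
    (("phone", "13800000000"), ("imei", "imei-1")),
    (("imei", "imei-1"), ("idfa", "idfa-9")),
    (("email", "a@example.com"), ("cookie", "c-7")),
]
day1 = build_id_mapping(day1_edges)

# On day 2 a new edge links the two clusters found on day 1.
day2_edges = [(("cookie", "c-7"), ("phone", "13800000000"))]
day2 = build_id_mapping(day2_edges, previous=day1)
```

Reusing an existing UID when clusters merge keeps downstream tables stable from day to day. At production scale the same computation would normally run on a distributed graph engine rather than in a single process.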
The resulting ID mapping dictionary acts as a bridge that connects previously isolated data islands, enabling comprehensive user profiling and more precise analytics.
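As a small illustration of that bridging role, the sketch below looks up records from two separate sources in a mapping dictionary and groups their events under one UID. The mapping contents, source names, and record fields are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

Node = Tuple[str, str]  # (id_type, value)

# Hypothetical output of an ID-Mapping job.
id_mapping: Dict[Node, str] = {
    ("phone", "13800000000"): "u1",
    ("cookie", "c-7"): "u1",
    ("imei", "imei-3"): "u2",
}

# Two made-up silos that identify users in different ways.
app_events = [
    {"id_type": "phone", "id_value": "13800000000", "event": "order"},
    {"id_type": "imei", "id_value": "imei-3", "event": "install"},
]
web_events = [
    {"id_type": "cookie", "id_value": "c-7", "event": "view"},
    {"id_type": "cookie", "id_value": "c-unknown", "event": "view"},
]


def unify(
    records: Iterable[Mapping[str, str]],
    mapping: Mapping[Node, str],
) -> Dict[str, List[str]]:
    """Group events by UID; records whose ID is not in the mapping are skipped."""
    profiles: Dict[str, List[str]] = defaultdict(list)
    for r in records:
        uid = mapping.get((r["id_type"], r["id_value"]))
        if uid is not None:
            profiles[uid].append(r["event"])
    return dict(profiles)


profiles = unify(app_events + web_events, id_mapping)
```

Here the app order and the web page view land in the same profile because the phone number and the cookie map to the same UID.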
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.