
How ID‑Mapping Connects Data Silos Across Industries

This article explains the fundamentals of ID‑Mapping and why it matters for unifying fragmented user and device data, reviews industry solutions from Alibaba, NetEase, 58.com, and Meituan, and outlines two technical approaches: priority‑based rules and graph‑based computation.

Data Thinking Notes

Sep 10, 2023

ID‑Mapping Overview

ID‑Mapping is a foundational step in big‑data analysis: it links identifiers from multiple data sources to the same entity (such as a device, user, or enterprise), turning fragmented records into a complete profile and eliminating data islands.

Typical challenges include several accounts being used on the same device, one user holding different accounts across channels (e.g., a WeChat mini‑program vs. the app), and users logging in on devices from different manufacturers.

Industry Solutions

Alibaba OneID

Alibaba aggregates identifiers such as phone number, PC cookie, IMEI/IDFA, Taobao account, Alipay account, and email. Using the OneData framework (OneModel, OneID, OneService), Alibaba unifies these identifiers into a single UID through business rules, machine learning, and graph algorithms.

NetEase ID‑Mapping

NetEase combines various account and device identifiers (e.g., musicid, oaid, phone, email, idfa, imei) and applies rule‑based and data‑mining algorithms (connected‑graph partitioning + community detection) to determine whether accounts belong to the same person.

58.com ID‑Mapping

58.com integrates data from multiple products (58 Tongcheng, Ganji, Anjuke, etc.) across logs, resumes, posts, and merchant databases. Different business lines use distinct ID tags (e.g., wuser, guser, kimei) which are linked via fields such as telep, bidua, appua, imei, and idfa to build a unified mapping.

Meituan ID‑Mapping

After merging with Dianping, Meituan aligns user identities across apps by using common login methods (phone, WeChat, Weibo) and selects the phone number as the unique identifier.

Technical Approaches

Method 1: Priority‑Based ID Mapping

Assign each record a unique identifier by selecting the highest‑priority ID available (e.g., phone, UID, device ID). This simple method breaks down when a user has multiple devices or channels, or when identifiers such as cookies, unionid, MAC, IMEI, IMSI, AndroidID, OpenUUID, IDFA, or custom device IDs vary across logs.
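A minimal sketch of the priority rule and its failure mode, in Python. The priority order, the field names (`phone`, `uid`, `device_id`), and the sample log records are assumptions made for illustration; they are not taken from any of the companies described above.

```python
from typing import Optional

# Hypothetical priority order: the most stable identifier comes first.
ID_PRIORITY = ["phone", "uid", "device_id"]


def pick_primary_id(record: dict) -> Optional[str]:
    """Return 'type:value' for the highest-priority identifier present, or None."""
    for id_type in ID_PRIORITY:
        value = record.get(id_type)
        if value:
            return f"{id_type}:{value}"
    return None


# Illustrative log records that all belong to one person.
logs = [
    {"phone": "13800000000", "device_id": "dev-A"},  # logged in on device A
    {"device_id": "dev-A"},                          # same device, logged out
    {"uid": "u42", "device_id": "dev-B"},            # same person, second device
]

primary_ids = [pick_primary_id(r) for r in logs]
# The three records receive three different primary IDs, so the rule alone
# cannot tell that they describe the same person.
```

The rule is cheap and easy to explain, but each record is judged in isolation, which is why the shared `dev-A` in the first two records is never used to link them.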

Method 2: Graph‑Based Computation

Represent identifiers as nodes and the relationships between them as edges, then apply a graph algorithm (e.g., finding maximal connected subgraphs, also known as connected components) to discover clusters of IDs that belong to the same entity. The workflow is: generate the daily node and edge sets, merge them with the previous mappings, run the connectivity algorithm, and assign a persistent UID to each cluster.
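The workflow can be sketched on a single machine with a union-find structure. This is an illustrative sketch rather than any vendor's implementation: the function names, the `type:value` node format, and the rule of keeping the smallest existing UID when clusters merge are all assumptions, and a production system would typically run the connectivity step on a distributed graph engine.

```python
import uuid
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for n in parent:
        groups[find(n)].add(n)
    return list(groups.values())


def assign_uids(components, previous):
    """Map every ID to a UID, reusing a UID from `previous` when one exists."""
    mapping = {}
    for comp in components:
        known = sorted({previous[i] for i in comp if i in previous})
        # Assumption: if a merge joins several old UIDs, keep the smallest one.
        uid = known[0] if known else uuid.uuid4().hex
        for i in comp:
            mapping[i] = uid
    return mapping


def daily_update(today_edges, previous):
    """Merge today's edges with the previous mapping and recompute UIDs."""
    by_uid = defaultdict(list)
    for id_, uid in previous.items():
        by_uid[uid].append(id_)
    # Re-create the previous clusters as edges so they merge with today's data.
    carried = [(ids[0], other) for ids in by_uid.values() for other in ids[1:]]
    nodes = set(previous) | {n for edge in today_edges for n in edge}
    comps = connected_components(nodes, carried + list(today_edges))
    return assign_uids(comps, previous)


# Hypothetical data: yesterday's mapping plus today's observed ID pairs.
previous = {"phone:138": "uid-1", "imei:A": "uid-1"}
today_edges = [("imei:A", "idfa:X"), ("email:z@example.com", "cookie:c9")]
mapping = daily_update(today_edges, previous)
```

Here `idfa:X` joins the existing `uid-1` cluster through `imei:A`, while the email and cookie pair forms a new cluster with a freshly generated UID. Note that plain connectivity merges everything reachable through any edge, so a shared device or recycled phone number can wrongly fuse two people unless edges are filtered or weighted first.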

The resulting ID mapping dictionary acts as a bridge that connects previously isolated data islands, enabling comprehensive user profiling and more precise analytics.
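As a small illustration of the bridging role, the sketch below uses a mapping dictionary to combine events from two silos that are keyed by different identifier types. The dictionary contents, the silo names, and the record fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping dictionary: identifier -> unified UID.
id_mapping = {"phone:138": "uid-1", "imei:A": "uid-1", "cookie:c9": "uid-2"}

# Two silos keyed by different identifier types (illustrative data).
app_events = [{"id": "imei:A", "event": "view_item"}]
web_orders = [
    {"id": "phone:138", "event": "place_order"},
    {"id": "cookie:c9", "event": "place_order"},
]

profiles = defaultdict(list)
for source, rows in (("app", app_events), ("web", web_orders)):
    for row in rows:
        uid = id_mapping.get(row["id"])
        if uid is not None:  # skip identifiers that are not mapped yet
            profiles[uid].append((source, row["event"]))
```

After the loop, `profiles["uid-1"]` holds both the app view and the web order, even though the two silos never shared a common key.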
