Comprehensive Overview of Tracking System, Data Warehouse Construction, and Attribution in an E‑commerce Platform
The article presents a comprehensive end‑to‑end traffic data architecture for an e‑commerce platform, detailing hybrid frontend/backend tracking with SPM/SCM/action standards, data‑warehouse construction of fact and dimension tables, UUID i_code unification, real‑time attribution methods, and future automation of warehouse and model layers.
In the era of traffic-driven e-commerce, building a reliable pipeline for traffic data is harder than building one for business data, because the source data is semi-structured, noisy, and lacks predefined analytical dimensions. This article outlines the end-to-end traffic data architecture, focusing on tracking (埋点) systems, data-warehouse construction, UUID and attribution mechanisms, and future directions.
1. Tracking System Construction (埋点体系建设)
The tracking system is the core of the traffic data warehouse. The quality of upstream tracking directly determines data reliability for downstream analytics.
1.1 Tracking Classification
Technically, tracking is divided into frontend tracking and backend tracking:
Frontend tracking integrates SDKs on the client side and includes three types:
Code-based tracking
Visual (no-code) tracking
Zero-code (XPath-based) tracking
Advantages: flexible, easy to capture user interactions such as clicks.
Disadvantages: depends on the client environment; data may be delayed or lost due to network constraints.
Category | Code-based | Visual
Concept | Requires developers to embed tracking code. | Product/operations staff configure tracking points on a management platform; the SDK detects UI elements automatically.
Pros | Highly customizable, precise, rich data. | Low implementation cost.
Cons | High implementation cost. | Limited to interactive UI, narrow coverage.
Backend tracking collects server‑side logs (e.g., login logs). Advantages: strong real‑time transmission, low data loss. Disadvantages: limited to backend events, cannot capture UI behavior, more crawler noise.
In the Yanxuan (严选) e-commerce scenario, a hybrid approach is used: code-based tracking combined with XPath-based zero-code tracking. Tracking entities are defined as SPM+SCM+ACTION, where SPM denotes the page position, SCM carries business parameters (e.g., material, A/B test group), and ACTION identifies the type of user action.
SPM Semantic Standardization
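English Name | Chinese Name | Page | Associated Events
indexsign | 首页签到入口 (homepage sign-in entry) | Home page (index) | click_index_signin, show_index_signin
kingkong | 金刚区 (icon navigation area) | Home page (index) | click_index_kingkong, show_index_kingkong
banner | 首焦 (top focus banner) | Home page (index) | click_index_banner, show_index_banner
searchrank | 热搜榜 (hot search ranking) | Search keyword list page (searchkw) | click_searchkw_searchrank, show_searchkw_searchrank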
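The event names in the table appear to follow an action_page_slot pattern, for example click_index_banner. The sketch below composes and splits names under that assumption; the helper names are hypothetical and not part of the Yanxuan SDK, and the last token does not always equal the English name (the indexsign row uses signin).

```python
import re

# Assumed pattern, inferred from the table: <action>_<page>_<slot>.
# Each token is lowercase alphanumeric so that "_" stays unambiguous as a separator.
_TOKEN = re.compile(r"^[a-z][a-z0-9]*$")


def build_event_name(action: str, page: str, slot: str) -> str:
    """Join action, page, and slot into one event name (hypothetical helper)."""
    for part in (action, page, slot):
        if not _TOKEN.match(part):
            raise ValueError(f"invalid token: {part!r}")
    return f"{action}_{page}_{slot}"


def parse_event_name(name: str) -> tuple[str, str, str]:
    """Split an event name back into (action, page, slot)."""
    action, page, slot = name.split("_", 2)
    return action, page, slot
```

Keeping tokens free of underscores is a design choice of this sketch; it lets downstream jobs recover the page and action from the name alone.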
SCM Standardization
Backend business parameters are unified as JSON. Example:
{
  "extra": {
    "k1": "v1",
    "k2": "v2",
    "k3": "v3"
  }
}
Action Standardization
Event | Description
click | User click behavior
add | Product add-to-cart behavior
collect | Product collection/favorite
view | Page view (one record per load)
show | Module exposure on a page
special | Custom events, e.g., risk control actions
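One way to enforce these two standards at ingestion time is to check the action against the vocabulary in the table and to parse the SCM payload as JSON. This is a minimal sketch; the function names are hypothetical, and the only structural assumption is the top-level "extra" object shown in the SCM example.

```python
import json

# Action vocabulary copied from the action table.
ALLOWED_ACTIONS = {"click", "add", "collect", "view", "show", "special"}


def parse_scm(raw_scm: str) -> dict:
    """Parse an SCM JSON string and return its 'extra' object (empty if absent)."""
    payload = json.loads(raw_scm)
    if not isinstance(payload, dict):
        raise ValueError("SCM payload must be a JSON object")
    extra = payload.get("extra", {})
    if not isinstance(extra, dict):
        raise ValueError("SCM 'extra' must be a JSON object")
    return extra


def validate_event(action: str, raw_scm: str) -> dict:
    """Reject unknown actions, then return the parsed SCM parameters."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return parse_scm(raw_scm)
```

For example, validate_event("click", '{"extra": {"ab_group": "B"}}') returns the dictionary of business parameters, while an action outside the table raises ValueError.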
1.2 Development Process & Quality Assurance
The typical development workflow includes requirement analysis, SDK integration, test verification, and release. Quality assurance practices are detailed in the internal document “严选埋点质量保障体系建设” (Building the Yanxuan Tracking Quality Assurance System).
2. Data Warehouse Construction (数仓建设)
2.1 Business Architecture Diagram
2.2 Data Warehouse Architecture Diagram
2.3 Fact Table Construction
Fact tables are built in five steps: select the business process → declare the granularity → identify the dimensions → identify the facts → add redundant dimensions. For traffic facts, the business process is the user's behavior sequence, which is captured in the tracking events.
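The design steps can be made concrete with a small row type. The sketch below assumes a grain of one row per tracking event; every field name (ts_ms, uuid, spm, action, page) is hypothetical and chosen only for illustration, not taken from the Yanxuan schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TrafficFactRow:
    """One row per tracking event (the assumed grain)."""
    event_time: datetime   # when the event fired
    uuid: str              # dimension key: unified user identifier
    spm: str               # dimension key: page position
    action: str            # dimension key: click, show, view, ...
    page: str              # redundant dimension, copied onto the row to avoid a join
    event_count: int = 1   # fact: an additive measure


def to_fact_row(raw: dict) -> TrafficFactRow:
    """Flatten one raw tracking record (hypothetical field names) into a fact row."""
    return TrafficFactRow(
        event_time=datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
        uuid=raw["uuid"],
        spm=raw["spm"],
        action=raw["action"],
        page=raw["page"],
    )
```

Storing the page on the row is the "redundant dimension" step: it duplicates information that a dimension table could supply, in exchange for cheaper queries.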
2.4 Dimension Table Construction
Dimension tables source data from business tables and detailed tracking logs. Business tables are typically snapshot tables; user/device attributes are derived from fact tables (e.g., first/last visit timestamps).
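Deriving user attributes from fact rows can be sketched as a single pass that tracks the earliest and latest timestamp per user. The input shape, (uuid, timestamp) pairs, and the function name are assumptions made for this example.

```python
def build_user_dimension(fact_rows):
    """Return {uuid: {"first_visit": ts, "last_visit": ts}} from (uuid, ts) pairs."""
    dim = {}
    for uuid, ts in fact_rows:
        # The first time a uuid is seen, both attributes start at this timestamp.
        entry = dim.setdefault(uuid, {"first_visit": ts, "last_visit": ts})
        entry["first_visit"] = min(entry["first_visit"], ts)
        entry["last_visit"] = max(entry["last_visit"], ts)
    return dim
```

The rows do not need to arrive in time order, since min and max are applied to every record.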
2.5 DWS Table Construction
DWS (Data Warehouse Service) tables provide common granularity and metrics. Separate UUID‑level DWS models reduce computation cost and improve extensibility while maintaining metric consistency across finer‑grained models.
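The idea of computing a UUID-level model once and reusing it for other metrics can be illustrated as follows. This is a sketch only: the input tuples, function names, and the "uv" key are assumptions, and a production DWS layer would be built in the warehouse engine rather than in Python.

```python
from collections import Counter, defaultdict


def build_uuid_daily_dws(events):
    """Aggregate (day, uuid, action) events into action counts per (day, uuid)."""
    dws = defaultdict(Counter)
    for day, uuid, action in events:
        dws[(day, uuid)][action] += 1
    return dws


def daily_totals(dws):
    """Roll the UUID-level rows up to daily action totals and unique visitors."""
    totals = defaultdict(Counter)
    for (day, _uuid), counts in dws.items():
        totals[day].update(counts)   # sum the per-user action counts
        totals[day]["uv"] += 1       # one (day, uuid) row is one unique visitor
    return totals
```

Because the daily totals are derived from the same UUID-level rows, the two levels cannot disagree on a metric, and the raw events are scanned only once.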
3. UUID and Attribution Construction (uuid和归因建设)
3.1 UUID Construction
To resolve many‑to‑many relationships between accounts and devices, an i_code scheme is introduced, treating the unified identifier as a UUID.
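The article does not describe how i_code is computed. One common way to collapse a many-to-many account/device graph into a single identifier is union-find, sketched below as an assumption rather than as the Yanxuan implementation. The class and method names are hypothetical, and a real system would likely mint a stable UUID per group instead of reusing a member as the representative.

```python
class IdentityGraph:
    """Union-find over account and device identifiers."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        """Return the representative of the group containing node."""
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while node != root:
            next_node = self._parent[node]
            self._parent[node] = root
            node = next_node
        return root

    def link(self, account_id, device_id):
        """Record that an account was seen on a device, merging their groups."""
        account_root = self._find(("account", account_id))
        device_root = self._find(("device", device_id))
        if account_root != device_root:
            self._parent[device_root] = account_root

    def i_code(self, kind, identifier):
        """Return the unified identifier for an account or device."""
        return self._find((kind, identifier))
```

With this structure, two accounts that ever share a device resolve to the same i_code, as do all devices those accounts have used.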
3.2 Attribution Construction
Attribution is divided into three categories: user reach attribution, external channel attribution, and internal guide attribution. The article focuses on internal guide attribution, which tracks user paths from entry pages to conversion.
Two attribution methods are applied:
Last‑click single‑point attribution (used for entry pages to avoid over‑attribution).
Last‑click multi‑point attribution (captures the final conversion touchpoint).
The attribution solution evolved through three stages:
Initial stage: Manual definition of page hierarchies, high maintenance cost.
Mid stage: Solved initial issues but relied on offline data.
Current stage: Supports real‑time data; each tracking point transmits the previous ten steps, eliminating the need for external linking.
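The current-stage design, in which each tracking point carries its own recent history, can be sketched with a bounded queue. The limit of ten steps comes from the text; the class name, the event fields, and the dotted SPM strings in the usage note are assumptions for illustration.

```python
from collections import deque

MAX_STEPS = 10  # the text states that each tracking point carries the previous ten steps


class PathTracker:
    """Keeps the most recent steps on the client and attaches them to each event."""

    def __init__(self, max_steps: int = MAX_STEPS):
        self._steps = deque(maxlen=max_steps)

    def emit(self, spm: str, action: str) -> dict:
        """Build an event that carries the steps recorded before it."""
        event = {"spm": spm, "action": action, "prev_steps": list(self._steps)}
        self._steps.append((spm, action))
        return event


def last_click(event: dict):
    """Return the SPM of the most recent click in the carried path, or None."""
    for spm, action in reversed(event["prev_steps"]):
        if action == "click":
            return spm
    return None
```

If a user clicks "index.banner", views "item.detail", and then adds to cart, the add event already contains both earlier steps, so last_click resolves the credit to "index.banner" from that single record without joining against other logs.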
4. Data Applications (数据应用)
Traffic data is used in various tools and business scenarios:
DSP (advertising platform)
DMP (user tags, profiling)
A/B testing platforms
User bus services
BI reports
Data products such as behavior analysis systems and marketing operation platforms
Business use cases: ad delivery, user acquisition, intelligent marketing, traffic competition, search recommendation, etc.
5. Future Outlook (未来展望)
The traffic data warehouse is mature; future work focuses on three directions:
Automated warehouse construction (the ODS layer is now auto-generated; data mart layer automation is in progress).
Enriching DWS models and reducing duplicated metrics by progressively pushing logic down into DWS.
Upgrading data mart models with OLAP engines such as Doris, leveraging materialized views to reduce model count.
Article authored by the Yanxuan (严选) technical team.
NetEase Yanxuan Technology Product Team