
Comprehensive Overview of Tracking System, Data Warehouse Construction, and Attribution in an E‑commerce Platform

The article presents a comprehensive end‑to‑end traffic data architecture for an e‑commerce platform, detailing hybrid frontend/backend tracking with SPM/SCM/action standards, data‑warehouse construction of fact and dimension tables, UUID i_code unification, real‑time attribution methods, and future automation of warehouse and model layers.

NetEase Yanxuan Technology Product Team

In the era of traffic‑driven e‑commerce, building a reliable traffic data pipeline is more challenging than building one for business data, because the source data are semi‑structured, noisy, and lack predefined analytical dimensions. This article outlines the end‑to‑end traffic data architecture, focusing on the tracking system, data‑warehouse construction, UUID and attribution mechanisms, and future directions.

1. Tracking System Construction (埋点体系建设)

The tracking system is the core of the traffic data warehouse. The quality of upstream tracking directly determines data reliability for downstream analytics.

1.1 Tracking Classification

Technically, tracking is divided into frontend tracking and backend tracking:

Frontend tracking integrates SDKs on the client side and includes three types:

Code‑based tracking

Visual (no‑code) tracking

Zero‑code (XPath‑based) tracking

Advantages: flexible, easy to capture user interactions such as clicks.

Disadvantages: depends on the client environment; data may be delayed or lost due to network constraints.

| Category | Code‑based | Visual |
| --- | --- | --- |
| Concept | Requires developers to embed tracking code. | Product/operations configure tracking points on a management platform; SDK detects UI elements automatically. |
| Pros | Highly customizable, precise, rich data. | Low implementation cost. |
| Cons | High implementation cost. | Limited to interactive UI, narrow coverage. |

Backend tracking collects server‑side logs (e.g., login logs). Advantages: strong real‑time transmission, low data loss. Disadvantages: limited to backend events, cannot capture UI behavior, more crawler noise.

In Yanxuan's e‑commerce scenario, a hybrid approach is used: code‑based tracking combined with XPath‑based zero‑code tracking. Each tracking entity is defined as SPM+SCM+ACTION, where SPM denotes the page position, SCM carries business parameters (e.g., material, A/B test group), and ACTION denotes the user action, drawn from a standard set of action types.

SPM Semantic Standardization

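| English Name | Chinese Name | Page | Associated Events |
| --- | --- | --- | --- |
| indexsign | 首页签到入口 (home page sign‑in entry) | Home page (index) | click_index_signin, show_index_signin |
| kingkong | 金刚区 (icon navigation grid) | Home page (index) | click_index_kingkong, show_index_kingkong |
| banner | 首焦 (home page focus banner) | Home page (index) | click_index_banner, show_index_banner |
| searchrank | 热搜榜 (hot search ranking) | Search keyword list page (searchkw) | click_searchkw_searchrank, show_searchkw_searchrank |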
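The event names in the table appear to follow an `<action>_<page>_<module>` pattern (for example, `click_index_banner`). The following is a minimal Python sketch of composing and parsing such names, assuming that pattern holds in general; the function names are hypothetical and not part of the tracking SDK. The module token is passed explicitly because it does not always equal the English name (`indexsign` maps to `signin` in the table).

```python
def build_event_name(action: str, page: str, module: str) -> str:
    """Compose an event name as <action>_<page>_<module> (assumed pattern)."""
    parts = (action, page, module)
    for part in parts:
        # Underscore is the separator, so components must not contain it.
        if not part or "_" in part:
            raise ValueError(f"invalid event-name component: {part!r}")
    return "_".join(parts)


def parse_event_name(name: str) -> dict:
    """Split an event name back into its three components."""
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected <action>_<page>_<module>, got {name!r}")
    action, page, module = parts
    return {"action": action, "page": page, "module": module}


print(build_event_name("click", "index", "banner"))
print(parse_event_name("show_searchkw_searchrank"))
```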

SCM Standardization

Backend business parameters are unified as JSON. Example:

{
  "extra": {
    "k1": "v1",
    "k2": "v2",
    "k3": "v3"
  }
}
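A consumer of this payload might read the business parameters as follows. This is a sketch only: the helper name `parse_scm` and the keys `material_id` and `ab_group` are hypothetical placeholders for the `k1`..`k3` entries above.

```python
import json


def parse_scm(raw: str) -> dict:
    """Return the business parameters stored under the "extra" key."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("SCM payload must be a JSON object")
    extra = payload.get("extra", {})
    if not isinstance(extra, dict):
        raise ValueError("SCM 'extra' must be a JSON object")
    return extra


# Hypothetical payload; real keys depend on the business scenario.
raw_scm = '{"extra": {"material_id": "m_1001", "ab_group": "B"}}'
params = parse_scm(raw_scm)
print(params["ab_group"])
```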

Action Standardization

| Event | Description |
| --- | --- |
| click | User click behavior |
| add | Product add‑to‑cart behavior |
| collect | Product collection/favorite |
| view | Page view (one record per load) |
| show | Module exposure on a page |
| special | Custom events, e.g., risk control actions |
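Taken together, the three standards describe one tracking entity. The sketch below models it in Python with the six standard actions as an enum, so that an unknown action is rejected at construction time. The class names and the dotted SPM value `index.banner` are assumptions for illustration; the article does not specify the SPM string format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(str, Enum):
    """The standard action types listed in the table above."""
    CLICK = "click"
    ADD = "add"
    COLLECT = "collect"
    VIEW = "view"
    SHOW = "show"
    SPECIAL = "special"


@dataclass
class TrackingEvent:
    spm: str                                  # page position
    action: Action                            # standard action type
    scm: dict = field(default_factory=dict)   # business parameters

    def __post_init__(self):
        if not self.spm:
            raise ValueError("spm must be non-empty")
        # Accept plain strings such as "click"; unknown values raise ValueError.
        self.action = Action(self.action)


event = TrackingEvent(spm="index.banner", action="click", scm={"ab_group": "B"})
print(event.action.value)
```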

1.2 Development Process & Quality Assurance

The typical development workflow includes requirement analysis, SDK integration, test verification, and release. Quality assurance practices are detailed in the internal document “严选埋点质量保障体系建设” ("Building the Yanxuan Tracking Quality Assurance System").

2. Data Warehouse Construction (数仓建设)

2.1 Business Architecture Diagram

2.2 Data Warehouse Architecture Diagram

2.3 Fact Table Construction

Fact tables are built in a fixed sequence: select the business process, declare the granularity, identify the dimensions, identify the facts, and finally add redundant (denormalized) dimensions. For traffic facts, the business process is the user's behavior sequence, which is embedded in the tracking events.
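As an illustration of these steps, the sketch below shows a fact row at one‑row‑per‑tracking‑event granularity with one redundant dimension copied in from a page dimension. All field names (`page_category`, `event_count`, and so on) are hypothetical; the article does not list the actual columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficFactRow:
    # Granularity: one row per tracking event.
    event_time: str
    uuid: str            # dimension: unified user identifier
    page: str            # dimension
    action: str          # dimension
    page_category: str   # redundant dimension copied from the page dimension
    event_count: int     # fact (additive)


def to_fact_row(event: dict, page_dim: dict) -> TrafficFactRow:
    """Build a fact row and denormalize one page attribute into it."""
    page_attrs = page_dim.get(event["page"], {})
    return TrafficFactRow(
        event_time=event["event_time"],
        uuid=event["uuid"],
        page=event["page"],
        action=event["action"],
        page_category=page_attrs.get("category", "unknown"),
        event_count=1,
    )


page_dim = {"index": {"category": "home"}}
event = {"event_time": "2024-01-01T10:00:00", "uuid": "u1",
         "page": "index", "action": "click"}
print(to_fact_row(event, page_dim))
```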

2.4 Dimension Table Construction

Dimension tables source data from business tables and detailed tracking logs. Business tables are typically snapshot tables; user/device attributes are derived from fact tables (e.g., first/last visit timestamps).
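Deriving device attributes from fact rows could look like the following sketch. It assumes each fact row carries a `device_id` and an ISO‑8601 `event_time` string in a single consistent format (so string comparison matches chronological order); both field names are hypothetical.

```python
def build_device_dim(fact_rows):
    """Derive first/last visit timestamps per device from fact rows."""
    dim = {}
    for row in fact_rows:
        device_id, ts = row["device_id"], row["event_time"]
        entry = dim.setdefault(device_id, {"first_visit": ts, "last_visit": ts})
        # ISO-8601 strings in one format sort chronologically.
        entry["first_visit"] = min(entry["first_visit"], ts)
        entry["last_visit"] = max(entry["last_visit"], ts)
    return dim


rows = [
    {"device_id": "d1", "event_time": "2024-01-02T09:00:00"},
    {"device_id": "d1", "event_time": "2024-01-01T08:00:00"},
    {"device_id": "d2", "event_time": "2024-01-03T12:00:00"},
]
print(build_device_dim(rows))
```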

2.5 DWS Table Construction

DWS (Data Warehouse Service) tables provide common granularity and metrics. Separate UUID‑level DWS models reduce computation cost and improve extensibility while maintaining metric consistency across finer‑grained models.
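One way to picture a UUID‑level DWS model is a per‑day, per‑UUID aggregate from which coarser metrics are rolled up, so that downstream metrics share one source instead of rescanning detail logs. The sketch below is an assumption about the shape of such a model, not the team's actual schema; the field names `dt`, `uuid`, and `action` and the `*_cnt` metric names are hypothetical.

```python
from collections import Counter, defaultdict


def build_uuid_daily_dws(fact_rows):
    """Aggregate event-level facts to one record per (date, uuid)."""
    dws = defaultdict(Counter)
    for row in fact_rows:
        dws[(row["dt"], row["uuid"])][f"{row['action']}_cnt"] += 1
    return {key: dict(counts) for key, counts in dws.items()}


def daily_totals(dws):
    """Roll the UUID-level model up to daily totals, including UV."""
    totals = defaultdict(Counter)
    for (dt, _uuid), counts in dws.items():
        totals[dt].update(counts)   # sum the per-UUID counters
        totals[dt]["uv"] += 1       # each (dt, uuid) key is one distinct visitor
    return {dt: dict(counts) for dt, counts in totals.items()}


rows = [
    {"dt": "2024-01-01", "uuid": "u1", "action": "view"},
    {"dt": "2024-01-01", "uuid": "u1", "action": "click"},
    {"dt": "2024-01-01", "uuid": "u2", "action": "view"},
]
dws = build_uuid_daily_dws(rows)
print(daily_totals(dws))
```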

3. UUID and Attribution Construction (uuid和归因建设)

3.1 UUID Construction

To resolve many‑to‑many relationships between accounts and devices, an i_code scheme is introduced, treating the unified identifier as a UUID.
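The article does not describe how i_code values are computed. One common way to collapse many‑to‑many account/device links into a single identifier is to treat each link as a graph edge and assign one ID per connected component, for example with union‑find. The sketch below illustrates that general technique under this assumption; it is not the team's actual algorithm, and a production system would map each group to a generated identifier rather than reuse a member as the ID.

```python
def assign_unified_ids(pairs):
    """Group linked accounts and devices; return node -> group representative."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited node straight at the root.
        while parent[node] != root:
            next_node = parent[node]
            parent[node] = root
            node = next_node
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller tuple as root so the result is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for account, device in pairs:
        union(("account", account), ("device", device))

    return {node: find(node) for node in list(parent)}


links = [("a1", "d1"), ("a1", "d2"), ("a2", "d2"), ("a3", "d3")]
ids = assign_unified_ids(links)
print(ids[("account", "a2")] == ids[("device", "d1")])
```

Here accounts `a1` and `a2` share device `d2`, so they fall into the same group together with `d1`, while `a3` and `d3` form a separate group.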

3.2 Attribution Construction

Attribution is divided into three categories: user reach attribution, external channel attribution, and internal guide attribution. The article focuses on internal guide attribution, which tracks user paths from entry pages to conversion.

Two attribution methods are applied:

Last‑click single‑point attribution (used for entry pages to avoid over‑attribution).

Last‑click multi‑point attribution (captures the final conversion touchpoint).
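The article does not spell out how the single‑point and multi‑point variants differ in implementation, so the sketch below shows only the generic last‑click idea that both share: walk the user's path backwards from the conversion and credit the most recent clicked touchpoint that belongs to a candidate set. The function name, the path record layout, and the SPM values are hypothetical.

```python
def last_click_attribution(path, candidates):
    """Return the most recent clicked SPM in `candidates`, or None.

    `path` lists the steps before a conversion, ordered oldest to newest.
    """
    for step in reversed(path):
        if step["action"] == "click" and step["spm"] in candidates:
            return step["spm"]
    return None


path = [
    {"spm": "index.banner", "action": "click"},
    {"spm": "searchkw.searchrank", "action": "click"},
    {"spm": "detail.addcart", "action": "add"},
]
print(last_click_attribution(path, {"index.banner", "searchkw.searchrank"}))
```

Restricting `candidates` to entry pages only would credit the entry point instead of the final touchpoint, which is one plausible reading of the distinction between the two methods.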

The attribution solution evolved through three stages:

Initial stage: Manual definition of page hierarchies, high maintenance cost.

Mid stage: Solved initial issues but relied on offline data.

Current stage: Supports real‑time data; each tracking point transmits the previous ten steps, eliminating the need for external linking.
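The current‑stage idea, where every tracking point carries its own recent path, can be sketched with a bounded buffer. This is an illustrative Python model of the client‑side behavior, assuming a simple in‑memory tracker; the class name, the `prev_steps` field, and the SPM strings are hypothetical.

```python
from collections import deque

MAX_STEPS = 10  # the article states that the previous ten steps are sent


class PathTracker:
    """Attach the most recent steps to every emitted tracking event."""

    def __init__(self, max_steps=MAX_STEPS):
        # A bounded deque drops the oldest step automatically.
        self._recent = deque(maxlen=max_steps)

    def emit(self, spm, action):
        event = {"spm": spm, "action": action, "prev_steps": list(self._recent)}
        self._recent.append(spm)
        return event


tracker = PathTracker()
for i in range(12):
    event = tracker.emit(f"page{i}.module", "click")
print(len(event["prev_steps"]))
```

Because each event is self‑contained, a downstream job can attribute a conversion from that single record without joining it against earlier log lines.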

4. Data Applications (数据应用)

Traffic data is used in various tools and business scenarios:

DSP (advertising platform)

DMP (user tags, profiling)

A/B testing platforms

User bus services

BI reports

Data products such as behavior analysis systems and marketing operation platforms

Business use cases: ad delivery, user acquisition, intelligent marketing, traffic competition, search recommendation, etc.

5. Future Outlook (未来展望)

The traffic data warehouse is mature; future work focuses on three directions:

Automated warehouse construction (the ODS layer is now auto‑generated; automation of the data mart layer is in progress).

Enriching DWS models and reducing duplicated metrics by progressively sinking logic into DWS.

Upgrading data mart models with OLAP engines such as Doris, leveraging materialized views to reduce the number of models.

Article authored by the Yanxuan technical team.
