Overview of the Traffic Domain and Its Data Governance Architecture

This document presents a comprehensive overview of the traffic domain in a data warehouse, covering its concepts, objectives, guiding principles, core and extension models, data quality, monitoring, scheduling, and operational practices to achieve a complete, accurate, efficient, low‑cost, and high‑value traffic data system while addressing massive data volume, consistency, and SLA challenges.

政采云技术 (ZCY Technology Team)

Jun 21, 2022

1. Traffic Domain Overview

1.1 Concept

In data‑warehouse construction, domains such as transaction, marketing, and customer are defined; the traffic domain focuses on processing and analyzing event‑tracking data to support active‑user, funnel, and attribution analyses, providing strong data support for operations, growth, commercialization, and senior decision‑making.

1.2 Purpose of Building the Traffic Domain

The goal is to establish a comprehensive, accurate, efficient, low‑cost, high‑value traffic data system that integrates all‑domain traffic basics, offers a "god‑view" of business, and ensures a complete data‑quality lifecycle from pre‑ to post‑processing.

1.3 Guiding Principles

1.3.1 Modeling Philosophy

High cohesion, low coupling is the main architectural target, achieved through layered design, open‑source frameworks, functional subsystem division, and package structures.

Data models are designed by business and access characteristics: related data with the same granularity are grouped, and frequently co‑accessed data are stored together.

1.3.2 Core Model and Extension Model Separation

Core models contain fields for common business needs; extension models add fields for personalized or low‑frequency use, preserving core simplicity and maintainability.
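As a minimal sketch of this separation, the example below keeps common fields in a core record and moves low‑frequency fields into an extension record joined on the same key. All field names (`event_id`, `user_id`, `page_id`, `ab_bucket`, `referrer_campaign`) are hypothetical and do not come from the original article.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CoreEvent:
    """Core model: fields most consumers need (hypothetical names)."""
    event_id: str
    user_id: str
    page_id: str
    event_time: str  # ISO-8601 timestamp


@dataclass(frozen=True)
class EventExtension:
    """Extension model: low-frequency fields, keyed by event_id (hypothetical names)."""
    event_id: str
    ab_bucket: Optional[str] = None
    referrer_campaign: Optional[str] = None


def join_extension(core: CoreEvent, extensions: Dict[str, EventExtension]) -> dict:
    """Return the core row, enriched with extension fields when they exist."""
    row = asdict(core)
    ext = extensions.get(core.event_id)
    if ext is not None:
        # Copy every extension field except the join key.
        row.update({k: v for k, v in asdict(ext).items() if k != "event_id"})
    return row
```

Consumers that only need common fields read `CoreEvent` alone; the join is paid for only by the few consumers that need the extra fields.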

1.3.3 Common Processing Logic Consolidation

Shared low‑level processing logic should be encapsulated once in the underlying layers that data scheduling depends on, so that it is neither exposed to application‑layer code nor duplicated in several places.

1.3.4 Cost‑Performance Balance

Appropriate data redundancy can improve query and refresh performance, but excessive redundancy should be avoided.

1.3.5 Data Rollback

As long as the processing logic is unchanged, running it repeatedly at different times must produce the same result, so that any period can be safely re‑run.
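One common way to get this property is to overwrite a whole date partition instead of appending to it. The sketch below illustrates the idea with an in‑memory dictionary standing in for a partitioned table; `load_partition`, `event_date`, and the table layout are assumptions made for illustration, not the article's implementation.

```python
from typing import Dict, List

# In-memory stand-in for a date-partitioned table: partition date -> rows.
Table = Dict[str, List[dict]]


def load_partition(table: Table, ds: str, source_rows: List[dict]) -> None:
    """Rebuild the partition for date `ds` from its source rows.

    The partition is overwritten rather than appended to, and rows are
    filtered by the partition date instead of the current time, so re-running
    the job for the same `ds` always yields the same rows.
    """
    table[ds] = [row for row in source_rows if row["event_date"] == ds]
```

Running `load_partition` twice for the same date leaves the table in the same state as running it once, which is what makes a rollback or back‑fill safe.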

1.3.6 Consistency

Fields with the same meaning must use identical naming across tables, following the defined standards.
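A naming rule like this can be checked mechanically. The following sketch assumes a hypothetical alias dictionary that maps known non‑standard names to the standard field name; the table and field names are invented for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical alias dictionary: non-standard name -> standard field name.
ALIASES = {
    "uid": "user_id",
    "userid": "user_id",
    "pv_cnt": "page_view_count",
}


def find_nonstandard_fields(
    schemas: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Return (table, field, standard_name) for each field using a known alias.

    `schemas` maps a table name to its list of column names.
    """
    violations = []
    for table, fields in sorted(schemas.items()):
        for field in fields:
            standard = ALIASES.get(field.lower())
            if standard is not None:
                violations.append((table, field, standard))
    return violations
```

A check of this kind could run during model review, so that a column such as `uid` is renamed to `user_id` before the table is published.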

2. Traffic Domain Basic Data Architecture

2.1 Data‑Warehouse Architecture

2.2 Technical Architecture

3. Full‑Link Governance of the Traffic Domain

Current Pain Points

Inconsistent metrics across reports.

Different metric definitions for different stakeholders.

Delayed data delivery.

Uncommunicated tracking point changes after version upgrades.

Data developers repeatedly pulling raw logs.

3.1 Traffic Data Characteristics

Extremely large volume (from hundreds of billions of records up to petabytes of data per day).

Cross‑domain, complex business scenarios requiring knowledge of other domains (user, member, etc.).

3.2 Problem Exposure

Non‑standard tracking points.

Non‑standard data development.

Inconsistent metric definitions and untimely notifications.

Duplicate metric development causing consistency issues.

Chimney‑style development leading to cost waste.

Lack of effective metric fluctuation monitoring.

Unmet SLA for data delivery.

Dimension‑table maintenance problems.

3.3 Solutions

3.3.1 Refine Development Standards and Optimize Data Models

Improve existing word‑root dictionaries, domain/subject division, model review, development standards, and ETL cleaning guidelines.

Metadata‑driven optimization of common‑layer models improves their completeness, reusability, and degree of standardization, which in turn raises development efficiency and quality.

3.3.2 Tracking Point Standards

Tracking points are pieces of code embedded in pages or buttons to collect user‑behavior data, enabling statistics on page views and user actions. A tracking‑management platform keeps tracking points registered and standardized, preventing chaotic tracking.

3.3.3 Metric Management Platform

The platform fundamentally resolves most metric‑consistency issues.

3.3.4 Data Quality Assurance

Integrity: missing entries or attributes.

Consistency: naming, coding, meaning, lifecycle mismatches across sources.

Accuracy: reliability of data.

Uniqueness: detection of duplicate or redundant data.

Correlation: missing or incorrect relationships (foreign keys, indexes, etc.).

Authenticity: data must truthfully reflect real entities.

Timeliness: data availability when needed.

Rule types include logical checks, outlier detection, and fluctuation audits, with rules prioritized by weight.

The ultimate goal is a configurable, page‑driven system.
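To illustrate what "configurable" could mean here, the sketch below treats each rule as plain data (type, column, weight) and dispatches it to a check function. Only integrity and uniqueness are covered; the rule format, function names, and column names are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, object]


def check_integrity(rows: List[Row], column: str) -> bool:
    """Integrity: every row has a non-null value in `column`."""
    return all(row.get(column) is not None for row in rows)


def check_uniqueness(rows: List[Row], column: str) -> bool:
    """Uniqueness: no value of `column` appears more than once."""
    counts = Counter(row.get(column) for row in rows)
    return all(n == 1 for n in counts.values())


# Registry mapping a rule type (as written in configuration) to its check.
CHECKS: Dict[str, Callable[[List[Row], str], bool]] = {
    "integrity": check_integrity,
    "uniqueness": check_uniqueness,
}


def run_rules(rows: List[Row], rules: List[dict]) -> List[str]:
    """Run the configured rules; return failed rules, highest weight first."""
    failed = [r for r in rules if not CHECKS[r["type"]](rows, r["column"])]
    failed.sort(key=lambda r: r["weight"], reverse=True)
    return [f'{r["type"]}:{r["column"]}' for r in failed]
```

Because rules are data, a configuration page could add or re‑weight them without code changes, and the weights decide which failures are reported first.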

3.3.5 SLA Guarantees

Model Layer

Design Layer

Strict adherence to model architecture, design principles, and layering ensures stable pipelines.

Model Iteration

When a model is iterated, the change must be trial‑run before submission and its downstream impact verified, especially when new fields are added.

Model Optimization

Optimization focuses on task tuning and horizontal/vertical splitting of fact and dimension tables to reduce execution time.

Model Priority

Assigning task priority helps identify core warehouse tasks and informs alert mechanisms.

Scheduling Layer

Scheduling System Issues

Downstream tasks not triggered promptly after upstream completion.

Tasks remain in "running" state despite completion.

Task Dependency Configuration

Missing upstream dependencies cause incorrect data and massive re‑runs.

Circular dependencies block execution.
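Part of this can be validated before tasks are deployed. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to reject circular dependencies and references to tasks that are not defined. The task names are hypothetical, and a check like this cannot detect a dependency that was never declared at all.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def validate_dependencies(deps: Dict[str, Set[str]]) -> List[str]:
    """Return a valid run order, or raise ValueError on bad configuration.

    `deps` maps each task to the set of upstream tasks it depends on.
    """
    known = set(deps)
    referenced = {up for ups in deps.values() for up in ups}
    missing = referenced - known
    if missing:
        raise ValueError(f"undefined upstream tasks: {sorted(missing)}")
    try:
        # static_order() yields every upstream task before its dependents.
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc
```

For a linear chain such as `{"ods": set(), "dwd": {"ods"}, "dws": {"dwd"}}`, the function returns the tasks in the order they must run.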

Big‑Data Operations Layer

Resources

Timely evaluation of storage and compute resources prevents night‑time failures due to insufficient capacity.

Component Failures

Common issues include Hive‑on‑Spark connectivity problems and other component outages that halt batch jobs.

Monitoring Layer

Data‑Quality Monitoring

Real‑time monitoring of daily extraction volume, core model uniqueness, and metric thresholds ensures early detection of data problems.
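A simple form of volume monitoring compares today's row count with a recent baseline. The sketch below is one possible rule, assuming a mean‑based baseline and a 30% threshold; both choices are illustrative and not taken from the article.

```python
from statistics import mean
from typing import Sequence


def volume_fluctuation(today: int, history: Sequence[int]) -> float:
    """Relative deviation of today's row count from the historical mean."""
    baseline = mean(history)
    if baseline == 0:
        # No historical volume: any rows today count as an unbounded change.
        return float("inf") if today else 0.0
    return (today - baseline) / baseline


def is_abnormal(today: int, history: Sequence[int], threshold: float = 0.3) -> bool:
    """Flag the day when the deviation exceeds `threshold` in either direction."""
    return abs(volume_fluctuation(today, history)) > threshold
```

In practice the baseline would likely need to account for weekly seasonality, for example by comparing with the same weekday in previous weeks.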

Task‑Delay Monitoring

Alerts trigger when expected task windows (e.g., 02:00‑02:30) are missed, indicating abnormal delays.
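The window check described above can be sketched as follows. The function name and parameters are assumptions, and the example assumes the window ends on the same calendar day as `now`.

```python
from datetime import datetime, time
from typing import Optional


def is_delayed(window_end: time, finished_at: Optional[datetime], now: datetime) -> bool:
    """Return True when the task missed its expected completion window.

    A task is delayed if it finished after `window_end`, or if it has not
    finished yet and the current time is already past `window_end`.
    """
    deadline = datetime.combine(now.date(), window_end)
    if finished_at is None:
        return now > deadline
    return finished_at > deadline
```

For a task expected to finish by 02:30, calling `is_delayed(time(2, 30), None, now)` at 02:45 reports a delay even though the task has not produced a finish time yet.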

On‑Call Layer

On‑call engineers must ensure all warehouse tasks run correctly overnight; unresolved issues must be escalated promptly.
