
Yanxuan Data Warehouse: Architecture, Standards, and Evaluation Framework

This article outlines the Yanxuan data warehouse’s layered architecture, the offline and real‑time development platforms, the comprehensive standards for metric definition, model design, and SQL development, and proposes a six‑dimensional evaluation system covering data norms, security, quality, stability, continuous improvement, and development efficiency.

DataFunTalk

Introduction – In the era of data‑driven decision making, Yanxuan’s data volume has grown from dozens of gigabytes to petabytes, prompting the evolution of data engineers from simple ETL roles to full‑stack data professionals handling collection, synchronization, modeling, offline and real‑time processing, governance, and product interaction.

Data Warehouse Architecture – The warehouse follows a three‑layer logical model: ODS (Operational Data Store) retains raw source data; DW consists of DWD (detail layer) and DWS (summary layer) forming the middle layer; DM (Data Mart) serves as the application layer for product and analyst consumption. The layers are illustrated in the diagram below.

The ODS layer is internal‑only: it synchronizes business system data through DataHub, which parses binlogs, and is loaded mainly in full. DWD exposes public detail tables that carry common dimensions and uses wide tables to reduce joins. DWS aggregates core metrics and is the most critical data asset. DIM stores reusable dimension tables (e.g., product, SKU, channel). DM provides application‑level aggregates for reporting and product use.

ODS: internal sync, raw format, mainly full load.

DWD: public detailed layer, wide tables for common joins.

DWS: public summary layer, core metrics.

DIM: public dimension tables.

DM: application layer for product and analyst queries.
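The layer roles above can be captured as a small lookup table. This is only a sketch: the roles and visibility come from the description above, while the convention that a table name starts with its layer prefix (for example `ods_`) and the helper `layer_of` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    role: str
    visibility: str  # "internal", "public", or "application"


# Roles follow the article; keying by a lowercase name prefix is hypothetical.
LAYERS = {
    "ods": Layer("ODS", "raw source data, mainly full load", "internal"),
    "dwd": Layer("DWD", "detail layer with wide tables", "public"),
    "dws": Layer("DWS", "summary layer with core metrics", "public"),
    "dim": Layer("DIM", "reusable dimension tables", "public"),
    "dm": Layer("DM", "aggregates for products and analysts", "application"),
}


def layer_of(table_name: str) -> Layer:
    """Resolve a table to its layer from the text before the first underscore."""
    prefix = table_name.split("_", 1)[0].lower()
    if prefix not in LAYERS:
        raise ValueError(f"unknown layer prefix in {table_name!r}")
    return LAYERS[prefix]
```

A lookup like this lets tooling answer questions such as "may an analyst query this table directly?" without hard-coding layer rules in each check.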

Development Platform – Yanxuan separates offline and real‑time processing. The offline side is powered by Mammoth, a one‑stop data management and application development platform from NetEase Hangzhou Research Institute, covering large‑scale storage, computation, integration, and governance. The real‑time side uses the internally built Atom platform for streaming data management and development.

Warehouse Standards – Yanxuan follows a methodology based on dimensional modeling and Alibaba’s OneData theory, encapsulated in three core specifications: the Metric Definition Specification, the Model Design Specification, and the Data Development Specification. Supporting tools (Cangjie metric management, SuiRen metric map, UDS data quality, EasyDesign model design) enforce these standards.

1. Metric Definition Specification – Unifies metric naming and derivation rules to avoid ambiguity and reduce maintenance cost.
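One way to picture unified naming and derivation is the pattern commonly associated with OneData, where a derived metric is composed from an atomic metric, a time period, and optional modifiers. The sketch below assumes that pattern; the field names, the underscore‑joined format, and the sample values are hypothetical and are not taken from Yanxuan's specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedMetric:
    atomic: str              # hypothetical atomic metric, e.g. "pay_amt"
    period: str              # hypothetical time period, e.g. "1d"
    modifiers: tuple = ()    # hypothetical qualifiers, e.g. ("app",)

    def name(self) -> str:
        """Build one canonical name; sorting modifiers removes order ambiguity."""
        parts = [*sorted(self.modifiers), self.atomic, self.period]
        return "_".join(parts)


a = DerivedMetric("pay_amt", "1d", ("new_user", "app"))
b = DerivedMetric("pay_amt", "1d", ("app", "new_user"))
print(a.name() == b.name())
```

Because the name is derived from the definition rather than typed by hand, two teams describing the same metric arrive at the same identifier.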

2. Model Design Specification – Standardizes model names so that each follows one of two patterns: domain + update method, or domain + dimension + update cycle.
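A naming rule of this kind is easy to check mechanically. The following is a minimal sketch that classifies a name against the two patterns; the domain list, the update‑method codes (`df`, `di`), the update‑cycle codes (`1d`, `1w`, `1m`), and the underscore separator are all assumptions made for illustration.

```python
# All vocabularies below are hypothetical examples, not Yanxuan's actual lists.
DOMAINS = {"trade", "user", "item"}
UPDATE_METHODS = {"df", "di"}        # e.g. daily full, daily incremental
UPDATE_CYCLES = {"1d", "1w", "1m"}   # e.g. daily, weekly, monthly


def classify_model_name(name: str):
    """Return which naming pattern a model name follows, or None if neither."""
    tokens = name.split("_")
    if not tokens or tokens[0] not in DOMAINS:
        return None
    if len(tokens) == 2 and tokens[1] in UPDATE_METHODS:
        return "domain+update_method"
    if (
        len(tokens) == 3
        and tokens[1].isalnum()
        and tokens[1].islower()
        and tokens[2] in UPDATE_CYCLES
    ):
        return "domain+dimension+update_cycle"
    return None
```

A check like this could run when a model is created, so that non‑conforming names are rejected before any downstream task depends on them.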

3. Data Development Specification – Improves SQL quality by enforcing indentation, sub‑query formatting, and other coding conventions.
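Conventions such as indentation lend themselves to automated checks. The sketch below flags lines of a SQL script whose indentation contains tabs or is not a multiple of a fixed width; the four‑space width and the two rules themselves are assumptions, since the article does not spell out the exact conventions.

```python
def indentation_issues(sql: str, width: int = 4):
    """Return (line_number, message) pairs for lines that break the indent rule."""
    issues = []
    for number, line in enumerate(sql.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation to check
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            issues.append((number, "tab in indentation"))
        elif len(indent) % width != 0:
            issues.append(
                (number, f"indent of {len(indent)} is not a multiple of {width}")
            )
    return issues
```

Running a check like this at submission time keeps formatting discussions out of code review.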

Evaluation System – A six‑dimensional framework assesses the warehouse on: 1) Data Norms, 2) Data Security, 3) Data Quality, 4) Data Stability, 5) Continuous Construction Mechanism, 6) Data Development Efficiency. Each dimension is measured with concrete metrics (e.g., ODS cross‑layer dependency rate, baseline completion rate, fault counts, automation level of standards).
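As an illustration of how one of these metrics might be computed, the sketch below derives an ODS cross‑layer dependency rate from lineage edges. The definition used here is an assumption: among all dependencies that read from ODS, it counts the share whose consumer is not in DWD or DIM. The article does not give the exact formula.

```python
# Assumption: only DWD and DIM are expected to read ODS directly.
ALLOWED_ODS_CONSUMERS = {"DWD", "DIM"}


def ods_cross_layer_rate(edges):
    """edges: iterable of (upstream_layer, downstream_layer) pairs."""
    consumers = [down for up, down in edges if up == "ODS"]
    if not consumers:
        return 0.0  # nothing reads ODS, so nothing skips a layer
    skipping = [layer for layer in consumers if layer not in ALLOWED_ODS_CONSUMERS]
    return len(skipping) / len(consumers)


lineage = [
    ("ODS", "DWD"),
    ("ODS", "DM"),   # skips the middle layer
    ("ODS", "DIM"),
    ("ODS", "DWS"),  # skips the detail layer
    ("DWD", "DWS"),
]
print(ods_cross_layer_rate(lineage))  # 0.5
```

A lower rate indicates that downstream tables are reusing the public layers rather than rebuilding logic on raw data.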

Evaluation Practice – Specific tools support each dimension: EasyDesign for metric and model standards, EasyTaskOps for baseline management and alerting, and automated quality checks. The article lists concrete indicators such as ODS cross‑layer dependency counts, baseline completion rates (>90%), and storage savings (1.2 PB from deprecating 34 k tables).
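The baseline completion rate can be sketched in the same spirit. Here it is assumed to mean the share of baseline runs that finish at or before a deadline, with an unfinished run recorded as `None`; the function names and the sample deadline are hypothetical, and only the 90% target comes from the text above.

```python
from datetime import time

TARGET = 0.90  # the article cites baseline completion rates above 90%


def baseline_completion_rate(finish_times, deadline):
    """Share of runs that finished at or before the deadline."""
    if not finish_times:
        raise ValueError("no baseline runs to evaluate")
    on_time = sum(1 for t in finish_times if t is not None and t <= deadline)
    return on_time / len(finish_times)


def meets_target(rate, target=TARGET):
    """True when the rate is strictly above the target."""
    return rate > target


runs = [time(6, 30), time(7, 0), time(7, 45), None]
rate = baseline_completion_rate(runs, deadline=time(7, 0))
print(rate, meets_target(rate))  # 0.5 False
```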

Continuous Improvement – Regular exchanges with analysts, automated governance via Mammoth, and iterative upgrades (e.g., EasyCost) ensure the warehouse stays aligned with business growth.

Summary – Over the past year, Yanxuan has established comprehensive standards, built robust offline and real‑time platforms, and defined a multi‑faceted evaluation system that drives data quality, security, stability, and development efficiency. The goals for the coming quarters are richer data, easier usage, and stronger guarantees.

Author Bio – Yi Feng, senior data architect with extensive experience in data modeling, standards enforcement, and leading the Yanxuan transaction domain.

