Data Warehouse Modeling Platform: Exploration and Practice at NetEase Yanxuan
This article details NetEase Yanxuan’s exploration and practice of a data warehouse modeling platform, covering background, current challenges, a comprehensive solution, step‑by‑step implementation, and the resulting improvements in model standardization, automation, and business value.
Introduction
The presentation shares the exploration and practice of a data warehouse modeling platform at NetEase Yanxuan, organized by senior big data platform engineer Pan Songdu, with editorial support from Jin Yong and content verification by Li Yao.
1. Background and Current Situation
Data warehouse construction integrates, transforms, and computes business data to extract valuable information that continuously supports the business. The typical model follows a layered design: ODS (source layer), DWD (fact layer), DWS detail layer, DWS summary layer, and DM (data mart) layer. NetEase Yanxuan's data warehouse has been under construction for six to seven years and contains over ten thousand physical models, nearly ten thousand development tasks, and a similar number of metrics serving dozens of products.
The existing ecosystem consists of separate products for metric definition, model development, and data services. As a result, there is no unified system, and the old metric management system suffers from the problems listed below.
Problems of the Old Metric Management System
Limited functional scope – no support for DW‑layer model design, physicalization, or integration with task operations and data services.
Unstandardized metric definitions – duplicate metrics, inconsistent naming, and mixing of atomic and derived metrics.
Unstandardized model design – unclear naming hierarchy, low online rate, and poor registration of models.
Unstandardized model construction – mismatched logical and physical structures, siloed development, and cross‑layer dependency issues.
Pain Points of Data Warehouse Construction
Complex and rapidly changing business scenarios leading to massive and intricate data relationships.
Insufficient pre‑planning and weak model design capability, causing duplicate and poorly described metrics.
Disconnected design and implementation processes, resulting in low development efficiency and error‑prone manual coding.
Difficult post‑implementation governance due to lack of proper change tracking and data quality assurance.
Data silos caused by inconsistent metric definitions and divergent perspectives between developers and downstream users.
2. Solution
The solution aims to establish a complete and standardized data warehouse modeling system by defining a standard, delivering a product, and enforcing a set of norms. The product integrates business process management, dimension management, metric definition, and model design, and enforces standards such as metric positioning, data access constraints, model governance, and task operation.
The platform follows classic data warehouse methodology, adapts to Yanxuan’s current state, benchmarks industry‑leading products, and leverages other big‑data capabilities to cover the entire lifecycle from design to maintenance.
Product Framework
The framework consists of five major blocks: data planning, data standards, dimension modeling, data metrics, and data assets. Dimension modeling and data metrics form the core, while planning and standards provide customization for Yanxuan’s context.
Functional Modules
Business process and metric definition – includes process definition, bus matrix design, dimension management, atomic and derived metric management.
Logical model design – builds layered models (ODS, DWD, DWS detail, DWS summary, DM) with clear responsibilities.
Physical model construction and deployment – covers model physicalization, task publishing, operation, and data service generation.
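The bus matrix design mentioned above can be pictured as a table of business processes against the dimensions each one uses. The following is a minimal sketch under that assumption, not the platform's actual data structure. The class `BusMatrix`, its methods, and the process and dimension names are all hypothetical. In Kimball-style modeling, dimensions shared by several processes are usually called conformed dimensions, which is the term the sketch uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BusMatrix:
    """Hypothetical bus matrix: business process -> dimensions it uses."""

    rows: dict[str, set[str]] = field(default_factory=dict)

    def register(self, process: str, dimensions: set[str]) -> None:
        # Merge rather than overwrite, so repeated registrations accumulate.
        self.rows.setdefault(process, set()).update(dimensions)

    def conformed_dimensions(self) -> set[str]:
        """Return the dimensions used by two or more business processes."""
        counts: dict[str, int] = {}
        for dims in self.rows.values():
            for dim in dims:
                counts[dim] = counts.get(dim, 0) + 1
        return {dim for dim, n in counts.items() if n >= 2}


# Illustrative process and dimension names (assumptions, not from the talk).
matrix = BusMatrix()
matrix.register("order_payment", {"date", "user", "sku"})
matrix.register("refund", {"date", "user", "refund_reason"})
shared = matrix.conformed_dimensions()
```

Here `shared` holds the dimensions that both processes use, which are the ones worth managing centrally in dimension management.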
3. Implementation Steps
Step 1 – Standardize Metric Definition System
Introduce a five‑step workflow: domain segmentation, dimension design, business process design, atomic and derived metric design, and final derived metric generation with automatic dependency linking.
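To illustrate how a derived metric could be generated from an atomic metric with its dependencies linked automatically, here is a minimal sketch. It assumes a derived metric is an atomic metric combined with a time period and a set of dimensions. The talk does not specify the exact composition, so the classes `AtomicMetric` and `DerivedMetric`, their fields, and the naming pattern are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicMetric:
    """Hypothetical atomic metric tied to one business process."""

    name: str              # e.g. "pay_amount"
    business_process: str  # e.g. "order_payment"
    aggregation: str       # e.g. "SUM"
    column: str            # source column on the fact model


@dataclass(frozen=True)
class DerivedMetric:
    """Hypothetical derived metric: atomic metric + period + dimensions."""

    atomic: AtomicMetric
    period: str                  # e.g. "1d"
    dimensions: tuple[str, ...]  # grain at which the metric is computed

    @property
    def name(self) -> str:
        # The name is generated from the parts, never typed by hand.
        return "_".join((self.atomic.name, self.period, *self.dimensions))

    @property
    def depends_on(self) -> tuple[str, ...]:
        # Dependencies fall out of the definition itself.
        return (self.atomic.name, *self.dimensions)


pay_amount = AtomicMetric("pay_amount", "order_payment", "SUM", "pay_amount")
derived = DerivedMetric(atomic=pay_amount, period="1d", dimensions=("user",))
```

Because the name and the dependency list are computed from the definition, two developers describing the same metric end up with the same identifier, which is one way to avoid the duplicate and inconsistently named metrics described earlier.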
Step 2 – Standardize Model Design System
Define a strict model design process covering DWD fact model, DWS detail model (with automatic identifier generation and metric linking), DWS summary model (auto‑discovering dependencies), and DM data‑mart layer (read‑only for consumption).
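A strict design process of this kind implies some rule about which layers a model may read from. The sketch below shows one possible check for the cross-layer dependency issues listed earlier. The rule it enforces (a model may read only from its own layer or the layer directly beneath it) is an assumption made for illustration, as are the layer identifiers, function names, and model names.

```python
from __future__ import annotations

# Layer order taken from the article; the identifiers are illustrative.
LAYERS = ["ODS", "DWD", "DWS_DETAIL", "DWS_SUMMARY", "DM"]


def is_allowed(model_layer: str, upstream_layer: str) -> bool:
    """Assumed rule: read from the same layer or the layer directly below."""
    gap = LAYERS.index(model_layer) - LAYERS.index(upstream_layer)
    return gap in (0, 1)


def find_violations(
    model_layers: dict[str, str],
    dependencies: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return the (model, upstream) pairs that break the assumed rule."""
    return [
        (model, upstream)
        for model, upstream in dependencies
        if not is_allowed(model_layers[model], model_layers[upstream])
    ]


# Hypothetical models and dependencies.
model_layers = {
    "ods_order": "ODS",
    "dwd_order": "DWD",
    "dws_order_detail": "DWS_DETAIL",
    "dm_sales": "DM",
}
dependencies = [
    ("dwd_order", "ods_order"),
    ("dws_order_detail", "dwd_order"),
    ("dm_sales", "ods_order"),  # skips three layers
]
violations = find_violations(model_layers, dependencies)
```

Running such a check at design time, before a model is physicalized, keeps layer-skipping and reverse dependencies out of the warehouse instead of leaving them for later governance.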
Step 3 – Standardize Metric Calculation and Model Construction
Automate the construction of DWS summary and DM layers, generate aggregation code based on metric definitions, and ensure end‑to‑end traceability and consistency.
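Generating aggregation code from metric definitions can be sketched as a small function that turns a list of metric specifications into a `GROUP BY` query. This is an illustration of the general idea only, not the platform's generator. The class `MetricSpec`, the function `build_summary_sql`, the allowed aggregation set, and the table and column names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Aggregations the sketch accepts; anything else is rejected up front.
ALLOWED_AGGREGATIONS = {"SUM", "COUNT", "MAX", "MIN", "AVG"}


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical metric definition used to drive code generation."""

    name: str         # output column name
    aggregation: str  # one of ALLOWED_AGGREGATIONS
    column: str       # input column on the source table


def build_summary_sql(
    source_table: str,
    dimensions: list[str],
    metrics: list[MetricSpec],
) -> str:
    """Build a summary-layer aggregation query from metric definitions."""
    if not dimensions or not metrics:
        raise ValueError("at least one dimension and one metric are required")
    for metric in metrics:
        if metric.aggregation not in ALLOWED_AGGREGATIONS:
            raise ValueError(f"unsupported aggregation: {metric.aggregation}")
    select_items = list(dimensions) + [
        f"{m.aggregation}({m.column}) AS {m.name}" for m in metrics
    ]
    select_clause = ",\n  ".join(select_items)
    group_clause = ", ".join(dimensions)
    return (
        f"SELECT\n  {select_clause}\n"
        f"FROM {source_table}\n"
        f"GROUP BY {group_clause}"
    )


sql = build_summary_sql(
    source_table="dws_trade_order_detail",
    dimensions=["user_id", "dt"],
    metrics=[MetricSpec("pay_amount_1d", "SUM", "pay_amount")],
)
```

Since the query is derived entirely from the definitions, the logical metric and the physical column cannot drift apart, which is the traceability and consistency property this step aims for.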
4. Results and Summary
The platform now delivers standardized metric and model definition as well as automated model construction. Newly added metrics are fully standardized, and legacy metrics are being migrated. Automation has significantly improved development efficiency and business delivery speed.
The primary business value lies in solving metric inconsistency, providing a unified view for design, production, usage, and governance, thereby achieving cost reduction and efficiency gains. Balancing strict standards with flexibility remains a key focus to avoid over‑constraining requirements while preventing system decay.
Thank you for your attention.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
