Yanxuan’s Data Warehouse Blueprint: Architecture, Standards, and Evaluation
This article introduces Yanxuan’s data warehouse concept, platform layers, development standards, and a comprehensive evaluation framework, detailing its multi‑layer architecture (ODS, DWD, DWS, DIM, DM), supporting offline and real‑time platforms, and six key assessment dimensions such as data quality, security, and development efficiency.
Data warehouses are intangible products for data engineers, with evaluation criteria distinct from visualization or interactive products. This article presents Yanxuan’s data warehouse concept, platform, and evaluation system.
Data Warehouse Basic Architecture
Yanxuan’s warehouse follows a layered logic. The overall framework is shown below:
Following the flow of business data, the layers fall into three main tiers: ODS (Operational Data Store), DW (Data Warehouse, comprising DWD and DWS), and DM (Data Mart). A DIM layer of shared dimension tables is listed alongside them.
ODS Layer (Operational Data Store): Not exposed externally. Synchronizes raw business system data into the warehouse and preserves the original formats, primarily via DataHub binlog parsing and full-load synchronization.
DWD Layer (Detail Layer): Exposed externally. Stores common logic and frequently used dimension attributes in wide tables, which reduces the number of joins downstream.
DWS Layer (Summary Layer): Exposed externally. Contains the core public metrics and serves as the main data asset for external use.
DIM Layer (Dimension Tables): Exposed externally. Includes common dimension tables such as product, SKU, and channel.
DM Layer (Application Layer): Exposed to products. Supports data products and reports by aggregating complex metrics.
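The layer list above can be sketched as a small lookup structure. This is only an illustration: the table-name prefixes (for example dwd_), the ALLOWED_UPSTREAM dependency rule, and the function names are assumptions made for the example and are not described in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    description: str
    exposed: bool  # True if readable outside the warehouse team


# Exposure flags follow the layer descriptions in the article.
LAYERS = {
    "ODS": Layer("ODS", "raw data synchronized from business systems", False),
    "DWD": Layer("DWD", "detail wide tables with common logic", True),
    "DWS": Layer("DWS", "core public metrics", True),
    "DIM": Layer("DIM", "shared dimension tables (product, SKU, channel)", True),
    "DM": Layer("DM", "application tables for data products and reports", True),
}

# Hypothetical rule: which layers a table in each layer may read from.
ALLOWED_UPSTREAM = {
    "ODS": set(),
    "DIM": {"ODS"},
    "DWD": {"ODS", "DIM"},
    "DWS": {"DWD", "DIM"},
    "DM": {"DWD", "DWS", "DIM"},
}


def layer_of(table_name: str) -> str:
    """Infer the layer from an assumed '<layer>_' table-name prefix."""
    prefix = table_name.split("_", 1)[0].upper()
    if prefix not in LAYERS:
        raise ValueError(f"unknown layer prefix in table name: {table_name}")
    return prefix


def invalid_dependencies(table: str, upstream_tables: list[str]) -> list[str]:
    """Return the upstream tables that violate the assumed dependency rule."""
    allowed = ALLOWED_UPSTREAM[layer_of(table)]
    return [u for u in upstream_tables if layer_of(u) not in allowed]
```

A check like this could flag, for instance, a summary table that reads directly from ODS instead of going through the detail layer.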
Data Warehouse Development Platform
Yanxuan’s warehouse consists of offline and real‑time components.
Offline processing is supported by Mammoth, a one‑stop data management and application development platform from NetEase Hangzhou Research Institute.
Real‑time processing is provided by the Atom platform, a self‑developed real‑time data management and development solution.
Yanxuan Data Warehouse Standards
Although data warehouse development is often seen as SQL work with a low barrier to entry, Yanxuan follows a rigorous methodology built on three specifications: Metric Definition, Model Design, and Data Development. These are supported by tools such as Cangjie (metric management), SuiRen (metric map), UDS (data quality), and EasyDesign (model design).
Data Warehouse Evaluation System
The evaluation system covers six dimensions. Data security and data quality are the core requirements and are treated as the warehouse's lifeline.
1. Data Specification
Improves overall development quality by enforcing the three Yanxuan specifications and monitoring their implementation.
2. Data Security
Adheres to NetEase Business Conduct Guidelines, preventing external data leaks and ensuring secure handling.
3. Data Quality
Consists of data‑intrinsic quality (measured by fault levels and frequency) and construction quality (usability and richness of core assets).
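As a rough illustration of measuring intrinsic quality by fault level and frequency, the sketch below counts faults per level and computes a weighted total. The level labels (P0 to P3), the weights, and all names are hypothetical; the article does not state how Yanxuan classifies or scores faults.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    table: str
    level: str  # hypothetical severity label, "P0" (most severe) to "P3"


def fault_summary(faults: list[Fault]) -> dict[str, int]:
    """Count how many faults occurred at each level (the frequency view)."""
    return dict(Counter(f.level for f in faults))


def weighted_fault_score(faults: list[Fault], weights: dict[str, int]) -> int:
    """Sum an assumed per-level weight over all faults; lower is better."""
    return sum(weights[f.level] for f in faults)
```

Tracking both numbers over a fixed period, such as a month, would show whether faults are becoming rarer, milder, or both.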
4. Data Stability
Ensures both warehouse and platform stability through duty rosters, integrated incident platforms, and regular reviews.
5. Continuous Construction Mechanism
Maintains vitality via regular analyst‑driven metric updates and governance that removes non‑standard models, saving storage.
6. Data Development Efficiency
Measured by automation of development standards and platform user experience; recent projects reduced iteration cost dramatically.
Yanxuan Data Warehouse Evaluation Practice
1. Data Specification
Implemented via the EasyDesign platform, which automates metric definition and model design, supporting over 200 new DW tables in six months.
2. Data Security
Addresses data‑related losses through compliant release processes, testing tools, and environments.
3. Data Stability
EasyTaskOps provides intelligent baseline alerts and fine‑grained operations; baseline completion rates exceed 90%.
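One plausible reading of "baseline completion rate" is the share of baseline task runs that finish by their deadline. The sketch below computes it under that assumption; the function name and input shape are illustrative and do not reflect the EasyTaskOps API.

```python
from datetime import datetime
from typing import Optional


def baseline_completion_rate(
    runs: list[tuple[Optional[datetime], datetime]],
) -> float:
    """Fraction of runs that finished on or before their baseline deadline.

    Each run is (finished_at, deadline); finished_at is None if the run
    never completed.
    """
    if not runs:
        raise ValueError("no runs to evaluate")
    on_time = sum(
        1
        for finished_at, deadline in runs
        if finished_at is not None and finished_at <= deadline
    )
    return on_time / len(runs)
```

Under this definition, a rate above 0.9 would correspond to the "exceed 90%" figure quoted above.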
4. Continuous Construction Mechanism
Through the EasyCost upgrade, rule‑based storage, governance, and compute optimizations saved 1.2 PB of storage.
5. Data Development Quality
EasyDesign ensures compliance, leading to over 200 new DW tables built with high quality.
6. Data Development Efficiency
Standardized processes and platform support dramatically reduced iteration and repair costs for offline and real‑time data validation.
Conclusion
Yanxuan's data warehouse has accumulated extensive experience across the six dimensions, from product to practice. The team completed its data standards and SOPs in Q3 2019 and advanced product iterations in early 2020. For the second half of 2020, it expects richer data, easier usage, stronger guarantees, and faster response.
Yanxuan Tech Team
NetEase Yanxuan Tech Team shares e-commerce tech insights and quality finds for mindful living. This is the public portal for NetEase Yanxuan's technology and product teams, featuring weekly tech articles, team activities, and job postings.
