Big Data 11 min read

Data Warehouse Modeling Architecture and Methodology: Building Robust, High‑Quality Data Models

This article explains the importance of data‑warehouse modeling, outlines a layered architecture (DWD, DWS, DIM, ADS), describes a systematic modeling process, and presents design principles and practical examples to achieve high‑quality, stable, and efficient data models for large‑scale analytics.

DataFunTalk
DataFunTalk
DataFunTalk
Data Warehouse Modeling Architecture and Methodology: Building Robust, High‑Quality Data Models

Data‑warehouse modeling is the core of building a reliable data platform, influencing data quality, robustness, resource consumption, and service response speed.

Why model? Real‑world experience shows that naive table‑by‑table development leads to model bloat, long runtimes, KPI delays, data bugs, duplicate metrics, inconsistent calculations, and operational headaches.

Effective modeling treats the warehouse as a system with both functional (business needs, metrics) and non‑functional requirements (quality, security, performance, stability, resource efficiency).

1. Model Architecture

The recommended layered architecture includes:

DWD – detailed fact data, cleaning and standardization.

DWS – analytical subject‑oriented models, with light, medium, and heavy aggregation levels.

DIM – dimension tables for all entities (users, content, etc.).

ADS – data‑mart layer built on DWS and DIM for application‑specific needs.

After defining horizontal layers, vertical data domains are mapped to business lines, considering domain characteristics, business complexity, and KPI attributes.

2. Modeling Process

Once the overall tone is set, individual model design follows a defined workflow (see diagram).

3. Design Methodology

Modeling focuses on fact tables and dimension tables. Key principles include high cohesion, low coupling, separation of core and extension models, centralizing common logic, balancing cost and performance, ensuring data rollback, consistency, and clear naming.

Fact Table Design

Select business processes and fact table type (single‑transaction vs. multi‑transaction).

Define grain at the most atomic level.

Choose appropriate dimensions and facts, aligning grain with facts.

Consider redundant dimensions for usability, balancing query speed, storage, and frequency.

Dimension Design

Identify and create dimensions, designate primary and related dimension tables.

Extract rich attributes, include descriptive text, differentiate numeric from factual attributes, and apply de‑normalization and flattening.

4. Example

An example from NetEase Cloud Music shows a content‑line model with a main dimension table, light aggregation tables for interactions and traffic, and separate heavy aggregation tables for historical metrics, illustrating the layered approach.

5. Summary

The described architecture, process, and methodology have become a stable internal standard, further productized by NetEase’s data‑warehouse platform to help teams build robust data‑mid‑platforms efficiently.

ModelingData WarehouseETLdata architectureDWD
DataFunTalk
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.