Big Data 11 min read

Data Warehouse Modeling Architecture and Methodology: Building Robust, High‑Quality Data Models

This article explains the importance of data‑warehouse modeling, outlines a layered architecture (DWD, DWS, DIM, ADS), describes a systematic modeling process, and presents design principles and practical examples to achieve high‑quality, stable, and efficient data models for large‑scale analytics.

DataFunTalk

Mar 11, 2021

Data Warehouse Modeling Architecture and Methodology: Building Robust, High‑Quality Data Models

Data‑warehouse modeling is the core of building a reliable data platform, influencing data quality, robustness, resource consumption, and service response speed.

Why model? Real‑world experience shows that naive table‑by‑table development leads to model bloat, long runtimes, KPI delays, data bugs, duplicate metrics, inconsistent calculations, and operational headaches.

Effective modeling treats the warehouse as a system with both functional (business needs, metrics) and non‑functional requirements (quality, security, performance, stability, resource efficiency).

1. Model Architecture

The recommended layered architecture includes:

DWD – detailed fact data, cleaning and standardization.

DWS – analytical subject‑oriented models, with light, medium, and heavy aggregation levels.

DIM – dimension tables for all entities (users, content, etc.).

ADS – data‑mart layer built on DWS and DIM for application‑specific needs.

After defining horizontal layers, vertical data domains are mapped to business lines, considering domain characteristics, business complexity, and KPI attributes.

2. Modeling Process

Once the overall tone is set, individual model design follows a defined workflow (see diagram).

3. Design Methodology

Modeling focuses on fact tables and dimension tables. Key principles include high cohesion, low coupling, separation of core and extension models, centralizing common logic, balancing cost and performance, ensuring data rollback, consistency, and clear naming.

Fact Table Design

Select business processes and fact table type (single‑transaction vs. multi‑transaction).

Define grain at the most atomic level.

Choose appropriate dimensions and facts, aligning grain with facts.

Consider redundant dimensions for usability, balancing query speed, storage, and frequency.

Dimension Design

Identify and create dimensions, designate primary and related dimension tables.

Extract rich attributes, include descriptive text, differentiate numeric from factual attributes, and apply de‑normalization and flattening.

4. Example

An example from NetEase Cloud Music shows a content‑line model with a main dimension table, light aggregation tables for interactions and traffic, and separate heavy aggregation tables for historical metrics, illustrating the layered approach.

5. Summary

The described architecture, process, and methodology have become a stable internal standard, further productized by NetEase’s data‑warehouse platform to help teams build robust data‑mid‑platforms efficiently.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Modeling Data Warehouse ETL data architecture DWD

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.