
Comprehensive Guide to Big Data Modeling and Data Warehouse Design

This article provides an in‑depth overview of big‑data modeling concepts, covering why data modeling is essential, relational versus analytical systems, common warehouse modeling methodologies, Alibaba's practical implementations, dimension design techniques, and detailed fact‑table design principles for modern data platforms.

Big Data Technology & Architecture

Chapter 1 explains why big‑data environments need data modeling, emphasizing four benefits: structured classification of data, lower cost, higher efficiency, and better data quality. It also contrasts OLTP systems (transaction‑oriented, designed in 3NF) with OLAP systems (analysis‑oriented).

It then surveys typical warehouse modeling approaches such as ER modeling, dimensional modeling (star and snowflake schemas), Data Vault, and Anchor models, highlighting their purposes and trade‑offs.
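To make the star schema concrete, here is a minimal sketch using Python's built‑in `sqlite3` module. The table and column names (`dim_date`, `dim_product`, `fact_sales`, and so on) are hypothetical illustrations, not taken from the source. A central fact table holds foreign keys and numeric measures, each dimension table holds descriptive attributes, and a typical analytical query joins the fact table to its dimensions and aggregates.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema (hypothetical tables) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: one row per member, descriptive attributes only.
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,   -- surrogate key such as 20240101
            full_date TEXT,
            month TEXT
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            product_name TEXT,
            category TEXT
        );
        -- Fact table: a foreign key to each dimension plus additive measures.
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            amount REAL
        );
        """
    )
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (20240101, "2024-01-01", "2024-01"),
        (20240201, "2024-02-01", "2024-02"),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Laptop", "Electronics"),
        (2, "Desk", "Furniture"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20240101, 1, 2, 2000.0),
        (20240101, 2, 1, 300.0),
        (20240201, 1, 1, 1000.0),
    ])
    return conn


def sales_by_month_and_category(conn: sqlite3.Connection):
    """Typical OLAP query: join the fact table to its dimensions, then group and sum."""
    return conn.execute(
        """
        SELECT d.month, p.category, SUM(f.amount) AS total_amount
        FROM fact_sales AS f
        JOIN dim_date AS d ON f.date_key = d.date_key
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
        """
    ).fetchall()


if __name__ == "__main__":
    for row in sales_by_month_and_category(build_star_schema()):
        print(row)
```

In a snowflake schema, `category` would instead be normalized out of `dim_product` into its own table referenced by a key, trading an extra join for less redundancy.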

Chapter 2 describes Alibaba's data integration and management framework and its layered architecture: ODS (operational data store); CDM (common dimension model), which comprises the DWD (detail) and DWS (summary) layers; and ADS (application data). It also sets out the design principles of high cohesion and low coupling, a balance between cost and performance, and consistent naming.
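The flow through these layers can be illustrated with a minimal Python sketch. It assumes hypothetical order records and field names, and each function stands in for one layer: ODS keeps raw source rows as they arrive, DWD cleans them into typed detail records, DWS aggregates the detail into a reusable summary, and ADS shapes that summary for one specific application. A production pipeline would run these steps as jobs on a distributed engine rather than as in‑memory Python.

```python
from collections import defaultdict
from dataclasses import dataclass

# ODS: raw rows copied from a (hypothetical) source system, all strings, possibly dirty.
ODS_ORDERS = [
    {"order_id": "1001", "shop": " shop_a ", "amount": "120.50", "dt": "2024-01-01"},
    {"order_id": "1002", "shop": "shop_b", "amount": "80", "dt": "2024-01-01"},
    {"order_id": "1002", "shop": "shop_b", "amount": "80", "dt": "2024-01-01"},  # duplicate
    {"order_id": "1003", "shop": "shop_a", "amount": "bad", "dt": "2024-01-02"},  # unparseable
]


@dataclass(frozen=True)
class OrderDetail:
    """DWD: one cleaned, typed row per order (the detail grain)."""
    order_id: str
    shop: str
    amount: float
    dt: str


def build_dwd(ods_rows):
    """Deduplicate on order_id, trim strings, cast types, and drop rows that fail to parse."""
    seen, detail = set(), []
    for row in ods_rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would route this row to a data-quality check
        seen.add(row["order_id"])
        detail.append(OrderDetail(row["order_id"], row["shop"].strip(), amount, row["dt"]))
    return detail


def build_dws(detail):
    """DWS: summary at the (shop, day) grain, reusable by many applications."""
    totals = defaultdict(lambda: {"order_cnt": 0, "total_amount": 0.0})
    for d in detail:
        key = (d.shop, d.dt)
        totals[key]["order_cnt"] += 1
        totals[key]["total_amount"] += d.amount
    return dict(totals)


def build_ads(summary, dt):
    """ADS: output for one application, here a shop ranking for a single day."""
    rows = [(shop, m["total_amount"]) for (shop, day), m in summary.items() if day == dt]
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    print(build_ads(build_dws(build_dwd(ODS_ORDERS)), "2024-01-01"))
```

In this sketch each function reads only the output of the layer below it, which is one way to keep layers loosely coupled while each layer stays focused on a single responsibility.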

Chapter 3 focuses on dimension design, explaining basic concepts (facts vs dimensions, attributes, primary keys), design steps (selecting dimensions, defining granularity, identifying attributes), consistency and integration strategies, hierarchical and recursive dimensions, behavioral and multi‑value dimensions, and special cases such as micro‑dimensions.
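A recursive (parent‑child) dimension, such as a product category tree, is often flattened into fixed level columns so that queries can roll up to any level without recursive joins. The sketch below assumes a hypothetical category table with `category_id`, `parent_id`, and `name` fields and an assumed maximum depth of three. Padding shorter branches by repeating the deepest known level is one common convention, assumed here rather than taken from the source.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical parent-child category rows: (category_id, parent_id, name).
CATEGORIES: List[Tuple[int, Optional[int], str]] = [
    (1, None, "Electronics"),
    (2, 1, "Computers"),
    (3, 2, "Laptops"),
    (4, None, "Furniture"),
    (5, 4, "Desks"),
]

MAX_LEVELS = 3  # assumed fixed depth of the flattened dimension


def flatten_hierarchy(rows, max_levels: int = MAX_LEVELS) -> Dict[int, Tuple[str, ...]]:
    """Return one tuple per category holding its path as level_1..level_N names.

    Short branches are padded by repeating their deepest level, so every
    level column is non-null and rolls up consistently.
    """
    by_id = {cid: (pid, name) for cid, pid, name in rows}
    flat: Dict[int, Tuple[str, ...]] = {}
    for cid in by_id:
        path: List[str] = []
        node: Optional[int] = cid
        while node is not None:  # walk up from this category to its root
            parent, name = by_id[node]
            path.append(name)
            if len(path) > max_levels:  # also stops runaway loops on cyclic data
                raise ValueError(f"category {cid} is deeper than {max_levels} levels")
            node = parent
        path.reverse()  # root first
        path += [path[-1]] * (max_levels - len(path))  # pad short branches
        flat[cid] = tuple(path)
    return flat


if __name__ == "__main__":
    for cid, levels in sorted(flatten_hierarchy(CATEGORIES).items()):
        print(cid, levels)
```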

Chapter 4 covers fact‑table fundamentals. It classifies fact tables as transaction, periodic snapshot, or accumulating snapshot tables; outlines design principles (declaring the granularity, completeness, additivity, null handling, and degenerate dimensions); and compares single‑transaction with multi‑transaction fact tables, including aggregation strategies, storage considerations, and implementation patterns used at Alibaba.
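The difference between a transaction fact table and an accumulating snapshot can be shown with a small Python sketch. It assumes hypothetical order milestones (`created`, `paid`, `shipped`, `received`). The transaction fact table appends one immutable row per event, while the accumulating snapshot keeps one row per order and fills in milestone timestamps as they arrive, which makes lag measures such as time to ship easy to compute. The `order_id` on each row acts as a degenerate dimension, a key with no separate dimension table, and missing milestones stay null rather than being set to zero.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

MILESTONES = ("created", "paid", "shipped", "received")  # assumed business process steps


@dataclass(frozen=True)
class OrderEvent:
    """One row of a transaction fact table: one event, never updated."""
    order_id: str  # degenerate dimension: no separate dimension table
    milestone: str
    ts: datetime


@dataclass
class OrderSnapshot:
    """One row of an accumulating snapshot: one row per order, updated in place."""
    order_id: str
    milestones: Dict[str, Optional[datetime]] = field(
        default_factory=lambda: {m: None for m in MILESTONES}
    )

    def hours_between(self, start: str, end: str) -> Optional[float]:
        """Lag measure between two milestones; None (not zero) if either is missing."""
        a, b = self.milestones[start], self.milestones[end]
        if a is None or b is None:
            return None
        return (b - a).total_seconds() / 3600


def build_accumulating_snapshot(events: List[OrderEvent]) -> Dict[str, OrderSnapshot]:
    """Fold transaction events, in time order, into one snapshot row per order."""
    snapshot: Dict[str, OrderSnapshot] = {}
    for ev in sorted(events, key=lambda ev: ev.ts):
        if ev.milestone not in MILESTONES:
            raise ValueError(f"unknown milestone {ev.milestone!r}")
        if ev.order_id not in snapshot:
            snapshot[ev.order_id] = OrderSnapshot(ev.order_id)
        snapshot[ev.order_id].milestones[ev.milestone] = ev.ts  # fill in as it arrives
    return snapshot


if __name__ == "__main__":
    fact_events = [
        OrderEvent("A1", "created", datetime(2024, 1, 1, 9)),
        OrderEvent("A1", "paid", datetime(2024, 1, 1, 11)),
        OrderEvent("A1", "shipped", datetime(2024, 1, 2, 9)),
        OrderEvent("B2", "created", datetime(2024, 1, 1, 10)),
    ]
    snap = build_accumulating_snapshot(fact_events)
    print(len(fact_events), "transaction rows ->", len(snap), "snapshot rows")
    print(snap["A1"].hours_between("created", "shipped"))  # 24.0
    print(snap["B2"].hours_between("created", "paid"))     # None: not paid yet
```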

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Wang Zhiwu, a big data expert dedicated to sharing big data technology.
