Dimensional Modeling Techniques for Data Warehouses: Star, Snowflake, and Constellation Schemas
This article explains data‑warehouse dimensional modeling, covering the basic concepts of dimension and fact tables, the three main schema patterns—star, snowflake, and constellation—along with their advantages, implementation steps, and handling of slowly changing dimensions.
Preface
Data‑warehouse modeling includes several techniques; besides ER and relational modeling, it also involves dimensional modeling specifically designed for analytical databases.
The article introduces dimensional modeling and focuses on three overall modeling systems based on ER/relational/dimensional approaches: normalized data warehouses, dimensional‑model data warehouses, and independent data marts.
Basic Concepts of Dimensional Modeling
Dimensional modeling is a method for designing analytical databases, data warehouses, and data marts. It is a type of relational modeling that adds two concepts: dimension tables (describing the type of analysis subject) and fact tables (storing measurable metrics).
Dimension tables contain relatively static, low‑volume descriptive data, while fact tables store large‑volume numeric measures linked to dimension tables via foreign keys.
Note: In a data warehouse, strict normalization is not required because there are no update operations.
Three Dimensional Modeling Patterns
1. Star Schema – The most common pattern, consisting of a central fact table surrounded by multiple dimension tables that only relate to the fact table.
Characteristics: dimension tables do not join with each other; each dimension’s primary key is a single column placed as a foreign key in the fact table; the layout resembles a star.
2. Snowflake Schema – An extension of the star schema where dimension tables are further normalized into sub‑dimension tables, resulting in a more complex, multi‑level structure.
While it satisfies normalization, it is rarely used in practice because it increases development complexity without significant benefit in a data‑warehouse context.
3. Constellation Schema (Fact Constellations) – Allows multiple fact tables to share dimension tables, supporting scenarios where several business processes need to be analyzed together.
Example: Dimensional Modeling for a Retail Sales Subject
The example starts with an ER diagram of a retail company, maps it to relational tables, and then builds a star‑schema model. Key steps include identifying useful dimensions (product, customer, store, date), determining how to generate dimension tables from existing relations, selecting metrics (sales quantity and amount), and constructing the fact table.
Issues such as non‑normalized dimension and fact tables are acceptable in a data‑warehouse environment, and surrogate keys are introduced to handle slowly changing dimensions.
Additional Fact Attributes
Fact tables often include a transaction identifier (TID) and transaction time. These are placed in the fact table as degenerate dimensions to avoid large joins and to support partitioning in distributed warehouses like Hive or Spark SQL.
Classic Constellation Model
Sharing dimensions enables adding new fact tables (e.g., for defective products) without redesigning the model. Detailed fact tables store individual events, while aggregated fact tables store pre‑summarized data; both can coexist in a constellation model.
Slowly Changing Dimensions
To preserve historical data when dimension attributes change, surrogate keys combined with timestamps and row indicators are used. This allows querying by current, historical, or specific attribute values.
Data‑Warehouse Modeling Systems
Normalized Data Warehouse – Builds a fully normalized analytical database first, then creates dimensional data marts for each department.
Dimensional‑Model Data Warehouse – Directly designs a large constellation model using conformed dimensions, providing a unified analytical layer.
Independent Data Marts – Each business unit builds and maintains its own ETL and data mart, which can lead to data silos and duplicated effort.
A comparison table highlights the trade‑offs between normalized and dimensional approaches, emphasizing factors such as development effort, agility, data quality, and maintainability.
Conclusion
Data‑warehouse modeling is a comprehensive discipline that combines ER, relational, and dimensional modeling techniques; successful projects require both strong technical expertise and deep business understanding.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
