Mastering Data Warehouse Modeling: From ER to Data Vault
This article explains what a data warehouse is, why modeling it matters, and compares four major modeling approaches—ER, dimensional, Data Vault, and Anchor—detailing their structures, steps, advantages, and typical use cases, while also offering guidance on selecting tools and designing models.
Data Warehouse Definition
A data warehouse (DW or DWH) is a subject‑oriented, integrated, time‑variant, non‑volatile collection of data that supports enterprise‑wide decision making. The definition originates from Bill Inmon’s 1991 book Building the Data Warehouse.
Why Model a Data Warehouse?
Access performance: optimized query paths and reduced I/O.
Data cost: eliminate unnecessary redundancy, enable result reuse, and lower storage and compute expenses.
Usage efficiency: improve the user experience and the efficiency of data‑driven workflows.
Data quality: enforce consistent business metrics and reduce calculation errors.
Common Modeling Approaches
Four major families have evolved from traditional relational normalization:
Normalized (Entity‑Relationship) modeling
Dimensional modeling (Kimball)
Data Vault modeling
Anchor modeling
1. Normalized (ER) Modeling
Built on entities, attributes, and relationships, ER modeling produces a schema in third normal form (3NF), the same design discipline that underlies OLTP systems. Typical steps:
Identify business subjects (e.g., Teacher, Course).
Define relationships between subjects (many‑to‑many, one‑to‑many).
List attributes for each subject.
Draw an ER diagram.
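As a minimal sketch of these steps, the snippet below uses Python's standard-library sqlite3 module to turn the Teacher/Course example into 3NF tables. The many‑to‑many relationship becomes a separate junction table. The table and column names (teacher, course, teaches, credits) are illustrative assumptions, not part of the original design.

```python
import sqlite3

def build_er_schema() -> sqlite3.Connection:
    """Create a small 3NF schema for the Teacher/Course example (in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Each business subject becomes an entity table with its own attributes.
        CREATE TABLE teacher (
            teacher_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE course (
            course_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            credits   INTEGER NOT NULL
        );
        -- The many-to-many relationship is resolved into a junction table,
        -- so no attribute is stored redundantly.
        CREATE TABLE teaches (
            teacher_id INTEGER NOT NULL REFERENCES teacher(teacher_id),
            course_id  INTEGER NOT NULL REFERENCES course(course_id),
            PRIMARY KEY (teacher_id, course_id)
        );
        """
    )
    conn.executemany("INSERT INTO teacher VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    conn.executemany(
        "INSERT INTO course VALUES (?, ?, ?)",
        [(10, "Databases", 4), (20, "Statistics", 3)],
    )
    conn.executemany("INSERT INTO teaches VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])
    return conn

def courses_taught_by(conn: sqlite3.Connection, name: str) -> list[str]:
    """Reassemble the relationship with joins, as a normalized design requires."""
    rows = conn.execute(
        """
        SELECT c.title
        FROM teacher t
        JOIN teaches tc ON tc.teacher_id = t.teacher_id
        JOIN course c   ON c.course_id = tc.course_id
        WHERE t.name = ?
        ORDER BY c.title
        """,
        (name,),
    ).fetchall()
    return [title for (title,) in rows]

conn = build_er_schema()
print(courses_taught_by(conn, "Alice"))  # ['Databases', 'Statistics']
```

The price of removing redundancy is that even simple questions need joins, which is one reason analytical layers are often modeled differently.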
2. Dimensional Modeling
Proposed by Ralph Kimball, dimensional modeling builds a star‑oriented schema composed of fact tables and dimension tables.
Fact tables store measurable business events. Each fact table has a single declared grain, that is, exactly what one row represents; the most atomic grain available is usually preferred. Common fact table types:
Transaction fact – one row per transactional event (e.g., order line).
Periodic snapshot – one row per entity per time period (e.g., monthly account balance).
Accumulating snapshot – one row per process lifecycle (e.g., order fulfillment).
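To make the difference between grains concrete, this sketch rolls transaction‑grain facts up to a periodic snapshot: one row per account per month holding the month‑end balance. The row layout (account, ISO date string, signed amount) and the function name are illustrative assumptions; in practice this is usually done in SQL during the load.

```python
from collections import defaultdict

# Transaction grain: one row per event (account, date, signed amount).
transactions = [
    ("A1", "2024-01-05", 100.0),
    ("A1", "2024-01-20", -30.0),
    ("A2", "2024-01-11", 50.0),
    ("A1", "2024-02-03", 25.0),
]

def monthly_balance_snapshot(rows):
    """Periodic snapshot: one (account, month, balance) row per open account per month."""
    # Net movement per (account, "YYYY-MM").
    movement = defaultdict(float)
    for account, day, amount in rows:
        movement[(account, day[:7])] += amount

    # Months are taken from the data here; a real job would drive this from a
    # calendar/time dimension so months with no activity at all still appear.
    months = sorted({month for _, month in movement})
    accounts = sorted({account for account, _ in movement})

    balance = defaultdict(float)
    opened = set()
    snapshot = []
    for month in months:
        for account in accounts:
            if (account, month) in movement:
                balance[account] += movement[(account, month)]
                opened.add(account)
            # Unlike transaction facts, a snapshot row is emitted even without activity.
            if account in opened:
                snapshot.append((account, month, balance[account]))
    return snapshot

for row in monthly_balance_snapshot(transactions):
    print(row)
```

Note that account A2 gets a February row even though it had no February transactions; that density is what distinguishes a periodic snapshot from the transaction grain.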
Dimension tables describe the axes of analysis. Typical dimension types:
Degenerate dimension – a dimension key with no descriptive attributes of its own (e.g., order number), stored directly in the fact table without a separate dimension table.
Slowly Changing Dimension (SCD) – attributes that change infrequently; handled with Type 1 (overwrite), Type 2 (historical row), or Type 3 (add column) strategies.
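Type 2 is the SCD strategy most often implemented by hand, so here is a minimal in‑memory sketch. It assumes each dimension row carries a surrogate key, the natural (business) key, one tracked attribute, validity dates, and an is_current flag; these column names and the 9999‑12‑31 "open" end date are common conventions used here for illustration, not a standard.

```python
from dataclasses import dataclass
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed convention for an open-ended version

@dataclass
class DimUserRow:
    user_sk: int       # surrogate key: unique per version of the user
    user_id: str       # natural/business key from the source system
    city: str          # attribute tracked with Type 2 history
    valid_from: date
    valid_to: date     # exclusive end of validity
    is_current: bool

def apply_scd2(dim: list[DimUserRow], user_id: str, city: str, as_of: date) -> None:
    """Apply an incoming (user_id, city) record as a Type 2 change."""
    current = next((r for r in dim if r.user_id == user_id and r.is_current), None)
    if current is not None and current.city == city:
        return  # nothing changed: keep the existing version
    if current is not None:
        # Close the old version instead of overwriting it (overwriting = Type 1).
        current.valid_to = as_of
        current.is_current = False
    next_sk = max((r.user_sk for r in dim), default=0) + 1
    dim.append(DimUserRow(next_sk, user_id, city, as_of, HIGH_DATE, True))

dim_user: list[DimUserRow] = []
apply_scd2(dim_user, "u1", "Hangzhou", date(2023, 1, 1))
apply_scd2(dim_user, "u1", "Shanghai", date(2024, 6, 1))
for row in dim_user:
    print(row)
```

Because fact rows reference user_sk rather than user_id, an order placed in 2023 keeps pointing at the Hangzhou version after the user moves.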
Example: An e‑commerce order model with fact tables Fact_Order and Fact_OrderLine, and dimensions Dim_Product, Dim_User, Dim_Merchant, Dim_Region, Dim_Time.
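A trimmed version of this example can be sketched as a star schema in SQLite. To keep it short, only Fact_OrderLine and three of the listed dimensions are created; the column names, the yyyymmdd integer date key, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Denormalized dimensions: a surrogate key plus descriptive attributes.
    CREATE TABLE Dim_Product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE Dim_User    (user_key    INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
    CREATE TABLE Dim_Time    (date_key    INTEGER PRIMARY KEY, full_date TEXT, month TEXT);

    -- Fact table at the order-line grain: dimension keys plus numeric measures.
    -- order_no is a degenerate dimension: it has no dimension table of its own.
    CREATE TABLE Fact_OrderLine (
        order_no    TEXT,
        line_no     INTEGER,
        product_key INTEGER REFERENCES Dim_Product(product_key),
        user_key    INTEGER REFERENCES Dim_User(user_key),
        date_key    INTEGER REFERENCES Dim_Time(date_key),
        quantity    INTEGER,
        amount      REAL,
        PRIMARY KEY (order_no, line_no)
    );

    INSERT INTO Dim_Product VALUES (1, 'Phone', 'Electronics'), (2, 'Mug', 'Home');
    INSERT INTO Dim_User    VALUES (1, 'alice', 'Hangzhou');
    INSERT INTO Dim_Time    VALUES (20240105, '2024-01-05', '2024-01');
    INSERT INTO Fact_OrderLine VALUES
        ('O-1', 1, 1, 1, 20240105, 1, 2999.0),
        ('O-1', 2, 2, 1, 20240105, 2, 40.0);
    """
)

# A typical star join: group by dimension attributes, aggregate fact measures.
revenue_by_category = conn.execute(
    """
    SELECT p.category, t.month, SUM(f.amount) AS revenue
    FROM Fact_OrderLine f
    JOIN Dim_Product p ON p.product_key = f.product_key
    JOIN Dim_Time    t ON t.date_key    = f.date_key
    GROUP BY p.category, t.month
    ORDER BY revenue DESC
    """
).fetchall()
print(revenue_by_category)  # [('Electronics', '2024-01', 2999.0), ('Home', '2024-01', 40.0)]
```

Every query follows the same shape: one hop from the fact table to each dimension it needs, which is what keeps star schemas easy for BI tools to generate SQL against.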
3. Data Vault Modeling
Created by Dan Linstedt, Data Vault extends the ER model with three core structures designed for scalability and auditability:
Hub: stores unique business keys (e.g., Customer_ID).
Link: captures many‑to‑many relationships between hubs (e.g., Customer‑Order link).
Satellite: holds descriptive, historical attributes for hubs or links (e.g., customer address changes).
Modeling workflow:
Identify all core business entities.
Define each entity that has inbound relationships (i.e., is referenced by other entities) as a Hub.
Model the relationships between Hubs as Links.
Attach Satellites to store attributes and history.
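The workflow can be sketched in memory for a Customer hub, an Order hub, a Customer‑Order link, and a customer satellite. The sketch assumes the common Data Vault 2.0 practice of deriving keys by hashing normalized business keys (MD5 here) and comparing a hash diff to detect attribute changes; the dict-based "tables", the load timestamps, and the record-source value orders_feed are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    """Hash key from one or more business keys (trimmed, upper-cased, '||'-joined)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def hash_diff(*attributes: str) -> str:
    """Hash of descriptive attributes, used to detect changes cheaply."""
    return hashlib.md5("||".join(a.strip() for a in attributes).encode("utf-8")).hexdigest()

# In-memory stand-ins for the Hub, Link, and Satellite tables.
hub_customer: dict[str, dict] = {}
hub_order: dict[str, dict] = {}
link_customer_order: dict[str, dict] = {}
sat_customer: list[dict] = []

def load_order(customer_id: str, order_id: str, address: str,
               load_ts: datetime, source: str = "orders_feed") -> None:
    """Load one source record into two Hubs, a Link, and a Satellite."""
    hk_c, hk_o = hash_key(customer_id), hash_key(order_id)
    # Hubs: each business key is inserted once and never updated.
    hub_customer.setdefault(hk_c, {"customer_id": customer_id, "load_ts": load_ts, "source": source})
    hub_order.setdefault(hk_o, {"order_id": order_id, "load_ts": load_ts, "source": source})
    # Link: one row per distinct customer/order pair.
    hk_link = hash_key(customer_id, order_id)
    link_customer_order.setdefault(
        hk_link, {"hk_customer": hk_c, "hk_order": hk_o, "load_ts": load_ts, "source": source}
    )
    # Satellite: append a new version only if the payload differs from the latest one.
    diff = hash_diff(address)
    history = [s for s in sat_customer if s["hk_customer"] == hk_c]
    latest = max(history, key=lambda s: s["load_ts"], default=None)
    if latest is None or latest["hash_diff"] != diff:
        sat_customer.append(
            {"hk_customer": hk_c, "load_ts": load_ts, "hash_diff": diff,
             "address": address, "source": source}
        )

t1, t2, t3 = (datetime(2024, m, 1, tzinfo=timezone.utc) for m in (1, 3, 5))
load_order("C001", "O-1", "1 Main St", t1)
load_order("C001", "O-2", "1 Main St", t2)  # same address: no new satellite row
load_order("C001", "O-3", "9 Lake Rd", t3)  # address changed: new satellite row
print(len(hub_customer), len(hub_order), len(link_customer_order), len(sat_customer))  # 1 3 3 2
```

All three structures are insert‑only, which is what gives Data Vault its auditability: history is never overwritten, and new sources can be attached as additional Links or Satellites.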
4. Anchor Modeling
Anchor modeling pushes Data Vault's normalization further, to 6NF: each attribute is stored in its own table, holding the anchor key plus a single value (optionally timestamped for history). New attributes are added as new tables without altering existing ones, but the resulting highly normalized schema needs many joins to reassemble a record, which has limited its practical adoption.
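The 6NF idea can be illustrated in SQLite: an anchor table holds only identities, each attribute lives in its own table with a changed_at column for history, and reading a current record takes one join per attribute. The mnemonic‑prefixed table names (CU_Customer, CU_NAM_Customer_Name, and so on) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Anchor: identities only.
    CREATE TABLE CU_Customer (CU_ID INTEGER PRIMARY KEY);

    -- One table per attribute (6NF); changed_at keeps each attribute's history.
    CREATE TABLE CU_NAM_Customer_Name (
        CU_ID      INTEGER NOT NULL REFERENCES CU_Customer(CU_ID),
        name       TEXT NOT NULL,
        changed_at TEXT NOT NULL,
        PRIMARY KEY (CU_ID, changed_at)
    );
    CREATE TABLE CU_CIT_Customer_City (
        CU_ID      INTEGER NOT NULL REFERENCES CU_Customer(CU_ID),
        city       TEXT NOT NULL,
        changed_at TEXT NOT NULL,
        PRIMARY KEY (CU_ID, changed_at)
    );

    INSERT INTO CU_Customer VALUES (1);
    INSERT INTO CU_NAM_Customer_Name VALUES (1, 'Alice', '2023-01-01');
    INSERT INTO CU_CIT_Customer_City VALUES (1, 'Hangzhou', '2023-01-01'),
                                            (1, 'Shanghai', '2024-06-01');
    """
)

# Reassembling the current customer row takes one join per attribute table.
latest_sql = """
    SELECT a.CU_ID, n.name, c.city
    FROM CU_Customer a
    LEFT JOIN CU_NAM_Customer_Name n
      ON n.CU_ID = a.CU_ID
     AND n.changed_at = (SELECT MAX(n2.changed_at) FROM CU_NAM_Customer_Name n2
                         WHERE n2.CU_ID = a.CU_ID)
    LEFT JOIN CU_CIT_Customer_City c
      ON c.CU_ID = a.CU_ID
     AND c.changed_at = (SELECT MAX(c2.changed_at) FROM CU_CIT_Customer_City c2
                         WHERE c2.CU_ID = a.CU_ID)
"""
print(conn.execute(latest_sql).fetchall())  # [(1, 'Alice', 'Shanghai')]

# Extension: a new attribute is simply a new table; no existing table is altered.
conn.execute(
    """CREATE TABLE CU_EML_Customer_Email (
           CU_ID      INTEGER NOT NULL REFERENCES CU_Customer(CU_ID),
           email      TEXT NOT NULL,
           changed_at TEXT NOT NULL,
           PRIMARY KEY (CU_ID, changed_at))"""
)
```

With two attributes the query already needs two joins and two subqueries; a realistic entity with dozens of attributes shows why the join cost mentioned above matters.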
Choosing a Schema Layout
Within dimensional modeling, three common layouts are used:
Star schema: a central fact table surrounded by denormalized dimension tables; optimized for query performance.
Snowflake schema: dimensions are further normalized into sub‑dimensions; reduces redundancy but adds join overhead.
Galaxy (constellation) schema: multiple fact tables share common dimensions; useful for complex enterprise data marts.
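The star/snowflake trade‑off can be seen by storing the same product data both ways: a star‑style Dim_Product that repeats the category name on every row, and a snowflaked Dim_Product_SF that points to a separate Dim_Category. Both queries return the same answer, but the snowflaked one needs an extra join. Table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Fact_Sales (product_key INTEGER, amount REAL);
    INSERT INTO Fact_Sales VALUES (1, 10.0), (2, 5.0), (3, 7.5);

    -- Star: category text is repeated (denormalized) in the product dimension.
    CREATE TABLE Dim_Product (product_key INTEGER PRIMARY KEY, name TEXT, category_name TEXT);
    INSERT INTO Dim_Product VALUES (1, 'Phone', 'Electronics'), (2, 'Cable', 'Electronics'),
                                   (3, 'Mug', 'Home');

    -- Snowflake: category is split out into its own sub-dimension.
    CREATE TABLE Dim_Category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE Dim_Product_SF (product_key INTEGER PRIMARY KEY, name TEXT,
                                 category_key INTEGER REFERENCES Dim_Category(category_key));
    INSERT INTO Dim_Category VALUES (1, 'Electronics'), (2, 'Home');
    INSERT INTO Dim_Product_SF VALUES (1, 'Phone', 1), (2, 'Cable', 1), (3, 'Mug', 2);
    """
)

star_sql = """
    SELECT p.category_name, SUM(f.amount)
    FROM Fact_Sales f
    JOIN Dim_Product p ON p.product_key = f.product_key
    GROUP BY p.category_name ORDER BY p.category_name"""

snowflake_sql = """
    SELECT c.category_name, SUM(f.amount)
    FROM Fact_Sales f
    JOIN Dim_Product_SF p ON p.product_key = f.product_key
    JOIN Dim_Category c   ON c.category_key = p.category_key  -- the extra join
    GROUP BY c.category_name ORDER BY c.category_name"""

star = conn.execute(star_sql).fetchall()
snowflake = conn.execute(snowflake_sql).fetchall()
print(star == snowflake, star)  # True [('Electronics', 15.0), ('Home', 7.5)]
```

Renaming a category touches one row in the snowflaked design but every affected product row in the star design; that update cost is what the star layout trades for simpler, faster reads.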
Typical Modeling Process
Select a business process (e.g., order management).
Declare the grain – the smallest unit of analysis (e.g., one order line).
Identify required dimensions (time, product, customer, etc.).
Define fact tables that capture measures at the declared grain.
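Declaring the grain has a checkable consequence: no two fact rows may share the same combination of grain columns. The hypothetical helper below records a design produced by the four steps and tests sample rows against it; the FactTableDesign class and its field names are illustrative, not part of any modeling tool.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class FactTableDesign:
    business_process: str                               # step 1
    grain: tuple[str, ...]                              # step 2: columns identifying one row
    dimensions: list[str]                               # step 3
    measures: list[str] = field(default_factory=list)   # step 4

    def grain_violations(self, rows: list[dict]) -> list[tuple]:
        """Return grain-key values that occur more than once in rows."""
        counts = Counter(tuple(row[col] for col in self.grain) for row in rows)
        return [key for key, n in counts.items() if n > 1]

design = FactTableDesign(
    business_process="order management",
    grain=("order_no", "line_no"),
    dimensions=["time", "product", "customer"],
    measures=["quantity", "amount"],
)

rows = [
    {"order_no": "O-1", "line_no": 1, "quantity": 1, "amount": 2999.0},
    {"order_no": "O-1", "line_no": 2, "quantity": 2, "amount": 40.0},
    {"order_no": "O-1", "line_no": 2, "quantity": 2, "amount": 40.0},  # duplicate line
]
print(design.grain_violations(rows))  # [('O-1', 2)]
```

A duplicate at the declared grain usually means either a bad load or a grain that was declared too coarsely, and both would silently inflate summed measures.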
Tooling
Common enterprise modeling tools include Erwin, PowerDesigner, Microsoft Visio, and spreadsheet‑based approaches. Some organizations develop custom tooling or adopt vendor suites (e.g., Alibaba’s data‑platform components).
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
