Mastering Data Warehouse Modeling: From ER to Data Vault

This article explains what a data warehouse is and why modeling it matters, then compares four major modeling approaches (ER, dimensional, Data Vault, and Anchor), covering their structures, steps, advantages, and typical use cases, along with notes on tooling and the modeling process.

dbaplus Community

Data Warehouse Definition

A data warehouse (DW or DWH) is a subject-oriented, integrated, time-variant, non-volatile collection of data that supports enterprise-wide decision making. The definition originates from Bill Inmon’s 1991 book Building the Data Warehouse.

Why Model a Data Warehouse?

Access performance: optimize query paths and reduce I/O.

Data cost: eliminate unnecessary redundancy, enable result reuse, and lower storage and compute expenses.

Usage efficiency: improve user experience and data-driven workflows.

Data quality: enforce consistent business metrics and reduce calculation errors.

Common Modeling Approaches

Four major families have evolved from traditional relational normalization:

Normalized (Entity‑Relationship) modeling

Dimensional modeling (Kimball)

Data Vault modeling

Anchor modeling

1. Normalized (ER) Modeling

Built on entities, attributes, and relationships, the ER model is normalized to third normal form (3NF) and is the foundation of most OLTP systems. Typical steps:

Identify business subjects (e.g., Teacher, Course).

Define relationships between subjects (many‑to‑many, one‑to‑many).

List attributes for each subject.

Draw an ER diagram.
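As a rough illustration of the steps above, the following sketch builds a small 3NF schema for the Teacher and Course subjects with Python's built-in sqlite3 module. The table and column names (teacher, course, teaching_assignment) and the sample rows are assumptions made for this example. The many-to-many relationship is resolved into its own junction table.

```python
import sqlite3

def build_er_schema(conn):
    """Create two entity tables plus a junction table for their relationship."""
    conn.executescript("""
        CREATE TABLE teacher (
            teacher_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE course (
            course_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL
        );
        -- A many-to-many relationship becomes its own table.
        CREATE TABLE teaching_assignment (
            teacher_id INTEGER NOT NULL REFERENCES teacher(teacher_id),
            course_id  INTEGER NOT NULL REFERENCES course(course_id),
            PRIMARY KEY (teacher_id, course_id)
        );
    """)

def courses_taught_by(conn, teacher_name):
    """Return the course titles assigned to one teacher, sorted by title."""
    rows = conn.execute("""
        SELECT c.title
        FROM teacher t
        JOIN teaching_assignment ta ON ta.teacher_id = t.teacher_id
        JOIN course c ON c.course_id = ta.course_id
        WHERE t.name = ?
        ORDER BY c.title
    """, (teacher_name,)).fetchall()
    return [title for (title,) in rows]

conn = sqlite3.connect(":memory:")
build_er_schema(conn)
conn.executemany("INSERT INTO teacher VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, "Algebra"), (11, "Logic")])
conn.executemany(
    "INSERT INTO teaching_assignment VALUES (?, ?)", [(1, 10), (1, 11), (2, 11)]
)
ada_courses = courses_taught_by(conn, "Ada")
```

Because each fact is stored once, answering even a simple question takes two joins, which is the usual trade-off of a normalized model.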

2. Dimensional Modeling

Proposed by Ralph Kimball, dimensional modeling builds a star‑oriented schema composed of fact tables and dimension tables.

Fact tables store measurable events. Each fact table has a single grain, meaning a precise statement of what one row represents. Common fact table types:

Transaction fact – one row per transactional event (e.g., order line).

Periodic snapshot – one row per entity per time period (e.g., monthly account balance).

Accumulating snapshot – one row per process instance, updated as the instance passes each milestone (e.g., an order moving through fulfillment).
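To make the difference between a transaction fact and a periodic snapshot concrete, here is a minimal sketch that derives a monthly balance snapshot from transaction rows. The field names (account, month, amount) and the sample data are hypothetical.

```python
from collections import defaultdict

# Transaction facts: one row per account movement (hypothetical sample data).
transactions = [
    {"account": "A", "month": "2024-01", "amount": 100},
    {"account": "A", "month": "2024-01", "amount": -30},
    {"account": "A", "month": "2024-02", "amount": 50},
    {"account": "B", "month": "2024-02", "amount": 200},
]

def monthly_balance_snapshot(rows, months):
    """Build a periodic snapshot: one row per account per month,
    holding the running balance at the end of that month."""
    net = defaultdict(int)
    for r in rows:
        net[(r["account"], r["month"])] += r["amount"]
    accounts = sorted({r["account"] for r in rows})
    snapshot = []
    for account in accounts:
        balance = 0
        for month in months:  # months must be listed in chronological order
            balance += net[(account, month)]
            snapshot.append({"account": account, "month": month, "balance": balance})
    return snapshot

snapshot = monthly_balance_snapshot(transactions, ["2024-01", "2024-02"])
```

Note that the snapshot contains a row for account B in January even though B had no transactions that month; a periodic snapshot is dense across the periods it covers, while a transaction fact table only has rows where events occurred.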

Dimension tables describe the axes of analysis. Typical dimension types:

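Degenerate dimension – a dimension key that has no attributes of its own, so it is stored directly in the fact table without a separate dimension table (e.g., order number).

Slowly Changing Dimension (SCD) – a dimension whose attributes change infrequently; changes are handled with Type 1 (overwrite the old value), Type 2 (add a new row and keep the old one as history), or Type 3 (add a column that holds the previous value).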
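The Type 2 strategy can be sketched in a few lines of Python. This is an illustrative in-memory version, not a production loader: the housekeeping columns (surrogate_key, valid_from, valid_to, is_current) and the user_id natural key are names assumed for the example, and new_attrs is assumed to carry the full set of tracked attributes.

```python
from datetime import date

def apply_scd2(dimension, natural_key, new_attrs, effective):
    """Apply a Type 2 change: close the current row and append a new version.

    `dimension` is a list of dicts. If the incoming attributes match the
    current row, nothing is written.
    """
    current = next(
        (r for r in dimension if r["user_id"] == natural_key and r["is_current"]),
        None,
    )
    if current is not None:
        unchanged = all(current.get(k) == v for k, v in new_attrs.items())
        if unchanged:
            return dimension
        # Expire the old version instead of overwriting it.
        current["valid_to"] = effective
        current["is_current"] = False
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "user_id": natural_key,
        **new_attrs,
        "valid_from": effective,
        "valid_to": None,
        "is_current": True,
    })
    return dimension

dim_user = []
apply_scd2(dim_user, "u1", {"city": "Paris"}, date(2024, 1, 1))
apply_scd2(dim_user, "u1", {"city": "Berlin"}, date(2024, 6, 1))
apply_scd2(dim_user, "u1", {"city": "Berlin"}, date(2024, 7, 1))  # no change
```

After these calls the dimension holds two rows for the same user: an expired Paris version and a current Berlin version. Fact rows would reference the surrogate key, so each fact stays tied to the version that was valid when it occurred.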

Example: An e-commerce order model with fact tables Fact_Order and Fact_OrderLine, and dimensions Dim_Product, Dim_User, Dim_Merchant, Dim_Region, Dim_Time.
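A reduced version of this example can be tried with sqlite3. Only Fact_OrderLine, Dim_Product, and Dim_Region are included, and every column name and sample row below is an assumption made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dim_Product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE Dim_Region  (region_key  INTEGER PRIMARY KEY, country TEXT);
    -- Grain: one row per order line.
    CREATE TABLE Fact_OrderLine (
        order_number TEXT,     -- degenerate dimension, no lookup table
        product_key  INTEGER REFERENCES Dim_Product(product_key),
        region_key   INTEGER REFERENCES Dim_Region(region_key),
        quantity     INTEGER,
        amount       REAL
    );
    INSERT INTO Dim_Product VALUES
        (1, 'Laptop', 'Electronics'), (2, 'Mouse', 'Electronics'), (3, 'Desk', 'Furniture');
    INSERT INTO Dim_Region VALUES (1, 'DE'), (2, 'FR');
    INSERT INTO Fact_OrderLine VALUES
        ('SO-1', 1, 1, 1, 1000.0),
        ('SO-1', 2, 1, 2, 40.0),
        ('SO-2', 3, 2, 1, 250.0),
        ('SO-3', 2, 2, 1, 20.0);
""")

# A typical star query: join the fact to its dimensions, then aggregate.
revenue = conn.execute("""
    SELECT p.category, r.country, SUM(f.amount) AS revenue
    FROM Fact_OrderLine f
    JOIN Dim_Product p ON p.product_key = f.product_key
    JOIN Dim_Region  r ON r.region_key  = f.region_key
    GROUP BY p.category, r.country
    ORDER BY p.category, r.country
""").fetchall()
```

Every dimension is one join away from the fact table, which keeps analytical queries short and predictable.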

3. Data Vault Modeling

Created by Dan Linstedt, Data Vault extends the ER model with three core structures designed for scalability and auditability:

Hub: stores unique business keys (e.g., Customer_ID).

Link: records relationships between two or more hubs and is structured as a many-to-many association (e.g., Customer-Order link).

Satellite: holds descriptive, historical attributes for hubs or links (e.g., customer address changes).

Modeling workflow:

Identify all core business entities.

Define each entity that other entities reference (that is, one with inbound relationships) as a Hub, keyed by its business key.

Define relationship tables as Links.

Attach Satellites to store attributes and history.
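The workflow above can be sketched with plain Python containers standing in for tables. The hashed surrogate keys follow a convention common in Data Vault 2.0, but the exact hashing scheme, the load_order function, and all field names here are assumptions for illustration.

```python
import hashlib

def hash_key(*business_keys):
    """Deterministic key derived from one or more business keys.
    MD5 over normalized keys is assumed here purely for illustration."""
    joined = "|".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

hub_customer, hub_order = {}, {}
link_customer_order = {}
sat_customer = []  # append-only history of descriptive attributes

def load_order(customer_id, order_id, address, load_ts, source="demo"):
    ck, ok = hash_key(customer_id), hash_key(order_id)
    # Hubs: insert a business key only the first time it is seen.
    hub_customer.setdefault(
        ck, {"customer_id": customer_id, "load_ts": load_ts, "source": source}
    )
    hub_order.setdefault(
        ok, {"order_id": order_id, "load_ts": load_ts, "source": source}
    )
    # Link: one row per distinct customer-order relationship.
    link_customer_order.setdefault(
        hash_key(customer_id, order_id),
        {"customer_hk": ck, "order_hk": ok, "load_ts": load_ts},
    )
    # Satellite: add a row only when the descriptive attributes changed.
    history = [s for s in sat_customer if s["customer_hk"] == ck]
    if not history or history[-1]["address"] != address:
        sat_customer.append({"customer_hk": ck, "address": address, "load_ts": load_ts})

load_order("C1", "O1", "1 Main St", "2024-01-01")
load_order("C1", "O2", "1 Main St", "2024-02-01")
load_order("C1", "O3", "9 Oak Ave", "2024-03-01")
```

The hub and link loads are insert-only and idempotent, and the satellite keeps every address the customer has had, which is where the auditability of the approach comes from.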

4. Anchor Modeling

Anchor modeling pushes normalization further than Data Vault, to sixth normal form (6NF), by storing each attribute in its own narrow table keyed by the entity (anchor) identifier. New attributes are added as new tables without altering existing ones. The resulting schema is highly normalized, and queries often require many joins, which limits practical adoption.
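A simplified sketch of this idea with sqlite3 follows. The table names (customer_anchor, customer_name, and so on) are hypothetical, and the timestamps that real anchor models use for historization are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The anchor holds only the identity of the entity.
    CREATE TABLE customer_anchor (customer_id INTEGER PRIMARY KEY);
    -- Each attribute lives in its own table.
    CREATE TABLE customer_name (
        customer_id INTEGER REFERENCES customer_anchor(customer_id), name TEXT);
    CREATE TABLE customer_email (
        customer_id INTEGER REFERENCES customer_anchor(customer_id), email TEXT);
    INSERT INTO customer_anchor VALUES (1), (2);
    INSERT INTO customer_name  VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO customer_email VALUES (1, 'ada@example.com');
""")

# Extending the model adds a table; existing tables are not altered.
conn.executescript("""
    CREATE TABLE customer_tier (
        customer_id INTEGER REFERENCES customer_anchor(customer_id), tier TEXT);
    INSERT INTO customer_tier VALUES (2, 'gold');
""")

# Reassembling a wide row costs one join per attribute.
wide_rows = conn.execute("""
    SELECT a.customer_id, n.name, e.email, t.tier
    FROM customer_anchor a
    LEFT JOIN customer_name  n ON n.customer_id = a.customer_id
    LEFT JOIN customer_email e ON e.customer_id = a.customer_id
    LEFT JOIN customer_tier  t ON t.customer_id = a.customer_id
    ORDER BY a.customer_id
""").fetchall()
```

Missing attributes simply have no row, so no NULL columns are stored, but a three-attribute entity already needs three joins to read back.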

Choosing a Schema Layout

Within dimensional modeling, three common layouts are used:

Star schema: a central fact table surrounded by denormalized dimension tables; optimized for query performance.

Snowflake schema: dimensions are further normalized into sub-dimensions; reduces redundancy but adds join overhead.

Galaxy (constellation) schema: multiple fact tables share common dimensions; useful for complex enterprise data marts.
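The trade-off between the star and snowflake layouts can be shown by flattening a snowflaked product dimension into a star-style one. The data and the flatten_to_star helper are hypothetical.

```python
# Snowflake layout: the product dimension references a separate category table.
categories = {1: {"category": "Electronics"}, 2: {"category": "Furniture"}}
products_snowflake = [
    {"product_key": 1, "name": "Laptop", "category_key": 1},
    {"product_key": 2, "name": "Mouse", "category_key": 1},
    {"product_key": 3, "name": "Desk", "category_key": 2},
]

def flatten_to_star(products, categories):
    """Denormalize: copy the category attributes onto each product row."""
    return [
        {
            "product_key": p["product_key"],
            "name": p["name"],
            **categories[p["category_key"]],
        }
        for p in products
    ]

dim_product_star = flatten_to_star(products_snowflake, categories)
```

In the star version the string "Electronics" is stored once per product rather than once overall, which is the redundancy the snowflake layout avoids; in exchange, queries against the star version need no extra join to reach the category.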

Typical Modeling Process

Select a business process (e.g., order management).

Declare the grain – the smallest unit of analysis (e.g., one order line).

Identify required dimensions (time, product, customer, etc.).

Define fact tables that capture measures at the declared grain.
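One practical consequence of declaring the grain is that it can be checked: no two fact rows should share the same values for the grain columns. The sketch below assumes a hypothetical order-line fact identified by order_id and line_no.

```python
from collections import Counter

def grain_violations(rows, grain_columns):
    """Return the grain keys that appear more than once in `rows`."""
    counts = Counter(tuple(r[c] for c in grain_columns) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)

fact_order_line = [
    {"order_id": "SO-1", "line_no": 1, "product": "Laptop", "amount": 1000},
    {"order_id": "SO-1", "line_no": 2, "product": "Mouse", "amount": 40},
    {"order_id": "SO-2", "line_no": 1, "product": "Desk", "amount": 250},
]

# The declared grain (one row per order line) holds.
at_declared_grain = grain_violations(fact_order_line, ["order_id", "line_no"])
# Treating the table as one row per order would be wrong: SO-1 has two lines.
at_order_grain = grain_violations(fact_order_line, ["order_id"])
```

A check like this is cheap to run after each load and catches the most common modeling mistake, which is mixing rows of different grains in one fact table.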

Tooling

Common enterprise modeling tools include Erwin, PowerDesigner, Microsoft Visio, and spreadsheet‑based approaches. Some organizations develop custom tooling or adopt vendor suites (e.g., Alibaba’s data‑platform components).
