
Comprehensive Overview of Data Warehouse Concepts, Architecture, and Modeling

This article provides an extensive introduction to data warehouses, covering their origins, development, definition, advantages, components, comparisons with databases, ODS and data marts, architectural approaches, modeling techniques, and dimensional modeling processes for enterprise‑level analytics.

Big Data Technology & Architecture

Nov 4, 2020


1. Data Warehouse Overview

1.1. Origin of Data Warehouse

Before an organization builds a data warehouse, its data is scattered across departmental systems. The resulting web of point‑to‑point extracts hampers cross‑departmental analysis because sources are fragmented, standards are missing, metrics are defined inconsistently, and data quality is poor.

If an organization lets data evolve naturally without a unified plan, future problems include lack of trustworthiness, low productivity, and difficulty turning data into actionable information.

Data lacks credibility: inconsistent dimensions, algorithms, and sources.

Low productivity: manual report generation and custom extraction scripts.

Data is hard to turn into information: sources are not integrated and history is not retained.

These issues justify the need for an enterprise‑level data warehouse.

1.2. Development of Data Warehouse

The concept emerged in the 1970s, when researchers at MIT proposed separating transaction processing from analytical processing. In 1988 IBM proposed the “Information Warehouse” to integrate data across the enterprise and assure its quality, though at the time it remained largely a marketing concept. In 1991 Bill Inmon published “Building the Data Warehouse,” the first book to define the concept and describe how to implement it.

1.3. Definition and Characteristics

A data warehouse is a subject‑oriented, integrated, relatively stable (non‑volatile) collection of historical (time‑variant) data that supports managerial decision‑making. It consolidates internal and external sources into a unified repository.

1.3.1. Characteristics

Subject‑oriented: data is organized by business subjects (e.g., customer, product).

Integrated: data is cleansed, transformed, and coded uniformly.

Stable: data is primarily read‑only, preserving history.

Historical: timestamps allow trend analysis.
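
As a minimal sketch of the "integrated" and "historical" characteristics, the snippet below maps two source encodings to one warehouse coding and stamps each row with a load date. The source systems (`crm`, `billing`), their gender encodings, and the field names are assumptions made up for illustration.

```python
from datetime import date

# Hypothetical source encodings: a CRM stores gender as "M"/"F",
# a billing system stores it as 1/0. Both are illustrative only.
GENDER_CODES = {"M": "male", "F": "female", 1: "male", 0: "female"}

def integrate(record: dict, source: str, load_date: date) -> dict:
    """Map a source record to the warehouse's uniform coding and stamp it."""
    return {
        "customer_id": record["id"],
        "gender": GENDER_CODES.get(record["gender"], "unknown"),
        "source_system": source,
        # Historical: every warehouse row carries a date for trend analysis.
        "load_date": load_date.isoformat(),
    }

crm_row = integrate({"id": 7, "gender": "F"}, "crm", date(2020, 11, 4))
billing_row = integrate({"id": 7, "gender": 0}, "billing", date(2020, 11, 4))
```

After integration both rows describe the same customer with the same coding, whichever system they came from.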

1.3.2. Advantages

Simplified information flow after integration.

Higher data reuse and sharing.

Single source of truth.

Standardized business view.

Data governance ensures quality.

1.3.3. Components

Various data sources.

ETL processes.

Operational and analytical data.

Subject models.

Data marts.

Reporting and EIS tools.

OLAP and data‑mining tools.

Metadata, data quality management, standardization, and publishing.

1.3.4. Project Characteristics

Data‑warehouse projects are integration‑focused, require continuous improvement, need close business‑IT collaboration, and demand persistent business involvement and effective management mechanisms.

1.4. Comparison with Other Systems

1.4.1. Data Warehouse vs. Database

Databases support high‑frequency transactional processing of current data, while data warehouses support low‑frequency analytical processing of large historical datasets, with different performance and update requirements.
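
The difference in access pattern can be sketched with Python's built‑in `sqlite3` module. The `orders` table and its columns are hypothetical, and a single in‑memory database stands in for both systems purely to contrast the two query styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_month TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "2020-09", 120.0, "paid"), (2, "2020-09", 80.0, "paid"),
     (3, "2020-10", 200.0, "paid"), (4, "2020-10", 50.0, "new")],
)

# Transactional style: touch one current row, located by its key.
conn.execute("UPDATE orders SET status = 'paid' WHERE order_id = ?", (4,))

# Analytical style: scan many historical rows and aggregate them.
monthly_revenue = conn.execute(
    "SELECT order_month, SUM(amount) FROM orders "
    "WHERE status = 'paid' GROUP BY order_month ORDER BY order_month"
).fetchall()
```

In practice the first statement would run against the operational database many times per second, while the second would run occasionally against the warehouse over far more rows.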

1.4.2. Data Warehouse vs. ODS

An Operational Data Store (ODS) integrates near‑real‑time data for both OLTP and OLAP, serving as a staging area for the warehouse. ODS data is up‑to‑date and editable, whereas warehouse data is historical and read‑only.
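
A rough way to picture the contrast: the ODS keeps only the current state and overwrites it, while the warehouse appends a dated row for every change. The structures and function names below are hypothetical stand‑ins, not a real ODS design.

```python
ods = {}        # current state only, overwritten in place
warehouse = []  # append-only history, never updated

def apply_change(customer_id, city, as_of):
    """Record a change: overwrite the ODS row, append to the warehouse."""
    ods[customer_id] = {"city": city}
    warehouse.append({"customer_id": customer_id, "city": city, "as_of": as_of})

def city_as_of(customer_id, day):
    """Answer a historical question that only the warehouse can answer."""
    rows = [r for r in warehouse
            if r["customer_id"] == customer_id and r["as_of"] <= day]
    # ISO dates sort correctly as strings, so max() picks the latest row.
    return max(rows, key=lambda r: r["as_of"])["city"] if rows else None

apply_change(1, "Hangzhou", "2020-10-01")
apply_change(1, "Shanghai", "2020-11-01")
```

Here `ods[1]` only knows the latest city, whereas `city_as_of(1, "2020-10-15")` can still recover the earlier one.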

1.4.3. Data Warehouse vs. Data Mart

Data marts are departmental, subject‑specific subsets of a warehouse, serving localized decision‑making, while a warehouse provides enterprise‑wide, integrated analytics.

2. Data Warehouse Architecture

2.1. Design Approaches

There are three common strategies: top‑down (extensive upfront planning), bottom‑up (incremental development), and hybrid (a combination of the two).

2.2. Architectural Debate

The Inmon “hub‑and‑spoke” model emphasizes a centralized, normalized warehouse, whereas the Kimball “bus” model builds a warehouse from integrated data marts using conformed dimensions.
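
To illustrate what a conformed dimension buys in the bus model, the sketch below lets two hypothetical marts (sales and inventory) share one `dim_product` table, so their results can be combined on a common attribute. All table and column names are assumptions chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_key INTEGER, units_sold INTEGER);
CREATE TABLE fact_inventory (product_key INTEGER, units_on_hand INTEGER);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'books'), (3, 'toys');
INSERT INTO fact_sales VALUES (1, 10), (2, 5), (3, 7), (1, 2);
INSERT INTO fact_inventory VALUES (1, 40), (2, 20), (3, 9);
""")

# Aggregate each mart separately, then join the two result sets on the
# attribute that the shared (conformed) dimension defines for both.
drill_across = conn.execute("""
WITH s AS (SELECT d.category, SUM(f.units_sold) AS sold
           FROM fact_sales f JOIN dim_product d USING (product_key)
           GROUP BY d.category),
     i AS (SELECT d.category, SUM(f.units_on_hand) AS on_hand
           FROM fact_inventory f JOIN dim_product d USING (product_key)
           GROUP BY d.category)
SELECT s.category, s.sold, i.on_hand
FROM s JOIN i USING (category)
ORDER BY s.category
""").fetchall()
```

If each mart defined its own product categories, the final join would have no reliable key to match on.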

2.3. Selection Guidance

Traditional, mature enterprises may prefer Inmon’s approach; fast‑growing, complex businesses often benefit from Kimball’s agile, data‑mart‑centric method.

2.4. Evolution in Practice

Many organizations start with Inmon’s layered architecture (DataSource → ODS → EDW → Data Mart → Applications) and later adopt a hybrid Inmon+Kimball model to balance integration and speed.
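
The layered flow can be sketched as a chain of small functions, one per layer. The source names (`pos`, `web`), the fields, and the cleaning rule are invented for illustration; real layers would be tables and scheduled jobs rather than in‑memory lists.

```python
from collections import defaultdict

def to_ods(source_rows):
    """ODS: land source rows with minimal change, tagging their origin."""
    return [dict(row, source=name)
            for name, rows in source_rows.items() for row in rows]

def to_edw(ods_rows):
    """EDW: integrate, here by converting amounts to cents and dropping
    rows whose amount is missing."""
    return [
        {"dept": r["dept"], "amount_cents": round(r["amount"] * 100),
         "source": r["source"]}
        for r in ods_rows if r.get("amount") is not None
    ]

def to_mart(edw_rows):
    """Data mart: a department-level summary for one group of users."""
    totals = defaultdict(int)
    for r in edw_rows:
        totals[r["dept"]] += r["amount_cents"]
    return dict(totals)

sources = {
    "pos": [{"dept": "retail", "amount": 19.99},
            {"dept": "retail", "amount": None}],
    "web": [{"dept": "online", "amount": 5.00},
            {"dept": "retail", "amount": 0.01}],
}
mart = to_mart(to_edw(to_ods(sources)))
```

Each function depends only on the layer directly below it, which is the property that lets layers be rebuilt or replaced independently.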

3. Data Warehouse Modeling

3.1. What Is a Data Model?

A data model defines entities, attributes, and relationships to represent business concepts, serving as a communication bridge between business and technical teams.

3.2. Why Modeling Matters

Modeling enables comprehensive business analysis, eliminates information silos, supports change, and guides project scope and timelines.

3.3. Modeling Stages

Modeling proceeds through four stages: business modeling, domain (subject‑area) modeling, logical modeling, and physical modeling. Each stage adds detail and technical specificity to the one before it.

3.4. Modeling Methods

Three methods are common: entity‑based modeling; normalized (3NF) modeling, favored by Inmon; and dimensional (star‑schema) modeling, favored by Kimball. Each has its own strengths and trade‑offs.

4. Dimensional Modeling

4.1. Techniques

Fact tables store measurable events; dimension tables store descriptive attributes. Design goals include simplicity, performance, and traceability.
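
A minimal star schema, sketched with `sqlite3`, shows the division of labor: the fact table holds keys and numeric measures, and the dimension tables hold the attributes used to group and filter. The table names, columns, and sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    date_key  INTEGER REFERENCES dim_date(date_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity  INTEGER,   -- measure
    revenue   REAL       -- measure
);
INSERT INTO dim_date VALUES (20201101, 2020, 11), (20201201, 2020, 12);
INSERT INTO dim_store VALUES (1, 'Beijing'), (2, 'Shenzhen');
INSERT INTO fact_sales VALUES
    (20201101, 1, 3, 30.0), (20201101, 2, 1, 12.5),
    (20201201, 1, 2, 20.0), (20201201, 1, 4, 40.0);
""")

# Measures come from the fact table; grouping attributes from the dimensions.
revenue_by_city_month = conn.execute("""
SELECT s.city, d.month, SUM(f.revenue)
FROM fact_sales f
JOIN dim_date d  ON d.date_key = f.date_key
JOIN dim_store s ON s.store_key = f.store_key
GROUP BY s.city, d.month
ORDER BY s.city, d.month
""").fetchall()
```

Every analytical query against this schema has the same shape: join the fact table to the dimensions it needs, then group by dimension attributes.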

4.2. Process

The process has four steps: select the business process, define the grain, choose the dimensions, and determine the facts.
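
One way to make the four decisions explicit is to record them in a small structure and check incoming rows against the declared grain. The `FactTableDesign` class and `violates_grain` helper are hypothetical names for this sketch, and the grain here is assumed to be one row per order line.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactTableDesign:
    business_process: str  # step 1: what is being measured
    grain: tuple           # step 2: columns identifying exactly one fact row
    dimensions: tuple      # step 3: descriptive context
    facts: tuple           # step 4: numeric measures

def violates_grain(design, rows):
    """Return the grain keys that appear more than once in rows."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[col] for col in design.grain)
        (dupes if key in seen else seen).add(key)
    return dupes

design = FactTableDesign(
    business_process="retail sales",
    grain=("order_id", "line_no"),
    dimensions=("date", "product", "store"),
    facts=("quantity", "revenue"),
)
rows = [
    {"order_id": 1, "line_no": 1, "quantity": 2, "revenue": 20.0},
    {"order_id": 1, "line_no": 2, "quantity": 1, "revenue": 5.0},
    {"order_id": 1, "line_no": 2, "quantity": 1, "revenue": 5.0},
]
duplicates = violates_grain(design, rows)
```

Declaring the grain before choosing dimensions and facts matters because every later choice must be true at that level of detail; a duplicate key such as the one detected here would double‑count revenue.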

4.3. Layered Implementation

The implementation is organized into a detail layer, an aggregate layer, and a data‑mart wide‑table layer, complemented by dimension and metadata tables.
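
The relationship between the three layers can be sketched in plain Python: the aggregate layer summarizes detail rows, and the wide table denormalizes dimension attributes onto the aggregate so that consumers need no joins. The sample data and function names are assumptions for illustration.

```python
from collections import defaultdict

# Detail layer: one row per order line.
detail = [
    {"day": "2020-11-01", "product_id": 1, "revenue": 10.0},
    {"day": "2020-11-01", "product_id": 1, "revenue": 15.0},
    {"day": "2020-11-02", "product_id": 2, "revenue": 7.0},
]
# Dimension table, keyed by product_id.
dim_product = {
    1: {"name": "notebook", "category": "stationery"},
    2: {"name": "kite", "category": "toys"},
}

def build_aggregate(rows):
    """Aggregate layer: total revenue per (day, product)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["day"], r["product_id"])] += r["revenue"]
    return totals

def build_wide_table(aggregate, products):
    """Wide-table layer: copy dimension attributes onto each aggregate row."""
    return [
        {"day": day, "product_id": pid, "revenue": total, **products[pid]}
        for (day, pid), total in sorted(aggregate.items())
    ]

wide = build_wide_table(build_aggregate(detail), dim_product)
```

The wide table trades storage and refresh cost for simpler, faster queries in the data mart.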



Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
