Comprehensive Overview of Data Warehouse Concepts, Architecture, and Modeling
This article provides an extensive introduction to data warehouses, covering their origins, development, definition, advantages, components, comparisons with databases, ODS and data marts, architectural approaches, modeling techniques, and dimensional modeling processes for enterprise‑level analytics.
1. Data Warehouse Overview
1.1. Origin of Data Warehouse
Before an organization builds a data warehouse, its data is typically scattered across departmental systems. These systems form a complex web that hampers cross‑departmental analysis: sources are fragmented, standards are missing, metrics are inconsistent, and data quality is poor.
If an organization lets its data evolve naturally without a unified plan, three problems follow:
Data lacks credibility: dimensions, algorithms, and sources are inconsistent.
Productivity is low: reports are produced by hand and every request needs a custom extraction script.
Data cannot easily be turned into information: it lacks integration and history.
These issues justify the need for an enterprise‑level data warehouse.
1.2. Development of Data Warehouse
The concept emerged in the 1970s from research at MIT on separating transaction processing from analytical processing. IBM's 1988 "Information Warehouse" proposal aimed to integrate enterprise data to assure its quality, though it remained largely a marketing concept. In 1991 Bill Inmon published "Building the Data Warehouse," the first book on the subject, which defined the concept and described how to implement it.
1.3. Definition and Characteristics
A data warehouse is a subject‑oriented, integrated, relatively stable collection of historical data that supports managerial decision‑making. It aggregates internal and external sources into a unified repository.
1.3.1. Characteristics
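Subject‑oriented: data is organized around business subjects (e.g., customer, product) rather than around individual applications.
Integrated: data from different sources is cleansed, transformed, and coded uniformly.
Stable (non‑volatile): once loaded, data is primarily read‑only, so history is preserved.
Historical (time‑variant): timestamps on the data allow trend analysis over time.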
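To make the "integrated" characteristic concrete, here is a minimal Python sketch of uniform coding during loading. It assumes two hypothetical source systems ("crm" and "billing") that encode gender and dates differently; the mapping tables and field names are illustrative, not taken from the original article.

```python
from datetime import datetime

# Hypothetical per-source code mappings onto one warehouse standard.
GENDER_MAP = {
    "crm": {"M": "male", "F": "female"},
    "billing": {"1": "male", "2": "female"},
}
DATE_FORMATS = {"crm": "%Y-%m-%d", "billing": "%d/%m/%Y"}


def integrate(source: str, record: dict) -> dict:
    """Convert one source record into the warehouse's uniform coding."""
    signup = datetime.strptime(record["signup"], DATE_FORMATS[source]).date()
    return {
        "customer_id": str(record["id"]).strip(),  # one type, no stray whitespace
        "gender": GENDER_MAP[source].get(record["gender"], "unknown"),
        "signup_date": signup.isoformat(),         # one date format everywhere
        "source_system": source,                   # keep lineage for traceability
    }


crm_row = {"id": " C001 ", "gender": "F", "signup": "2023-04-01"}
billing_row = {"id": 1001, "gender": "1", "signup": "01/04/2023"}
print(integrate("crm", crm_row))
print(integrate("billing", billing_row))
```

Both records come out with the same date format and the same gender vocabulary, which is what lets later queries compare them without per-source special cases.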
1.3.2. Advantages
Simplified information flow after integration.
Higher data reuse and sharing.
Single source of truth.
Standardized business view.
Data governance ensures quality.
1.3.3. Components
Various data sources.
ETL processes.
Operational and analytical data.
Subject models.
Data marts.
Reporting and executive information system (EIS) tools.
OLAP and data‑mining tools.
Metadata, data quality management, standardization, and publishing.
1.3.4. Project Characteristics
Data‑warehouse projects are integration‑focused and iterative: they require continuous improvement, close business‑IT collaboration with sustained business involvement, and effective management mechanisms.
1.4. Comparison with Other Systems
1.4.1. Data Warehouse vs. Database
Databases support high‑frequency transactional processing of current data and are optimized for fast reads and writes of individual records. Data warehouses support lower‑frequency analytical queries that scan and aggregate large volumes of historical data, so the two have different performance and update requirements.
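The contrast can be sketched with Python's built-in sqlite3 module. This is only an illustration of the two access patterns, assuming hypothetical `account` and `balance_history` tables; a real warehouse would run on a separate analytical engine rather than in the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Transactional side: one row per account, updated in place (current state only).
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
cur.execute("INSERT INTO account VALUES (1, 100.0)")
cur.execute("UPDATE account SET balance = balance - 30.0 WHERE id = 1")  # short, frequent write

# Analytical side: append-only history of every balance change.
cur.execute("CREATE TABLE balance_history (account_id INTEGER, day TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO balance_history VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100.0), (1, "2024-01-02", -30.0), (2, "2024-01-01", 50.0)],
)

# OLTP-style point lookup of current data.
print(cur.execute("SELECT balance FROM account WHERE id = 1").fetchone())  # (70.0,)

# OLAP-style scan and aggregation over history.
print(cur.execute(
    "SELECT day, SUM(amount) FROM balance_history GROUP BY day ORDER BY day"
).fetchall())  # [('2024-01-01', 150.0), ('2024-01-02', -30.0)]
```

The first query touches one row by primary key; the second reads every historical row and groups them, which is the workload warehouses are built for.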
1.4.2. Data Warehouse vs. ODS
An Operational Data Store (ODS) integrates near‑real‑time data for both OLTP and OLAP, serving as a staging area for the warehouse. ODS data is up‑to‑date and editable, whereas warehouse data is historical and read‑only.
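The difference between editable ODS data and read-only warehouse history can be sketched in a few lines of Python. The in-memory `ods` dict and `warehouse` list, and the periodic snapshot step, are simplifying assumptions for illustration; they are not a prescribed loading mechanism.

```python
from datetime import date

ods: dict[str, dict] = {}   # customer_id -> latest record (editable, current state)
warehouse: list[dict] = []  # append-only, dated snapshots (historical, read-only)


def load_ods(record: dict) -> None:
    # Overwrite in place: the ODS keeps only the current version of each customer.
    ods[record["customer_id"]] = dict(record)


def snapshot_to_warehouse(as_of: date) -> None:
    # Append a dated copy of every current record; existing rows are never modified.
    for rec in ods.values():
        warehouse.append({**rec, "snapshot_date": as_of.isoformat()})


load_ods({"customer_id": "C1", "tier": "silver"})
snapshot_to_warehouse(date(2024, 1, 31))
load_ods({"customer_id": "C1", "tier": "gold"})  # upgrade overwrites the ODS row
snapshot_to_warehouse(date(2024, 2, 29))

print(ods["C1"]["tier"])               # gold
print([r["tier"] for r in warehouse])  # ['silver', 'gold']
```

After the upgrade the ODS only knows the customer is "gold", while the warehouse still shows when the customer was "silver".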
1.4.3. Data Warehouse vs. Data Mart
Data marts are departmental, subject‑specific subsets of a warehouse, serving localized decision‑making, while a warehouse provides enterprise‑wide, integrated analytics.
2. Data Warehouse Architecture
2.1. Design Approaches
Three common strategies: top‑down (extensive upfront planning), bottom‑up (incremental development), and hybrid (combining both).
2.2. Architectural Debate
The Inmon “hub‑and‑spoke” model emphasizes a centralized, normalized warehouse, whereas the Kimball “bus” model builds a warehouse from integrated data marts using conformed dimensions.
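A conformed dimension is what lets independently built Kimball data marts be combined. The sketch below assumes two hypothetical marts (sales and returns) that reference the same product dimension; it performs a "drill-across" by aggregating each fact table separately and then joining the results on the shared dimension attribute.

```python
from collections import defaultdict

# Conformed product dimension shared by both marts (hypothetical data).
dim_product = {
    1: {"category": "books"},
    2: {"category": "toys"},
}

# Two marts built independently, each referencing the same product keys.
sales_facts = [(1, 20.0), (1, 15.0), (2, 40.0)]  # (product_key, sales_amount)
return_facts = [(1, 5.0), (2, 10.0), (2, 2.0)]   # (product_key, refund_amount)


def total_by_category(facts):
    """Aggregate one fact table by the conformed 'category' attribute."""
    totals = defaultdict(float)
    for product_key, amount in facts:
        totals[dim_product[product_key]["category"]] += amount
    return dict(totals)


def drill_across():
    # Aggregate each fact table on its own, then line the results up by category.
    sales = total_by_category(sales_facts)
    returns = total_by_category(return_facts)
    return {
        cat: {"sales": sales.get(cat, 0.0), "returns": returns.get(cat, 0.0)}
        for cat in sorted(set(sales) | set(returns))
    }


print(drill_across())
# {'books': {'sales': 35.0, 'returns': 5.0}, 'toys': {'sales': 40.0, 'returns': 12.0}}
```

If each mart had its own, differently coded product table, the two result sets could not be aligned this simply; that is the integration problem conformed dimensions solve in the bus model.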
2.3. Selection Guidance
Traditional, mature enterprises may prefer Inmon’s approach; fast‑growing, complex businesses often benefit from Kimball’s agile, data‑mart‑centric method.
2.4. Evolution in Practice
Many organizations start with Inmon’s layered architecture (DataSource → ODS → EDW → Data Mart → Applications) and later adopt a hybrid Inmon+Kimball model to balance integration and speed.
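The layered flow (DataSource → ODS → EDW → Data Mart → Applications) can be sketched as a chain of functions, one per layer. The record layout, the duplicate handling, and the "skip unparseable rows" rule are hypothetical simplifications chosen to show what each layer is responsible for.

```python
# Hypothetical raw order rows as they might arrive from a source system.
SOURCE = [
    {"order_id": "A1", "region": " north ", "amount": "120.50", "day": "2024-03-01"},
    {"order_id": "A2", "region": "NORTH", "amount": "80", "day": "2024-03-01"},
    {"order_id": "A1", "region": " north ", "amount": "120.50", "day": "2024-03-01"},  # duplicate extract
    {"order_id": "B7", "region": "south", "amount": "bad", "day": "2024-03-02"},       # unparseable amount
]


def to_ods(rows):
    """ODS: land source data close to its original form, dropping exact duplicates."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def to_edw(rows):
    """EDW: cleanse and standardize into integrated, typed records."""
    clean = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would route this row to a data-quality queue
        clean.append({"order_id": r["order_id"], "region": r["region"].strip().lower(),
                      "amount": amount, "day": r["day"]})
    return clean


def to_mart(rows):
    """Data mart: subject-specific aggregate ready for reporting applications."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


report = to_mart(to_edw(to_ods(SOURCE)))
print(report)  # {'north': 200.5}
```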
3. Data Warehouse Modeling
3.1. What Is a Data Model?
A data model defines entities, attributes, and relationships to represent business concepts, serving as a communication bridge between business and technical teams.
3.2. Why Modeling Matters
Modeling enables comprehensive business analysis, eliminates information silos, makes the system easier to adapt as the business changes, and guides project scope and timelines.
3.3. Modeling Stages
Business modeling, domain (subject‑area) modeling, logical modeling, and physical modeling, each adding detail and technical specificity.
3.4. Modeling Methods
Common methods include entity‑based modeling, normalized (3NF) modeling favored by Inmon, and dimensional (star‑schema) modeling favored by Kimball; each has its own strengths and trade‑offs.
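One trade-off between normalized and dimensional modeling shows up in how a product hierarchy is stored. In this minimal sketch (hypothetical tables and names), the 3NF side stores each department and category once and links them by keys, while the dimensional side pre-joins the hierarchy into one wide row per product.

```python
# 3NF style: each fact about an entity is stored once, in its own table, linked by keys.
departments = {10: {"dept_name": "Media"}}
categories = {100: {"category_name": "Books", "dept_id": 10}}
products = {
    1: {"product_name": "SQL Primer", "category_id": 100},
    2: {"product_name": "Data Modeling", "category_id": 100},
}


def build_product_dimension():
    """Dimensional style: flatten the hierarchy into one denormalized row per product."""
    dim = {}
    for pid, p in products.items():
        cat = categories[p["category_id"]]
        dept = departments[cat["dept_id"]]
        dim[pid] = {
            "product_name": p["product_name"],
            "category_name": cat["category_name"],  # repeated for every product in it
            "dept_name": dept["dept_name"],
        }
    return dim


dim_product = build_product_dimension()
print(dim_product[2])
# {'product_name': 'Data Modeling', 'category_name': 'Books', 'dept_name': 'Media'}
```

The normalized form avoids redundancy and update anomalies; the flattened dimension repeats "Books" and "Media" for every product but lets analytical queries filter by department without two extra joins.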
4. Dimensional Modeling
4.1. Techniques
Fact tables store measurable events; dimension tables store descriptive attributes. Design goals include simplicity, performance, and traceability.
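A minimal star schema can be built with Python's sqlite3 module: one fact table holding measures and foreign keys, surrounded by dimension tables holding descriptive attributes. Table names, columns, and data below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
-- Fact table: foreign keys to dimensions plus numeric measures.
CREATE TABLE fact_sales (
    date_key  INTEGER REFERENCES dim_date(date_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity  INTEGER,
    amount    REAL
);
INSERT INTO dim_date  VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO dim_store VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO fact_sales VALUES
    (20240101, 1, 2, 30.0),
    (20240101, 2, 1, 15.0),
    (20240201, 1, 4, 60.0);
""")

# Typical star query: join facts to dimensions, group by descriptive attributes.
rows = conn.execute("""
    SELECT d.month, s.city, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_date d  ON f.date_key = d.date_key
    JOIN dim_store s ON f.store_key = s.store_key
    GROUP BY d.month, s.city
    ORDER BY d.month, s.city
""").fetchall()
print(rows)  # [(1, 'Beijing', 30.0), (1, 'Shanghai', 15.0), (2, 'Beijing', 60.0)]
```

Every query has the same simple shape (facts joined directly to each dimension), which is where the simplicity and performance goals of dimensional modeling come from.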
4.2. Process
Select business process, define grain, choose dimensions, and determine facts.
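The four steps can be recorded as an explicit design and then checked against data. This sketch assumes a hypothetical retail-sales process whose grain is one row per order line; the `validate` helper (also hypothetical) reports missing columns and any rows that break the declared grain.

```python
from collections import Counter

# Step 1: business process = retail sales (hypothetical example).
# Step 2: grain = one row per order line, identified by these columns.
GRAIN = ("order_id", "line_no")
# Step 3: dimensions that describe each row.
DIMENSIONS = ("date_key", "product_key", "store_key")
# Step 4: numeric facts measured at that grain.
FACTS = ("quantity", "amount")


def validate(rows):
    """Return a list of problems: missing columns or duplicate grain keys."""
    problems = []
    required = GRAIN + DIMENSIONS + FACTS
    for i, row in enumerate(rows):
        missing = [c for c in required if c not in row]
        if missing:
            problems.append(f"row {i} missing {missing}")
    counts = Counter(tuple(row.get(c) for c in GRAIN) for row in rows)
    for key, n in counts.items():
        if n > 1:
            problems.append(f"grain {key} appears {n} times")
    return problems


rows = [
    {"order_id": "O1", "line_no": 1, "date_key": 20240101, "product_key": 7,
     "store_key": 1, "quantity": 2, "amount": 19.8},
    {"order_id": "O1", "line_no": 2, "date_key": 20240101, "product_key": 9,
     "store_key": 1, "quantity": 1, "amount": 5.0},
    {"order_id": "O1", "line_no": 2, "date_key": 20240101, "product_key": 9,
     "store_key": 1, "quantity": 1, "amount": 5.0},  # duplicate: violates the grain
]
print(validate(rows))  # ["grain ('O1', 2) appears 2 times"]
```

Declaring the grain before choosing dimensions and facts is what makes a check like this possible: every fact must be true at exactly that level of detail.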
4.3. Layered Implementation
Detail layer, aggregate layer, and data‑mart wide‑table layer, complemented by dimension and metadata tables.
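The three layers can be sketched as two transformations over detail rows: a roll-up into an aggregate layer, then a join with a dimension table into a wide, denormalized table for a data mart. The data, the (day, product) aggregation level, and the function names are hypothetical.

```python
from collections import defaultdict

# Detail layer: atomic facts at the declared grain (hypothetical data).
detail = [
    {"day": "2024-05-01", "product_key": 1, "amount": 10.0},
    {"day": "2024-05-01", "product_key": 1, "amount": 5.0},
    {"day": "2024-05-01", "product_key": 2, "amount": 7.5},
    {"day": "2024-05-02", "product_key": 1, "amount": 3.0},
]
# Dimension table used to enrich the wide table.
dim_product = {1: {"name": "pen", "category": "stationery"},
               2: {"name": "mug", "category": "kitchen"}}


def build_aggregate(rows):
    """Aggregate layer: roll detail rows up to (day, product_key)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r["day"], r["product_key"])
        totals[key] += r["amount"]
        counts[key] += 1
    return [{"day": d, "product_key": p, "amount": totals[(d, p)], "order_lines": counts[(d, p)]}
            for d, p in sorted(totals)]


def build_wide_table(agg_rows):
    """Data-mart wide-table layer: copy dimension attributes onto each aggregate row."""
    return [{**row, **dim_product[row["product_key"]]} for row in agg_rows]


for row in build_wide_table(build_aggregate(detail)):
    print(row)
```

The wide table duplicates dimension attributes on purpose, so report tools can read one table without joins, while the detail layer remains the source for any new aggregate.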
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.