Understanding Data Warehouse Architecture and Layered Design

This article explains the concepts, architecture, and layered design of data warehouses, covering data flow, ETL processes, ODS, DWD, DWM, DWS, ADS layers, their characteristics, differences from databases, and the role of data marts in supporting OLAP and decision‑making.

Architecture Digest

Dec 1, 2022

1. Data Flow

The data flow diagram illustrates how raw data moves through extraction, transformation, and loading stages before reaching the warehouse.
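
As a rough illustration of the extract, transform, and load stages described above, here is a minimal Python sketch. The CSV content, the `orders` table, and the cleaning rules are all hypothetical; an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (names and values are made up).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize customer names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes it easier to test each one and to swap the source or target later.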

2. Application Example

Visual examples show typical data flow scenarios and potential pitfalls such as tangled dependencies.

3. What Is a Data Warehouse (DW)

A data warehouse (DW or DWH) is less a single product than a body of practice that covers ETL, scheduling, and modeling. It is built on top of existing databases to support OLAP and decision‑making.

Its purpose is to provide a clean, integrated, subject‑oriented data source for analysis, not to serve as the final destination of raw data.

Main Characteristics

Subject‑oriented: data is organized by business subjects rather than transaction‑oriented tables.

Integrated: source data is cleansed and unified to ensure consistent enterprise‑wide information.

Read‑only (non‑volatile): once loaded, data is a snapshot of the source systems that is queried rather than updated in place.

Time‑variant: each record carries a time attribute to support historical analysis.

Comparison with Operational Databases

DW: designed for analytical queries, handling large volumes to reveal trends.

Operational DB: optimized for transaction processing and data capture.
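
The contrast can be sketched with two query styles against the same table. This is only an illustration: the `orders` table and its values are hypothetical, and in practice the two workloads usually run on different systems rather than one SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER PRIMARY KEY, region TEXT, order_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "north", "2022-10", 100.0),
        (2, "south", "2022-10", 40.0),
        (3, "north", "2022-11", 60.0),
        (4, "north", "2022-11", 90.0),
    ],
)

# Operational style: touch a single row by key, as a transaction would.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (45.0, 2))
single_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Analytical style: scan many rows and aggregate them to expose a trend.
monthly_trend = conn.execute(
    "SELECT order_month, SUM(amount) FROM orders "
    "GROUP BY order_month ORDER BY order_month"
).fetchall()

print(single_order, monthly_trend)
```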

4. Why Layer the Warehouse

Layering improves data quality, simplifies metadata management, and gives each stage a clear responsibility.

Data Layers

ODS (Operational Data Store): the raw data preparation zone where extracted data is first stored with minimal cleaning.

DWD (Data Detail Layer): cleanses and normalizes ODS data, handling nulls, dirty data, and outliers.

DWM (Data Middle Layer): performs light aggregations on DWD data to create reusable intermediate tables.

DWS (Data Service Layer): builds wide tables (e.g., user behavior) for downstream analytics and OLAP.

ADS (Application Data Service): serves final reports and analytical results, often stored in ES, Redis, PostgreSQL, Hive, or Druid.
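
To make the hand-off between the layers listed above concrete, the following Python sketch passes a handful of hypothetical user events through one function per layer. The event fields, the cleaning rule (drop rows without a user), and the final ranking report are assumptions chosen for illustration; real pipelines would typically run as scheduled SQL or Spark jobs.

```python
from collections import defaultdict

# ODS: raw events as extracted, kept close to the source (hypothetical data).
ods_events = [
    {"user_id": "u1", "event": "click", "day": "2022-12-01", "amount": None},
    {"user_id": "u1", "event": "purchase", "day": "2022-12-01", "amount": "20.0"},
    {"user_id": None, "event": "click", "day": "2022-12-01", "amount": None},
    {"user_id": "u2", "event": "purchase", "day": "2022-12-01", "amount": "5.5"},
    {"user_id": "u1", "event": "purchase", "day": "2022-12-02", "amount": "4.5"},
]

def build_dwd(events):
    """DWD: drop dirty rows (missing user) and normalize nulls and types."""
    return [
        {**e, "amount": float(e["amount"]) if e["amount"] is not None else 0.0}
        for e in events
        if e["user_id"] is not None
    ]

def build_dwm(detail):
    """DWM: light aggregation per user and day, reusable by several DWS tables."""
    agg = defaultdict(lambda: {"events": 0, "spend": 0.0})
    for e in detail:
        bucket = agg[(e["user_id"], e["day"])]
        bucket["events"] += 1
        bucket["spend"] += e["amount"]
    return dict(agg)

def build_dws(middle):
    """DWS: one wide row per user summarizing behavior across days."""
    wide = defaultdict(lambda: {"active_days": 0, "events": 0, "spend": 0.0})
    for (user_id, _day), m in middle.items():
        row = wide[user_id]
        row["active_days"] += 1
        row["events"] += m["events"]
        row["spend"] += m["spend"]
    return dict(wide)

def build_ads(wide):
    """ADS: a report-ready result, here users ranked by total spend."""
    return sorted(((u, r["spend"]) for u, r in wide.items()), key=lambda x: -x[1])

dwd = build_dwd(ods_events)
dwm = build_dwm(dwd)
dws = build_dws(dwm)
ads = build_ads(dws)
print(ads)
```

Each function reads only from the layer directly beneath it, which is what keeps dependencies from tangling as more tables are added.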

Fact and Dimension Tables

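Fact tables store large volumes of measurable business events, such as orders or payments, and reference dimension tables by key. Dimension tables hold the descriptive attributes used to filter and group those facts. The two are typically organized in a star or snowflake schema.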
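
A small star schema might look like the sketch below, where one fact table joins to two dimension tables. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and rows are hypothetical, and SQLite is used only so the example runs without setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
-- The fact table holds measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date VALUES (20221101, '2022-11'), (20221201, '2022-12');
INSERT INTO fact_sales VALUES
    (1, 20221101, 2, 30.0),
    (2, 20221101, 1, 50.0),
    (1, 20221201, 3, 45.0);
""")

# Analytical query: join facts to dimensions, then group by their attributes.
revenue_by_category = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(revenue_by_category)
```

In a snowflake schema, `dim_product` would itself be split further, for example into a separate category table referenced by key.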

5. Data Marts

Data marts are departmental subsets of the warehouse, each focused on a specific subject. They are often delivered as pre‑computed cubes for fast access.
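
The idea of a pre‑computed cube can be sketched as follows: totals are calculated once for every combination of dimension values, including an "ALL" rollup, so that later lookups are simple dictionary reads. The sales rows, the two dimensions, and the `build_cube` helper are hypothetical; real cube engines store and index these aggregates far more efficiently.

```python
from collections import defaultdict
from itertools import product

# Hypothetical warehouse rows for a sales mart: (region, category, revenue).
sales = [
    ("north", "books", 30.0),
    ("north", "games", 50.0),
    ("south", "books", 20.0),
]

def build_cube(rows):
    """Pre-compute revenue totals for each (region, category) pair,
    using 'ALL' as the rollup value on either dimension."""
    cube = defaultdict(float)
    for region, category, revenue in rows:
        # Each row contributes to its own cell and to every rollup above it.
        for r, c in product((region, "ALL"), (category, "ALL")):
            cube[(r, c)] += revenue
    return dict(cube)

cube = build_cube(sales)

# Lookups no longer scan the detail rows.
print(cube[("north", "ALL")], cube[("ALL", "books")], cube[("ALL", "ALL")])
```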

6. Q&A Summary

Key questions cover the differences between ODS and DWD, the purpose of each layer, and where to store wide tables (often in the application layer, labeled APP here and ADS above).

7. Appendix

ETL: Extract‑Transform‑Load process.

Wide Table: a denormalized table with many columns, improving query performance at the cost of redundancy.

Subject: a high‑level business domain used for organizing warehouse data.
