Understanding Data Warehouse Architecture and Layered Design
This article explains the concepts, architecture, and layered design of data warehouses. It covers data flow and ETL processes, the ODS, DWD, DWM, DWS, and ADS layers, the main characteristics of a warehouse, how it differs from an operational database, and the role of data marts in supporting OLAP and decision‑making.
1. Data Flow
Raw data moves through extraction, transformation, and loading (ETL) stages before it reaches the warehouse.
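As a rough illustration of the three stages, the sketch below uses an in-memory list as a stand-in for a source system and a dict as a stand-in for the warehouse. The record fields (`order_id`, `amount`, `ts`) and the function names are hypothetical and do not come from the original article.

```python
from datetime import datetime

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": "1", "amount": " 19.90 ", "ts": "2024-01-05 10:00:00"},
    {"order_id": "2", "amount": "n/a", "ts": "2024-01-05 11:30:00"},
    {"order_id": "3", "amount": "5.00", "ts": "2024-01-06 09:15:00"},
]


def extract(source):
    """Extract: copy records out of the source without changing them."""
    return [dict(row) for row in source]


def transform(rows):
    """Transform: parse types and drop rows whose amount cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skip dirty rows; a real pipeline might quarantine them
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "order_date": datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S").date(),
        })
    return cleaned


def load(rows, warehouse):
    """Load: append the cleaned rows to the target table."""
    warehouse.setdefault("orders", []).extend(rows)
    return warehouse


warehouse = load(transform(extract(RAW_ORDERS)), {})
```

Keeping the stages as separate functions is only one possible design; it makes each step testable on its own, which matters once scheduling and dependencies are added.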
2. Application Example
Visual examples show typical data flow scenarios and potential pitfalls such as tangled dependencies.
3. What Is a Data Warehouse (DW)
A data warehouse (DW or DWH) is a comprehensive body of theory and practice that covers ETL, scheduling, and modeling. It is built on top of existing databases to support OLAP and decision‑making.
Its purpose is to provide a clean, integrated, subject‑oriented data source for analysis, not to serve as the final destination of raw data.
Main Characteristics
Subject‑oriented: data is organized by business subjects rather than transaction‑oriented tables.
Integrated: source data is cleansed and unified to ensure consistent enterprise‑wide information.
Read‑only (non‑volatile): loaded data reflects a snapshot of the source systems; it is primarily queried and is not updated in place.
Time‑variant: each record carries a time attribute to support historical analysis.
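The time‑variant property can be sketched as follows: because every record carries a date, earlier states stay queryable after the source system has changed. The `customer_history` rows, the `snapshot_date` field, and the `tier_as_of` helper are hypothetical names chosen for this example.

```python
# Each record carries a snapshot date, so past states remain queryable.
customer_history = [
    {"customer_id": 1, "tier": "basic", "snapshot_date": "2024-01-01"},
    {"customer_id": 1, "tier": "gold", "snapshot_date": "2024-03-01"},
    {"customer_id": 2, "tier": "basic", "snapshot_date": "2024-02-01"},
]


def tier_as_of(history, customer_id, as_of):
    """Return the customer's tier in the latest snapshot on or before as_of.

    Dates are ISO strings (YYYY-MM-DD), so string comparison matches
    chronological order.
    """
    candidates = [
        r for r in history
        if r["customer_id"] == customer_id and r["snapshot_date"] <= as_of
    ]
    if not candidates:
        return None  # no snapshot existed yet at that date
    return max(candidates, key=lambda r: r["snapshot_date"])["tier"]
```

New snapshots are appended rather than overwriting old ones, which is also what the read‑only characteristic implies.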
Comparison with Operational Databases
DW: designed for analytical queries, handling large volumes to reveal trends.
Operational DB: optimized for transaction processing and data capture.
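The difference in access pattern can be illustrated with Python's built‑in `sqlite3` module. Both queries run against the same in‑memory database here purely for demonstration; the `orders` table and its columns are assumptions, and a real warehouse would normally live in a separate system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "north", 10.0, "2024-01-05"),
        (2, "south", 20.0, "2024-01-05"),
        (3, "north", 30.0, "2024-02-10"),
        (4, "south", 40.0, "2024-02-11"),
    ],
)

# Transaction-style access: read and modify a single row by its key.
conn.execute("UPDATE orders SET amount = 12.5 WHERE order_id = 1")
single_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = 1"
).fetchone()

# Analysis-style access: scan many rows and aggregate them to reveal a trend.
monthly_totals = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()
```

The first pattern touches one row and must be fast and consistent under concurrent writes; the second reads the whole table and benefits from storage and modeling choices tuned for scans.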
4. Why Layer the Warehouse
Layering improves data quality, simplifies metadata management, and gives each stage a clear responsibility.
Data Layers
ODS (Operational Data Store): the raw data preparation zone, where extracted data is first stored with minimal cleaning.
DWD (Data Warehouse Detail): cleanses and normalizes ODS data, handling nulls, dirty data, and outliers.
DWM (Data Warehouse Middle): performs light aggregations on DWD data to create reusable intermediate tables.
DWS (Data Warehouse Service): builds wide tables (e.g., user behavior) for downstream analytics and OLAP.
ADS (Application Data Service): serves final reports and analytical results, often stored in ES, Redis, PostgreSQL, Hive, or Druid.
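The layer responsibilities listed above can be sketched as a chain of plain Python functions. This is a minimal illustration, not a production design: the event fields, the `MAX_DURATION` threshold, the choice to fill null durations with 0, and all function names are assumptions made for the example.

```python
from collections import defaultdict

# ODS: raw events stored as extracted (hypothetical fields).
ods_events = [
    {"user": "u1", "day": "2024-01-05", "action": "click", "duration": 12},
    {"user": "u1", "day": "2024-01-05", "action": "click", "duration": None},   # null
    {"user": "u2", "day": "2024-01-05", "action": "view", "duration": 30},
    {"user": "u2", "day": "2024-01-06", "action": "click", "duration": 99999},  # outlier
    {"user": None, "day": "2024-01-06", "action": "view", "duration": 8},       # dirty
]

MAX_DURATION = 3600  # assumed outlier threshold, in seconds


def build_dwd(events):
    """DWD: drop rows without a user, fill null durations, discard outliers."""
    detail = []
    for e in events:
        if e["user"] is None:
            continue
        duration = e["duration"] if e["duration"] is not None else 0
        if duration > MAX_DURATION:
            continue
        detail.append({**e, "duration": duration})
    return detail


def build_dwm(detail):
    """DWM: light aggregation per (user, day, action), reusable downstream."""
    agg = defaultdict(lambda: {"events": 0, "duration": 0})
    for e in detail:
        key = (e["user"], e["day"], e["action"])
        agg[key]["events"] += 1
        agg[key]["duration"] += e["duration"]
    return dict(agg)


def build_dws(middle):
    """DWS: one wide row per user, with a column per behavior metric."""
    wide = defaultdict(
        lambda: {"click_events": 0, "view_events": 0, "total_duration": 0}
    )
    for (user, _day, action), m in middle.items():
        wide[user][f"{action}_events"] += m["events"]
        wide[user]["total_duration"] += m["duration"]
    return dict(wide)


def build_ads(wide):
    """ADS: a report-ready result, here users ranked by total duration."""
    return sorted(wide, key=lambda u: wide[u]["total_duration"], reverse=True)


dwd = build_dwd(ods_events)
dwm = build_dwm(dwd)
dws = build_dws(dwm)
ads = build_ads(dws)
```

Each function reads only from the layer directly beneath it, which is the property that keeps dependencies from tangling as the number of tables grows.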
Fact and Dimension Tables
Fact tables store large volumes of transactional records, while dimension tables hold attribute data, often organized in star or snowflake schemas.
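A small star schema can be sketched with the built‑in `sqlite3` module: one fact table in the center referencing two dimension tables. The table and column names (`fact_sales`, `dim_product`, `dim_store`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, relatively few rows.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one row per sale, with keys into each dimension plus measures.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    quantity INTEGER,
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'lamp', 'home');
INSERT INTO dim_store VALUES (10, 'Berlin'), (20, 'Lyon');
INSERT INTO fact_sales VALUES
    (100, 1, 10, 3, 6.0),
    (101, 2, 10, 1, 25.0),
    (102, 1, 20, 5, 10.0);
""")

# Analytical query: join the fact table to a dimension and aggregate a measure.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

In a snowflake variant, `category` would move out of `dim_product` into its own table, trading an extra join for less redundancy.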
5. Data Marts
Data marts are departmental subsets of the warehouse, focused on specific subjects and delivered as pre‑computed cubes for fast access.
6. Q&A Summary
Key questions address the differences between ODS and DWD, the purpose of each layer, and where to store wide tables (often in the application layer, called APP in some designs and ADS above).
7. Appendix
ETL: Extract‑Transform‑Load process.
Wide Table: a denormalized table with many columns, improving query performance at the cost of redundancy.
Subject: a high‑level business domain used for organizing warehouse data.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
