Why Layer Your Data Warehouse? Unlock Performance, Cost Savings, and Maintainability
This article explains the purpose and benefits of data‑warehouse layering, outlines the four ETL steps, describes each architectural layer from ODS to ADS, presents modeling principles, naming conventions, and includes sample DDL to illustrate how layered design improves data quality, reuse, and operational efficiency.
1. Why Layer a Data Warehouse?
Organizing data into clear layers improves structure, enables data‑lineage tracking, promotes reuse, simplifies problem solving, shields source‑system changes, and reduces storage and compute costs.
2. ETL Operations
ETL (Extraction, Transformation, Loading) extracts data from heterogeneous sources, cleans and transforms it, then loads it into the warehouse. The four steps are extraction, cleaning, transformation, and loading.
3. Data Warehouse Technical Architecture
A typical architecture includes ODS (Operational Data Store), DW (Data Warehouse), DWD (Detail), DWS (Summary), and ADS (Application). Each layer has specific responsibilities and storage strategies (incremental, full, or zipper).
4. Layer Details
ODS Layer
Stores raw data with minimal processing, often partitioned by day, and may include basic cleaning and schema alignment.
DW Layer
Divides into DIM (dimension), DWD (detail), and DWS (summary) using dimensional modeling to provide clean, consistent data for analysis.
DWD Layer
Performs data cleaning, mapping, and dimension degeneration, producing fine‑grained fact tables.
DWS Layer
Aggregates DWD data into wide tables by business themes, supporting OLAP queries and reporting.
ADS Layer
Provides highly aggregated data for applications, dashboards, and downstream services.
5. Modeling Principles
Use a star schema, declare grain, choose appropriate dimensions, ensure one fact per table, avoid mixed granularity, and apply dimension degeneration where beneficial.
6. Naming Conventions
Tables follow patterns such as dwd_{business}_{domain}_{process}_i for incremental tables and dws_{business}_{domain}_{grain}_1d for daily summaries.
7. Sample DDL
CREATE TABLE IF NOT EXISTS dwd_asale_trd_itm_di (
item_id BIGINT COMMENT '商品ID',
item_title STRING COMMENT '商品名称',
item_price DOUBLE COMMENT '商品价格',
item_stuff_status BIGINT COMMENT '商品新旧程度_0全新1闲置2二手',
item_prov STRING COMMENT '商品省份',
item_city STRING COMMENT '商品城市',
cate_id BIGINT COMMENT '商品类目ID',
cate_name STRING COMMENT '商品类目名称',
commodity_id BIGINT COMMENT '品类ID',
commodity_name STRING COMMENT '品类名称',
buyer_id BIGINT COMMENT '买家ID'
) COMMENT '交易商品信息事实表'
PARTITIONED BY (ds STRING COMMENT '日期')
LIFECYCLE 400;Images illustrate the layered architecture and data flow.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
