Big Data 36 min read

Why Layer Your Data Warehouse? Unlock Performance, Cost Savings, and Maintainability

This article explains the purpose and benefits of data‑warehouse layering, outlines the four ETL steps, describes each architectural layer from ODS to ADS, presents modeling principles, naming conventions, and includes sample DDL to illustrate how layered design improves data quality, reuse, and operational efficiency.

Data Thinking Notes

Dec 8, 2022

Why Layer Your Data Warehouse? Unlock Performance, Cost Savings, and Maintainability

1. Why Layer a Data Warehouse?

Organizing data into clear layers improves structure, enables data‑lineage tracking, promotes reuse, simplifies problem solving, shields source‑system changes, and reduces storage and compute costs.

2. ETL Operations

ETL (Extraction, Transformation, Loading) extracts data from heterogeneous sources, cleans and transforms it, then loads it into the warehouse. The four steps are extraction, cleaning, transformation, and loading.

3. Data Warehouse Technical Architecture

A typical architecture includes ODS (Operational Data Store), DW (Data Warehouse), DWD (Detail), DWS (Summary), and ADS (Application). Each layer has specific responsibilities and storage strategies (incremental, full, or zipper).

4. Layer Details

ODS Layer

Stores raw data with minimal processing, often partitioned by day, and may include basic cleaning and schema alignment.

DW Layer

Divides into DIM (dimension), DWD (detail), and DWS (summary) using dimensional modeling to provide clean, consistent data for analysis.

DWD Layer

Performs data cleaning, mapping, and dimension degeneration, producing fine‑grained fact tables.

DWS Layer

Aggregates DWD data into wide tables by business themes, supporting OLAP queries and reporting.

ADS Layer

Provides highly aggregated data for applications, dashboards, and downstream services.

5. Modeling Principles

Use a star schema, declare grain, choose appropriate dimensions, ensure one fact per table, avoid mixed granularity, and apply dimension degeneration where beneficial.

6. Naming Conventions

Tables follow patterns such as dwd_{business}_{domain}_{process}_i for incremental tables and dws_{business}_{domain}_{grain}_1d for daily summaries.

7. Sample DDL

CREATE TABLE IF NOT EXISTS dwd_asale_trd_itm_di (
  item_id BIGINT COMMENT '商品ID',
  item_title STRING COMMENT '商品名称',
  item_price DOUBLE COMMENT '商品价格',
  item_stuff_status BIGINT COMMENT '商品新旧程度_0全新1闲置2二手',
  item_prov STRING COMMENT '商品省份',
  item_city STRING COMMENT '商品城市',
  cate_id BIGINT COMMENT '商品类目ID',
  cate_name STRING COMMENT '商品类目名称',
  commodity_id BIGINT COMMENT '品类ID',
  commodity_name STRING COMMENT '品类名称',
  buyer_id BIGINT COMMENT '买家ID'
) COMMENT '交易商品信息事实表'
PARTITIONED BY (ds STRING COMMENT '日期')
LIFECYCLE 400;

Images illustrate the layered architecture and data flow.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data Data Warehouse layered architecture ETL dimensional modeling

Written by

Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.