Why Data Warehouse Modeling and Layered Architecture Matter

Data warehouse modeling organizes data into layers (ODS, DWD, DWS, and ADS) to improve performance, reduce cost, safeguard data quality, enable traceability, and simplify maintenance for both batch and real‑time analytics. This summary also covers the ETL steps and the schema design practices that support those layers.

Architecture Digest

The article explains the significance of data warehouse (DW) modeling and why a multi‑layer architecture is essential for high‑performance, low‑cost, and high‑quality data usage.

Layering benefits include clear data structures, data lineage tracking, reuse of data, simplification of complex problems, shielding downstream systems from source changes, and improved maintainability.

ETL operations are broken down into four steps: extraction (initial load and refresh), data cleaning, transformation (standardizing formats, enriching data, handling dimensions), and loading into target storage such as Hive, HBase, MySQL, or Elasticsearch.
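The four steps can be sketched as four small functions chained together. This is a minimal illustration rather than the article's implementation: the row fields (`order_id`, `amount`, `city_id`, `updated_at`), the `city_dim` lookup, and the in-memory list standing in for a target store such as Hive or MySQL are all assumptions made for the example.

```python
from datetime import datetime


def extract(source_rows, since=None):
    """Extraction: full (initial) load when `since` is None, else an incremental refresh."""
    if since is None:
        return list(source_rows)
    # Timestamps use a fixed 'YYYY-MM-DD HH:MM:SS' format, so string comparison is safe.
    return [r for r in source_rows if r["updated_at"] > since]


def clean(rows):
    """Cleaning: drop rows that lack a key or an amount."""
    return [
        r for r in rows
        if r.get("order_id") is not None and r.get("amount") not in (None, "")
    ]


def transform(rows, city_dim):
    """Transformation: standardize formats and enrich rows with a dimension attribute."""
    out = []
    for r in rows:
        ts = datetime.strptime(r["updated_at"], "%Y-%m-%d %H:%M:%S")
        out.append({
            "order_id": int(r["order_id"]),
            "amount": round(float(r["amount"]), 2),
            "ds": ts.strftime("%Y%m%d"),  # partition-style date string
            "city_name": city_dim.get(r.get("city_id"), "unknown"),
        })
    return out


def load(rows, target):
    """Loading: append to the target store (a plain list in this sketch)."""
    target.extend(rows)
    return len(rows)


def run_etl(source_rows, city_dim, target, since=None):
    """Run extract -> clean -> transform -> load and return the number of rows loaded."""
    return load(transform(clean(extract(source_rows, since)), city_dim), target)


# Hypothetical sample data.
SOURCE = [
    {"order_id": "1", "amount": "19.90", "city_id": 10, "updated_at": "2024-01-01 09:00:00"},
    {"order_id": None, "amount": "5.00", "city_id": 10, "updated_at": "2024-01-01 10:00:00"},
    {"order_id": "3", "amount": " 7.5 ", "city_id": 99, "updated_at": "2024-01-02 08:30:00"},
]
CITY_DIM = {10: "Hangzhou"}
```

Calling `run_etl(SOURCE, CITY_DIM, target)` with an empty list performs a full load; passing `since="2024-01-01 23:59:59"` restricts extraction to rows updated after that timestamp.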

The architecture is divided into several layers:

ODS (Operational Data Store): Stores raw data with minimal processing, preserving the source schema and supporting incremental or full loads.

DW (Data Warehouse) layer: Further processes ODS data into dimension tables (DIM), a detail layer (DWD), and a summary layer (DWS). It separates business processes (facts) from dimensions and provides historical data for BI.

DWD (Data Warehouse Detail): Cleans, normalizes, and de‑duplicates data, applies dimension degeneration (folding commonly used dimension attributes into the fact table), and ensures data quality at the finest granularity.

DWS (Data Warehouse Service, the summary layer): Aggregates DWD data into wide tables per business theme (e.g., user, product, finance) to support OLAP queries and reporting.

ADS (Application Data Service): Provides highly aggregated or specialized datasets for applications, dashboards, and downstream services.
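To make the hand-off between layers concrete, the sketch below pushes a few rows from ODS through DWD and DWS to ADS. It is an illustration under assumptions, not the article's pipeline: lists of dictionaries stand in for tables, the field names are hypothetical, and "last record wins" is just one possible de-duplication policy.

```python
from collections import defaultdict


def build_dwd(ods_orders, item_dim):
    """DWD: de-duplicate on order_id (last record wins) and copy an item attribute
    into each detail row, a simple form of dimension degeneration."""
    latest = {}
    for row in ods_orders:
        latest[row["order_id"]] = row  # later rows overwrite earlier duplicates
    dwd = []
    for row in latest.values():
        item = item_dim.get(row["item_id"], {})
        dwd.append({**row, "cate_name": item.get("cate_name", "unknown")})
    return dwd


def build_dws(dwd_rows):
    """DWS: one wide row per buyer per day holding pre-aggregated metrics."""
    agg = defaultdict(lambda: {"order_cnt": 0, "pay_amount": 0.0})
    for row in dwd_rows:
        key = (row["buyer_id"], row["ds"])
        agg[key]["order_cnt"] += 1
        agg[key]["pay_amount"] += row["amount"]
    return [{"buyer_id": b, "ds": ds, **m} for (b, ds), m in sorted(agg.items())]


def build_ads(dws_rows, ds):
    """ADS: an application-ready figure, here the daily totals across all buyers."""
    day = [r for r in dws_rows if r["ds"] == ds]
    return {
        "ds": ds,
        "buyer_cnt": len(day),
        "total_amount": sum(r["pay_amount"] for r in day),
    }


# Hypothetical ODS rows; order 1 arrives twice with a corrected amount.
ODS_ORDERS = [
    {"order_id": 1, "buyer_id": 7, "item_id": 100, "amount": 10.0, "ds": "20240101"},
    {"order_id": 1, "buyer_id": 7, "item_id": 100, "amount": 12.0, "ds": "20240101"},
    {"order_id": 2, "buyer_id": 7, "item_id": 101, "amount": 5.0, "ds": "20240101"},
    {"order_id": 3, "buyer_id": 8, "item_id": 100, "amount": 3.0, "ds": "20240101"},
]
ITEM_DIM = {100: {"cate_name": "books"}}

dwd = build_dwd(ODS_ORDERS, ITEM_DIM)
dws = build_dws(dwd)
ads = build_ads(dws, "20240101")
```

Each function reads only the output of the layer beneath it, so a change in the ODS row format would be absorbed in `build_dwd` without touching the summary or application steps.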

Key naming conventions are presented for each layer, e.g., dwd_{business}_detail_{date} for detail tables and dws_{business}_{metric}_{period} for summary tables. Example DDL for a transaction item fact table is shown:

CREATE TABLE IF NOT EXISTS dwd_asale_trd_itm_di (
  item_id BIGINT COMMENT 'Item ID',
  item_title STRING COMMENT 'Item name',
  item_price DOUBLE COMMENT 'Item price',
  item_stuff_status BIGINT COMMENT 'Item condition: 0 = brand new, 1 = idle, 2 = second-hand',
  item_prov STRING COMMENT 'Item province',
  item_city STRING COMMENT 'Item city',
  cate_id BIGINT COMMENT 'Item category ID',
  cate_name STRING COMMENT 'Item category name',
  commodity_id BIGINT COMMENT 'Commodity ID',
  commodity_name STRING COMMENT 'Commodity name',
  buyer_id BIGINT COMMENT 'Buyer ID'
)
COMMENT 'Transaction item information fact table'
PARTITIONED BY (ds STRING COMMENT 'Date')
LIFECYCLE 400;

Another example for a daily user interaction table in the ADS layer is provided:

CREATE TABLE app_usr_interact (
  user_id STRING COMMENT 'User ID',
  nickname STRING COMMENT 'User nickname',
  register_date STRING COMMENT 'Registration date',
  register_from STRING COMMENT 'Registration source',
  remark STRING COMMENT 'Sub-channel',
  province STRING COMMENT 'Registration province',
  pl_cnt BIGINT COMMENT 'Number of comments',
  ds_cnt BIGINT COMMENT 'Number of tips',
  sc_add BIGINT COMMENT 'Favorites added',
  sc_cancel BIGINT COMMENT 'Favorites removed',
  ...
)
COMMENT 'Daily purchase behavior'
PARTITIONED BY (dt STRING);
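The two naming templates quoted earlier, `dwd_{business}_detail_{date}` and `dws_{business}_{metric}_{period}`, can be enforced with a small helper so that table names are generated rather than typed by hand. The sketch below is an assumption-laden illustration: the function names, the rule that each segment is lowercase alphanumeric, and the sample values (`trade`, `gmv`, `1d`) are hypothetical and not taken from the article.

```python
import re

# Templates as quoted in the text.
DWD_TEMPLATE = "dwd_{business}_detail_{date}"
DWS_TEMPLATE = "dws_{business}_{metric}_{period}"

# Assumed rule: every segment is non-empty, lowercase letters and digits only.
_SEGMENT = re.compile(r"[a-z0-9]+")


def _check_segments(**segments):
    """Raise ValueError if any segment breaks the assumed character rule."""
    for name, value in segments.items():
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {name} segment: {value!r}")


def dwd_table_name(business, date):
    """Build a detail-layer table name from the DWD template."""
    _check_segments(business=business, date=date)
    return DWD_TEMPLATE.format(business=business, date=date)


def dws_table_name(business, metric, period):
    """Build a summary-layer table name from the DWS template."""
    _check_segments(business=business, metric=metric, period=period)
    return DWS_TEMPLATE.format(business=business, metric=metric, period=period)
```

For example, `dwd_table_name("trade", "20240101")` yields `dwd_trade_detail_20240101`, while a segment containing an underscore or an uppercase letter is rejected before it can produce an ambiguous name.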

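The article also outlines layer calling rules to avoid circular dependencies. Two call paths are allowed: ODS → DWD → DWS → ADS, and ODS → DWD → ADS. Data should flow in one direction only: each layer consumes data from a layer below it, normally the one immediately below, and reading DWD directly from ADS is the only shortcut listed.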
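The call paths listed above can be encoded as a set of allowed layer-to-layer edges and checked against a job's table dependencies. This is a sketch under assumptions: it infers a table's layer from its name prefix, maps the `app_` prefix to ADS because the ADS example table above is named `app_usr_interact`, and uses hypothetical table names in the usage lines.

```python
# Edges derived from the two paths: ODS -> DWD -> DWS -> ADS and ODS -> DWD -> ADS.
ALLOWED_EDGES = {("ods", "dwd"), ("dwd", "dws"), ("dws", "ads"), ("dwd", "ads")}

# Assumed prefix-to-layer mapping; "app" is treated as an ADS prefix.
PREFIX_TO_LAYER = {"ods": "ods", "dwd": "dwd", "dws": "dws", "ads": "ads", "app": "ads"}


def layer_of(table_name):
    """Return the layer implied by the table-name prefix, or None if unknown."""
    prefix = table_name.split("_", 1)[0].lower()
    return PREFIX_TO_LAYER.get(prefix)


def find_violations(dependencies):
    """Return the (upstream, downstream) pairs that break the calling rules.

    `dependencies` is an iterable of (upstream_table, downstream_table) pairs.
    """
    return [
        (up, down)
        for up, down in dependencies
        if (layer_of(up), layer_of(down)) not in ALLOWED_EDGES
    ]


# Hypothetical dependency list for illustration.
DEPS = [
    ("ods_trade_order", "dwd_trade_detail_20240101"),   # allowed
    ("dwd_trade_detail_20240101", "dws_trade_gmv_1d"),  # allowed
    ("dwd_trade_detail_20240101", "app_usr_interact"),  # allowed shortcut
    ("ods_trade_order", "app_usr_interact"),            # skips DWD: violation
    ("dws_trade_gmv_1d", "dwd_trade_detail_20240101"),  # reverse flow: violation
]
violations = find_violations(DEPS)
```

Because every allowed edge points from a lower layer to a higher one, a dependency graph that passes this check cannot contain a cycle between layers.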

Overall, the piece serves as a comprehensive guide for designing, implementing, and managing a data warehouse ecosystem, covering architectural principles, ETL processes, schema design, naming standards, and operational best practices.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
