Mastering Data Warehouse Architecture: Layered Design, Naming, and Governance Standards

This comprehensive guide explains data‑warehouse construction standards, covering model architecture principles, layered design (ODS, DWD, DWM, DWS, APP), domain division, data‑model design rules, type and naming conventions, table lifecycle management, and practical implementation examples.

dbaplus Community

Jan 11, 2022

Data Model Architecture Principles

A reliable data warehouse requires a clear layered structure that isolates raw data, supports stable downstream consumption, and avoids overly long data pipelines. The optimal number of layers depends on business needs rather than arbitrary rules. Good layering brings the following benefits:

Clear data structure

Data lineage tracing

Reduced duplicate development

Organized data relationships

Isolation of raw data impact

The typical layering includes:

ODS (Operational Data Store) : Closest to source data; ingest raw records without heavy cleaning.

DW (Data Warehouse) : Core layer where thematic models are built. It is subdivided into the DWD (detail), DWM (middle), and DWS (aggregation) layers.

APP (Application) : Data products for reporting, analytics, or downstream systems (e.g., ES, PostgreSQL, Redis, Hive, Druid).

Dimension Layer : Optional layer for high‑cardinality (user, product) and low‑cardinality (enumerations, dates) dimension tables.

1) Data Warehouse Layer Principles

The ODS layer feeds the DWD layer, which in turn feeds DWM and DWS before reaching the APP layer. Stable business follows the standard flow ODS → DWD → DWS → APP; exploratory work may take a shorter path such as ODS → DWD → APP or ODS → DWD → DWM → APP.

2) Domain Division Principles

Domains can be defined by business processes (e.g., order, payment) or by abstract data domains that group related events and dimensions. Proper domain design ensures coverage of current needs while allowing seamless addition of new domains.

3) Data Model Design Principles

High cohesion, low coupling : Themes should be internally cohesive while remaining loosely coupled across themes.

Separate core and extension models : Core models cover common business fields; extensions add niche attributes without polluting the core.

Centralize common processing logic : Encapsulate reusable logic in lower layers to avoid duplication.

Cost‑performance balance : Controlled redundancy can improve query speed; avoid excessive duplication.

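Rollback capability : As long as the processing logic is unchanged, rerunning a job at a different time must produce the same result, so any partition can be safely recomputed.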
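One way to read the rollback principle is as idempotent reruns. The following is a minimal sketch, assuming a hypothetical in-memory table keyed by partition date; the names warehouse, transform, and load_partition are illustrative and do not come from the original standard.

```python
from typing import Dict, List

# Hypothetical in-memory "table": partition date -> rows.
warehouse: Dict[str, List[dict]] = {}


def transform(raw_rows: List[dict]) -> List[dict]:
    """Deterministic logic: no clocks, no randomness, stable output order."""
    cleaned = [
        {"order_id": r["order_id"], "amount": round(r["amount"], 2)}
        for r in raw_rows
    ]
    return sorted(cleaned, key=lambda r: r["order_id"])


def load_partition(dt: str, raw_rows: List[dict]) -> None:
    """Overwrite the whole partition so a rerun replaces earlier output."""
    warehouse[dt] = transform(raw_rows)


raw = [{"order_id": 2, "amount": 10.0}, {"order_id": 1, "amount": 5.5}]
load_partition("2022-01-11", raw)
first_run = list(warehouse["2022-01-11"])
load_partition("2022-01-11", raw)  # rerun of the same day
second_run = list(warehouse["2022-01-11"])
```

Because the partition is overwritten rather than appended to, the second run leaves the table in the same state as the first.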

Data Warehouse Common Development Standards

1) Layer Call Standards

Stable flows: ODS → DWD → DWS → APP. Exploratory flows may use ODS → DWD → APP or ODS → DWD → DWM → APP. Dependencies must flow forward; reverse dependencies (e.g., DWM depending on DWS) are prohibited.
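The forward-only rule can be checked mechanically. This is a rough sketch, assuming the layer is readable from a table-name prefix and that the application layer uses a hypothetical app_ prefix; treating same-layer reads as invalid is also an assumption, since the text only rules out reverse dependencies.

```python
# Layer order taken from the text; the dimension layer is left out
# because it is referenced by several layers rather than ranked.
LAYER_ORDER = ["ods", "dwd", "dwm", "dws", "app"]


def layer_of(table_name: str) -> str:
    """Read the layer from the table-name prefix, e.g. 'dwd_trade_order_i'."""
    prefix = table_name.split("_", 1)[0].lower()
    if prefix not in LAYER_ORDER:
        raise ValueError(f"unknown layer prefix in {table_name!r}")
    return prefix


def is_forward_dependency(upstream: str, downstream: str) -> bool:
    """True when the downstream table reads from a strictly lower layer."""
    up_rank = LAYER_ORDER.index(layer_of(upstream))
    down_rank = LAYER_ORDER.index(layer_of(downstream))
    return up_rank < down_rank
```

A scheduler or code-review hook could call is_forward_dependency for every (source, target) pair in a job and reject the job when any pair returns False.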

2) Data Type Standards

Amount : double or decimal(28,6), specify unit (cents or yuan).

String : string

ID fields : bigint

Time : string (ISO format recommended)

Status : string
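To illustrate why the amount unit and type matter, here is a small sketch using Python's decimal module as a stand-in for a decimal(28,6) column. The helper cents_to_yuan and the half-up rounding mode are assumptions made for the example, not part of the standard.

```python
from decimal import Decimal, ROUND_HALF_UP

# Six decimal places, mirroring the scale in decimal(28,6).
SCALE = Decimal("0.000001")


def cents_to_yuan(cents: int) -> Decimal:
    """Convert an integer amount in cents to yuan at a fixed scale."""
    yuan = Decimal(cents) / Decimal(100)
    return yuan.quantize(SCALE, rounding=ROUND_HALF_UP)


# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum differs slightly from 0.3; Decimal keeps it exact.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")
```

When a column is stored as double, small representation errors like the one in float_sum can surface in reconciliations, which is one reason to state the unit explicitly and prefer decimal where exactness matters.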

3) Data Redundancy Standards

Redundant fields must be high‑frequency and used by at least three downstream processes.

Redundancy should not cause excessive data latency.

Redundant field overlap with existing fields should stay below 60%.
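The numeric thresholds above lend themselves to a simple gate. The sketch below assumes that overlap is measured as the share of candidate fields already present in the target table; the text does not define the measure, so that definition and all function names are hypothetical.

```python
MIN_DOWNSTREAM_CONSUMERS = 3  # "at least three downstream processes"
MAX_OVERLAP_RATIO = 0.60      # overlap must stay below 60%


def overlap_ratio(candidate_fields: set, existing_fields: set) -> float:
    """Share of candidate fields that already exist in the target table."""
    if not candidate_fields:
        return 0.0
    return len(candidate_fields & existing_fields) / len(candidate_fields)


def may_add_redundant_fields(
    candidate_fields: set,
    existing_fields: set,
    downstream_consumers: int,
) -> bool:
    """Apply the consumer-count and overlap thresholds together."""
    enough_use = downstream_consumers >= MIN_DOWNSTREAM_CONSUMERS
    low_overlap = overlap_ratio(candidate_fields, existing_fields) < MAX_OVERLAP_RATIO
    return enough_use and low_overlap
```

The latency rule is not covered here because it depends on measured pipeline run times rather than on schema metadata.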

4) NULL Field Handling

Dimension fields: set to -1 when missing.

Metric fields: set to 0 when missing.
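The two defaults can be applied in one pass over a row. This is a minimal sketch over plain dictionaries; the function name and the example field names are hypothetical.

```python
from typing import Any, Dict, Iterable

DIMENSION_DEFAULT = -1  # missing dimension value
METRIC_DEFAULT = 0      # missing metric value


def fill_missing(
    row: Dict[str, Any],
    dimension_fields: Iterable[str],
    metric_fields: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of the row with absent or None fields defaulted."""
    filled = dict(row)
    for name in dimension_fields:
        if filled.get(name) is None:
            filled[name] = DIMENSION_DEFAULT
    for name in metric_fields:
        if filled.get(name) is None:
            filled[name] = METRIC_DEFAULT
    return filled


example = fill_missing(
    {"city_id": None, "user_id": 7, "order_cnt": None},
    dimension_fields=["city_id", "user_id"],
    metric_fields=["order_cnt", "trade_amt"],
)
```

Existing values such as user_id are left untouched; only None or absent fields receive a default.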

5) Metric Definition Standards

Metrics must be consistent within a domain. The process includes:

Collecting atomic metrics (business line, process, domain, name, description, function).

From these inputs, the system generates each atomic metric's definition expression and SQL.

Derived metrics are built from atomic metrics with additional dimensions or modifiers.
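The relationship between atomic and derived metrics might be modeled as follows. This is an illustrative sketch only: the class names, the column attribute, and the generated SQL shape are assumptions and do not describe any particular metric platform.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AtomicMetric:
    business_line: str
    process: str
    domain: str
    name: str
    description: str
    function: str  # aggregate function, e.g. "sum" or "count"
    column: str    # hypothetical: the source column being aggregated

    def expression(self) -> str:
        """Definition expression, e.g. SUM(pay_amount)."""
        return f"{self.function.upper()}({self.column})"


@dataclass(frozen=True)
class DerivedMetric:
    name: str
    atomic: AtomicMetric
    dimensions: Tuple[str, ...] = ()
    modifiers: Tuple[str, ...] = ()  # filter predicates written as SQL text

    def to_sql(self, table: str) -> str:
        """Build a SELECT that applies the modifiers and groups by dimensions."""
        columns = list(self.dimensions)
        columns.append(f"{self.atomic.expression()} AS {self.name}")
        sql = f"SELECT {', '.join(columns)} FROM {table}"
        if self.modifiers:
            sql += " WHERE " + " AND ".join(self.modifiers)
        if self.dimensions:
            sql += " GROUP BY " + ", ".join(self.dimensions)
        return sql


trade_amt = AtomicMetric(
    "retail", "payment", "trade", "trade_amt",
    "Transaction amount", "sum", "pay_amount",
)
app_trade_amt = DerivedMetric(
    "app_trade_amt", trade_amt, ("dt", "city_id"), ("channel = 'app'",)
)
generated_sql = app_trade_amt.to_sql("dwd_trade_pay_i")
```

Keeping the aggregate in the atomic metric and only adding dimensions or filters in the derived metric is what keeps the definition consistent within a domain.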

6) Table Lifecycle Management

Historical data is graded:

P0 : Irreplaceable core data (e.g., transactions, logs).

P1 : Important business and application data.

P2 : Recoverable intermediate ETL data.

P3 : Low‑importance data (e.g., auxiliary reports).

Table types and typical retention:

Incremental tables : Daily partitions, keep recent 14 days if a full table exists, otherwise permanent.

Full tables : Keep all data; may be partitioned daily.

Snapshot tables : Daily full snapshots, retain as needed.

Merge tables : Keep latest version per primary key, older versions in previous partitions.

ETL temporary tables : Retain up to 7 days, delete after use.

TT temporary data : Default 93‑day retention, adjustable.
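The retention rules above can be turned into a cleanup helper. The sketch below assumes daily partitions named by ISO date and a hypothetical set of table-type keys; how the boundary day is treated is also an assumption.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Retention in days per table type; None means "keep permanently".
# The day counts come from the list above; the key names are hypothetical.
RETENTION_DAYS: Dict[str, Optional[int]] = {
    "incremental_with_full": 14,
    "incremental_without_full": None,
    "full": None,
    "etl_temp": 7,
    "tt_temp": 93,
}


def expired_partitions(
    partitions: List[str], table_type: str, today: date
) -> List[str]:
    """Return partition dates (YYYY-MM-DD) older than the retention window."""
    days = RETENTION_DAYS[table_type]
    if days is None:
        return []
    cutoff = today - timedelta(days=days)
    return [p for p in partitions if date.fromisoformat(p) < cutoff]
```

A scheduled job could pass the result to whatever drop-partition command the platform provides; snapshot and merge tables are omitted because the text leaves their retention to case-by-case decisions.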

Layer‑Specific Development Guidelines

ODS Layer

Each source table syncs only once.

Full and incremental sync logic must be explicit.

Partition by processing date and time.

Missing target fields are auto‑filled.

Table classification (full, mirror, incremental, ETL temp) dictates retention policies.

Data quality checks: unique keys, partition emptiness, enum monitoring, volume trend monitoring, table comments.
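Three of the quality checks listed above are easy to express over in-memory rows. This is a simplified sketch with hypothetical function names; a production check would normally run as SQL against the partition itself.

```python
from collections import Counter
from typing import Dict, Iterable, List


def duplicate_keys(rows: List[dict], key: str) -> list:
    """Unique-key check: return key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def empty_partitions(rows_by_partition: Dict[str, List[dict]]) -> List[str]:
    """Partition-emptiness check: return partitions that hold no rows."""
    return sorted(p for p, rows in rows_by_partition.items() if not rows)


def unexpected_enum_values(
    rows: List[dict], field: str, allowed: Iterable[str]
) -> List[str]:
    """Enum monitoring: return values outside the agreed set."""
    return sorted({row[field] for row in rows} - set(allowed))
```

Each function returns the offending items rather than a boolean, so an alert can say exactly which keys, partitions, or values broke the rule.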

Public Dimension Layer

Design rules emphasize consistency (same name, type, content across physical tables), appropriate combination or splitting of dimensions, and storage/retention based on access frequency (e.g., keep recent 7‑400 days depending on span).

DWD Detail Layer

Store data by day; retention mirrors the general rules (7‑400 days based on access span).

DWS Aggregation Layer

Aggregation aims for query performance and result consistency. Steps:

Identify aggregation dimensions (e.g., product, time).

Determine aggregation granularity (daily, monthly, etc.).

Select facts to aggregate (e.g., amount vs. count).

Design principles include data reusability, avoiding cross‑domain aggregation, and clear naming of statistical periods.
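The three aggregation steps can be sketched in a few lines: choose the dimensions (product and day), fix the granularity (daily), and pick the facts (amount and count). The field names below are hypothetical, and amounts are assumed to be integers in cents.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_daily(detail_rows: List[dict]) -> List[dict]:
    """Roll detail rows up to (product_id, dt) with amount and count facts."""
    totals: Dict[Tuple[int, str], Dict[str, int]] = defaultdict(
        lambda: {"trade_amt": 0, "order_cnt": 0}
    )
    for row in detail_rows:
        key = (row["product_id"], row["dt"])
        totals[key]["trade_amt"] += row["amount"]
        totals[key]["order_cnt"] += 1
    # Sort by key so repeated runs emit rows in the same order.
    return [
        {"product_id": pid, "dt": dt, **facts}
        for (pid, dt), facts in sorted(totals.items())
    ]


detail = [
    {"product_id": 1, "dt": "2022-01-10", "amount": 500},
    {"product_id": 1, "dt": "2022-01-10", "amount": 250},
    {"product_id": 2, "dt": "2022-01-10", "amount": 100},
]
daily = aggregate_daily(detail)
```

Computing both facts once at this layer lets downstream reports reuse the same numbers instead of re-deriving them from detail rows.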

Naming Conventions

Prefixes indicate layer: ods_, dwd_, dws_, dim_, dm_. Period/range codes: d (daily snapshot), i (incremental), f (full), w (weekly), l (link), a (non‑partitioned full).

Table naming patterns:

Regular tables: [layer]_[department]_[domain]_[topic]_[cycle|range]

Intermediate tables: mid_[table]_[0-9|dim]

Temporary tables: tmp_[name]

Dimension tables: dim_[name]

Manual tables: dwd_[domain]_manual_[name]

Metric naming rules enforce lowercase names with underscore separators, avoid SQL keywords, and use suffixes such as _cnt for counts and _price for amounts. Time partition fields are named dt, hh, and mi. Examples: trade_amt for transaction amount, install_poi_cnt for installed store count.
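Naming rules like these are easiest to enforce with a validator. The sketch below is one possible reading of the regular-table pattern and the metric rules; the regular expressions, the function names, and the deliberately partial keyword list are assumptions.

```python
import re
from typing import Dict, Optional

# [layer]_[department]_[domain]_[topic]_[cycle|range]; the topic may
# itself contain underscores. Layer and cycle codes follow the text.
REGULAR_TABLE = re.compile(
    r"^(?P<layer>ods|dwd|dws|dm)"
    r"_(?P<department>[a-z0-9]+)"
    r"_(?P<domain>[a-z0-9]+)"
    r"_(?P<topic>[a-z0-9]+(?:_[a-z0-9]+)*)"
    r"_(?P<cycle>[difwla])$"
)

METRIC_NAME = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")

# Partial, illustrative list; a real check would use the engine's full list.
SQL_KEYWORDS = {"select", "from", "where", "group", "order", "table", "date"}


def parse_regular_table(name: str) -> Optional[Dict[str, str]]:
    """Split a regular table name into its parts, or None if it is invalid."""
    match = REGULAR_TABLE.match(name)
    return match.groupdict() if match else None


def is_valid_metric_name(name: str) -> bool:
    """Lowercase, underscore-separated, and not a bare SQL keyword."""
    return bool(METRIC_NAME.match(name)) and name not in SQL_KEYWORDS
```

For instance, parse_regular_table("dwd_retail_trade_order_detail_i") splits out the layer, department, domain, topic, and cycle, while a name with uppercase letters or a missing cycle code yields None.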
