Mastering Data Warehouse Architecture: Layered Design, Naming, and Governance Standards
This comprehensive guide explains data‑warehouse construction standards, covering model architecture principles, layered design (ODS, DWD, DWM, DWS, APP), domain division, data‑model design rules, type and naming conventions, table lifecycle management, and practical implementation examples.
Data Model Architecture Principles
A reliable data warehouse requires a clear layered structure that isolates raw data, supports stable downstream consumption, and avoids overly long data pipelines. The right number of layers depends on business needs rather than arbitrary rules. A well-layered design provides:
Clear data structure
Data lineage tracing
Reduced duplicate development
Organized data relationships
Isolation of raw data impact
The typical layering includes:
ODS (Operational Data Store) : Closest to source data; ingest raw records without heavy cleaning.
DW (Data Warehouse) : Core layer where thematic models are built. It is subdivided into DWD (detail), DWM (middle), and DWS (aggregation).
APP (Application) : Data products for reporting, analytics, or downstream systems (e.g., ES, PostgreSQL, Redis, Hive, Druid).
Dimension Layer : Optional layer for high‑cardinality (user, product) and low‑cardinality (enumerations, dates) dimension tables.
1) Data Warehouse Layer Principles
Data generally flows in the order ODS → DWD → DWM → DWS → APP, and each layer reads only from layers before it. Stable business follows the standard path ODS → DWD → DWS → APP; exploratory work may use ODS → DWD → APP or ODS → DWD → DWM → APP.
2) Domain Division Principles
Domains can be defined by business processes (e.g., order, payment) or by abstract data domains that group related events and dimensions. Proper domain design ensures coverage of current needs while allowing seamless addition of new domains.
3) Data Model Design Principles
High cohesion, low coupling : Themes should be internally cohesive while remaining loosely coupled across themes.
Separate core and extension models : Core models cover common business fields; extensions add niche attributes without polluting the core.
Centralize common processing logic : Encapsulate reusable logic in lower layers to avoid duplication.
Cost‑performance balance : Controlled redundancy can improve query speed; avoid excessive duplication.
Rollback capability : With unchanged processing logic, rerunning a job at different times must produce the same result, so that data can be recomputed safely.
Data Warehouse Common Development Standards
1) Layer Call Standards
Stable flows: ODS → DWD → DWS → APP. Exploratory flows may use ODS → DWD → APP or ODS → DWD → DWM → APP. Dependencies must flow forward; reverse dependencies (e.g., DWM depending on DWS) are prohibited.
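The forward-only dependency rule can be checked mechanically. The following is a minimal sketch, assuming each table name starts with its layer prefix and that the layer order is ODS, DWD, DWM, DWS, APP; the `app` prefix, the table names, and the helper names are hypothetical and chosen only for illustration.

```python
# Layer order assumed from the text: ODS -> DWD -> DWM -> DWS -> APP.
LAYER_ORDER = ["ods", "dwd", "dwm", "dws", "app"]
RANK = {layer: i for i, layer in enumerate(LAYER_ORDER)}


def layer_of(table_name):
    """Return the layer prefix of a table name such as 'dwd_trade_order_i'."""
    prefix = table_name.split("_", 1)[0].lower()
    if prefix not in RANK:
        raise ValueError(f"unknown layer prefix in {table_name!r}")
    return prefix


def find_reverse_dependencies(deps):
    """Return (table, upstream) pairs where the upstream sits in a later layer.

    `deps` maps each table to the list of tables it reads from.
    Same-layer reads are not flagged in this sketch.
    """
    violations = []
    for table, upstreams in deps.items():
        for upstream in upstreams:
            if RANK[layer_of(upstream)] > RANK[layer_of(table)]:
                violations.append((table, upstream))
    return violations


deps = {
    "dwd_trade_order_i": ["ods_trade_order_i"],
    "dws_trade_order_d": ["dwd_trade_order_i"],
    "dwm_trade_order_i": ["dws_trade_order_d"],  # reverse: DWM reads DWS
}
print(find_reverse_dependencies(deps))
```

A check like this could run in code review or in the scheduler before a job is published, so that a reverse dependency is rejected before it reaches production.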
2) Data Type Standards
Amount : double or decimal(28,6); specify the unit (cents or yuan).
String : string
ID fields : bigint
Time : string (ISO format recommended)
Status : string
3) Data Redundancy Standards
Redundant fields must be high‑frequency and used by at least three downstream processes.
Redundancy should not cause excessive data latency.
Redundant field overlap with existing fields should stay below 60%.
4) NULL Field Handling
Dimension fields: set to -1 when missing.
Metric fields: set to 0 when missing.
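The two NULL rules above can be applied in a single cleaning step. This is a small sketch, assuming rows are plain dictionaries and that the caller knows which fields are dimensions and which are metrics; the field names are made up for the example.

```python
DIMENSION_DEFAULT = -1  # missing dimension values become -1
METRIC_DEFAULT = 0      # missing metric values become 0


def fill_missing(row, dimension_fields, metric_fields):
    """Return a copy of `row` with missing or None values replaced by the defaults."""
    cleaned = dict(row)
    for field in dimension_fields:
        if cleaned.get(field) is None:
            cleaned[field] = DIMENSION_DEFAULT
    for field in metric_fields:
        if cleaned.get(field) is None:
            cleaned[field] = METRIC_DEFAULT
    return cleaned


row = {"order_id": 1001, "city_id": None, "trade_amt": None}
print(fill_missing(row, {"city_id"}, {"trade_amt"}))
# {'order_id': 1001, 'city_id': -1, 'trade_amt': 0}
```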
5) Metric Definition Standards
Metrics must be consistent within a domain. The process includes:
Collecting atomic metrics (business line, process, domain, name, description, function).
System generates definition expressions and SQL.
Derived metrics are built from atomic metrics with additional dimensions or modifiers.
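To make the relationship between atomic and derived metrics concrete, here is a hedged sketch of how a metric system might generate SQL. The class names, the fields, and the generated SQL shape are assumptions for illustration and do not describe any particular metric platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicMetric:
    name: str          # e.g. "trade_amt"
    expression: str    # aggregate expression, e.g. "SUM(trade_amt)"
    source_table: str  # table the expression is evaluated on


@dataclass(frozen=True)
class DerivedMetric:
    name: str
    atomic: AtomicMetric
    dimensions: tuple = ()  # extra grouping columns
    modifiers: tuple = ()   # filter predicates that narrow the atomic metric

    def to_sql(self):
        """Build a SELECT statement from the atomic expression, dimensions, and modifiers."""
        columns = list(self.dimensions) + [f"{self.atomic.expression} AS {self.name}"]
        sql = f"SELECT {', '.join(columns)} FROM {self.atomic.source_table}"
        if self.modifiers:
            sql += " WHERE " + " AND ".join(self.modifiers)
        if self.dimensions:
            sql += " GROUP BY " + ", ".join(self.dimensions)
        return sql


trade_amt = AtomicMetric("trade_amt", "SUM(trade_amt)", "dwd_trade_order_i")
app_trade_amt = DerivedMetric(
    name="app_trade_amt",
    atomic=trade_amt,
    dimensions=("city_id",),
    modifiers=("channel = 'app'", "dt = '2024-01-01'"),
)
print(app_trade_amt.to_sql())
```

Because every derived metric reuses the atomic expression, a change to the atomic definition propagates to all metrics built on it, which is what keeps definitions consistent within a domain.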
6) Table Lifecycle Management
Historical data is graded:
P0 : Irreplaceable core data (e.g., transactions, logs).
P1 : Important business and application data.
P2 : Recoverable intermediate ETL data.
P3 : Low‑importance data (e.g., auxiliary reports).
Table types and typical retention:
Incremental tables : Daily partitions, keep recent 14 days if a full table exists, otherwise permanent.
Full tables : Keep all data; may be partitioned daily.
Snapshot tables : Daily full snapshots, retain as needed.
Merge tables : Keep latest version per primary key, older versions in previous partitions.
ETL temporary tables : Retain up to 7 days, delete after use.
TT temporary data : Default 93‑day retention, adjustable.
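The retention rules that carry a fixed number can be encoded as a small lookup used by a cleanup job. This is a sketch under stated assumptions: the table-type labels are invented for the example, "keep recent N days" is read as "drop partitions older than N days before today", and table types without a fixed number in the text return `None` (no automatic expiry).

```python
from datetime import date, timedelta


def retention_days(table_type, has_full_table=False):
    """Return the number of days to keep, or None when nothing should expire automatically."""
    if table_type == "incremental":
        # 14 days if a full table also exists, otherwise permanent
        return 14 if has_full_table else None
    if table_type == "etl_temp":
        return 7
    if table_type == "tt_temp":
        return 93  # default, adjustable
    # full, snapshot, and merge tables: no fixed number given in the text
    return None


def is_expired(partition_day, today, days):
    """A partition is expired when it is older than `days` days before `today`."""
    if days is None:
        return False
    return partition_day < today - timedelta(days=days)


today = date(2024, 1, 20)
keep = retention_days("incremental", has_full_table=True)
print(is_expired(date(2024, 1, 1), today, keep))   # True
print(is_expired(date(2024, 1, 10), today, keep))  # False
```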
Layer‑Specific Development Guidelines
ODS Layer
Each source table syncs only once.
Full and incremental sync logic must be explicit.
Partition by processing date and time.
Missing target fields are auto‑filled.
Table classification (full, mirror, incremental, ETL temp) dictates retention policies.
Data quality checks: unique keys, partition emptiness, enum monitoring, volume trend monitoring, table comments.
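Three of the listed checks (unique keys, partition emptiness, enum monitoring) are simple enough to sketch directly. The example below assumes one partition's rows are already loaded as dictionaries; in practice these checks would more likely run as queries on the warehouse engine, and all function and field names here are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that appear more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def unexpected_enum_values(rows, field, allowed):
    """Return values of `field` that are not in the allowed enumeration."""
    return sorted({row[field] for row in rows} - set(allowed))


def check_partition(rows, key_fields, enum_field, allowed):
    """Run the three checks on one partition and collect the findings."""
    return {
        "partition_empty": len(rows) == 0,
        "duplicate_keys": duplicate_keys(rows, key_fields),
        "unexpected_enums": unexpected_enum_values(rows, enum_field, allowed),
    }


rows = [
    {"order_id": 1, "status": "paid"},
    {"order_id": 1, "status": "paid"},     # duplicate key
    {"order_id": 2, "status": "unknown"},  # value outside the enum
]
print(check_partition(rows, ["order_id"], "status", {"paid", "refunded"}))
# {'partition_empty': False, 'duplicate_keys': [(1,)], 'unexpected_enums': ['unknown']}
```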
Public Dimension Layer
Design rules emphasize consistency (same name, type, content across physical tables), appropriate combination or splitting of dimensions, and storage/retention based on access frequency (e.g., keep recent 7‑400 days depending on span).
DWD Detail Layer
Store data by day; retention mirrors the general rules (7‑400 days based on access span).
DWS Aggregation Layer
Aggregation aims for query performance and result consistency. Steps:
Identify aggregation dimensions (e.g., product, time).
Determine aggregation granularity (daily, monthly, etc.).
Select facts to aggregate (e.g., amount vs. count).
Design principles include data reusability, avoiding cross‑domain aggregation, and clear naming of statistical periods.
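The three aggregation steps can be illustrated with a small in-memory example: product as the dimension, one day as the granularity, and both amount and count as the aggregated facts. This is only a sketch; the `_1d` suffix used to name the statistical period and the field names are assumptions, and amounts are taken to be integers in cents.

```python
from collections import defaultdict


def aggregate_daily(detail_rows, dimension, amount_field):
    """Aggregate detail rows to one row per (dimension value, day)."""
    totals = defaultdict(lambda: {"trade_amt_1d": 0, "order_cnt_1d": 0})
    for row in detail_rows:
        bucket = totals[(row[dimension], row["dt"])]
        bucket["trade_amt_1d"] += row[amount_field]  # fact 1: summed amount
        bucket["order_cnt_1d"] += 1                  # fact 2: row count
    return dict(totals)


detail = [
    {"product_id": "p1", "dt": "2024-01-01", "trade_amt": 100},
    {"product_id": "p1", "dt": "2024-01-01", "trade_amt": 50},
    {"product_id": "p2", "dt": "2024-01-01", "trade_amt": 30},
]
summary = aggregate_daily(detail, "product_id", "trade_amt")
print(summary[("p1", "2024-01-01")])
# {'trade_amt_1d': 150, 'order_cnt_1d': 2}
```

Putting the period into the column name means a downstream reader can tell a one-day total from a longer-window total without opening the job definition.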
Naming Conventions
Prefixes indicate layer: ods_, dwd_, dws_, dim_, dm_. Period/range codes: d (daily snapshot), i (incremental), f (full), w (weekly), l (link), a (non‑partitioned full).
Table naming patterns:
Regular tables: [layer]_[department]_[domain]_[topic]_[cycle|range]
Intermediate tables: mid_[table]_[0-9|dim]
Temporary tables: tmp_[name]
Dimension tables: dim_[name]
Manual tables: dwd_[domain]_manual_[name]
Metric naming rules enforce lowercase, underscore separation, avoidance of SQL keywords, and suffixes such as _cnt for counts, _price for amounts, and time partition fields dt, hh, mi. Example: trade_amt for transaction amount, install_poi_cnt for installed store count.
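Naming rules like these are easiest to enforce with an automated check. The sketch below validates regular table names against the [layer]_[department]_[domain]_[topic]_[cycle|range] pattern and applies the basic metric-name rules. It assumes single-letter cycle codes, allows the topic to contain underscores, and uses a deliberately short, illustrative list of SQL keywords; none of these details come from the original standard.

```python
import re

# Regular tables: layer, department, domain, topic (may contain underscores), cycle code.
TABLE_RE = re.compile(
    r"^(?P<layer>ods|dwd|dws|dim|dm)"
    r"_(?P<department>[a-z0-9]+)"
    r"_(?P<domain>[a-z0-9]+)"
    r"_(?P<topic>[a-z0-9]+(?:_[a-z0-9]+)*)"
    r"_(?P<cycle>[difwla])$"
)

# Metric names: lowercase words separated by single underscores.
METRIC_RE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")

# Illustrative subset only; a real check would use the engine's full keyword list.
SQL_KEYWORDS = {"select", "from", "where", "group", "order", "table"}


def parse_table_name(name):
    """Return the name's parts as a dict, or None if it does not match the pattern."""
    match = TABLE_RE.match(name)
    return match.groupdict() if match else None


def is_valid_metric_name(name):
    """Check lowercase/underscore form and reject the listed SQL keywords."""
    return bool(METRIC_RE.match(name)) and name not in SQL_KEYWORDS


print(parse_table_name("dwd_sales_trade_order_detail_i"))
print(parse_table_name("DWD_Sales_trade_order_i"))  # None: uppercase is rejected
print(is_valid_metric_name("install_poi_cnt"))      # True
```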
dbaplus Community
