Mastering Data Warehouse Standards: Architecture, Layer Design, and Naming Conventions
This comprehensive guide explains data‑warehouse construction standards, covering model architecture principles, public development rules, layer‑by‑layer design specifications, and systematic naming conventions for tables, dimensions, and metrics to ensure consistency, scalability, and reliable data governance.
Data Model Architecture Principles
A reliable data warehouse should use a clear layered structure that shields downstream consumers from changes in raw data and avoids overly long pipelines. The number of layers depends on business needs; there is no universal "best" number.
Benefits of a well‑designed layering architecture include:
Clear data structure
Traceable data lineage
Reduced duplicate development
Organized data relationships
Isolation of raw‑data impact
ODS Layer (Operational Data Store)
The ODS layer sits closest to the source systems. Data is ingested with minimal cleaning; de-duplication, noise removal, and anomaly handling are deferred to downstream layers.
DW Layer (Data Warehouse)
The DW builds business‑oriented models from ODS data and is divided into three sub‑layers:
DWD (Detail) – retains ODS granularity, performs cleaning, integration, and standardization; may degenerate dimensions into fact tables to reduce joins.
DWM (Middle) – performs light aggregation on DWD data to create reusable intermediate tables and common metrics.
DWS (Service) – provides a lightly aggregated, wide‑table view for analysis, covering roughly 80% of use cases.
Application Layer (APP)
Provides data for downstream products and analytics, typically stored in systems such as Elasticsearch, PostgreSQL, Redis, Hive, or Druid.
Dimension Layer
High‑cardinality dimensions (e.g., user or product tables) may be stored separately; low‑cardinality dimensions (e.g., enums, dates) are small lookup tables.
Data Warehouse Public Development Standards
Layer Call Hierarchy
Stable business flows follow ODS → DWD → DWS → APP. Exploratory or unstable needs may use ODS → DWD → APP or ODS → DWD → DWM → APP.
Normal flow: ODS → DWD → DWM → DWS → APP. Direct ODS → DWS indicates an incomplete domain.
Avoid DWS tables that also reference DWM tables from the same domain.
Minimize DWM‑generated tables within a domain to protect ETL efficiency.
ODS tables may only be referenced by DWD; direct use in DWM/DWS/APP is prohibited.
Prevent reverse dependencies (e.g., DWM depending on DWS).
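The reference rules above lend themselves to an automated check. The following is a minimal sketch, assuming each table's layer can be read from its name prefix and that dependencies are available as (downstream, upstream) pairs. The function names and sample table names are hypothetical, and only the five pipeline layers are handled (dimension tables are out of scope here).

```python
# Layer order for the reverse-dependency check (lower index = closer to source).
LAYER_ORDER = ["ods", "dwd", "dwm", "dws", "app"]


def layer_of(table_name):
    """Assumption: the layer is the prefix before the first underscore."""
    prefix = table_name.split("_", 1)[0].lower()
    if prefix not in LAYER_ORDER:
        raise ValueError(f"unknown layer prefix in {table_name!r}")
    return prefix


def check_layer_dependencies(edges):
    """Return violation messages for a list of (downstream, upstream) pairs."""
    violations = []
    for downstream, upstream in edges:
        down, up = layer_of(downstream), layer_of(upstream)
        if up == "ods" and down != "dwd":
            # ODS tables may only be referenced by DWD tables.
            violations.append(f"{downstream} reads ODS table {upstream} directly")
        elif LAYER_ORDER.index(up) > LAYER_ORDER.index(down):
            # A table must not depend on a layer further downstream than itself.
            violations.append(f"{downstream} has a reverse dependency on {upstream}")
    return violations


edges = [
    ("dwd_trade_order_i", "ods_trade_order"),   # allowed
    ("dws_trade_user_d", "ods_trade_order"),    # ODS referenced outside DWD
    ("dwm_trade_order_d", "dws_trade_user_d"),  # reverse dependency
]
for message in check_layer_dependencies(edges):
    print(message)
```

A check like this could run against lineage metadata before a job is published, so violations surface at review time rather than in production.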
Data Type Standards
Amount: double or decimal(28,6) with explicit unit.
String: string
ID fields: bigint
Time: string
Status: string
Redundancy Standards
Redundant fields must be used by at least three downstream tables.
Redundancy should not cause significant latency.
Redundant field overlap with existing fields should stay below 60%.
NULL Handling
Dimension fields: set to -1
Metric fields: set to 0
Metric Definition Standards
Metrics must have consistent definitions within a domain to avoid ambiguity. The process includes metric collection, naming, and SQL generation.
Table Processing Standards
Incremental tables record only new rows per partition (usually daily).
Full tables contain the latest snapshot of all rows, refreshed each day.
Snapshot tables store a full daily snapshot for historical queries.
Merge tables keep the latest version per primary key while preserving history.
ETL temporary tables are retained for up to 7 days and then dropped.
TT temporary data (e.g., from DbSync) defaults to a 93‑day retention.
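One common way to relate incremental and full tables is to build today's full table from yesterday's full table plus today's incremental partition, keeping the newest row per primary key. The sketch below illustrates that idea under stated assumptions: rows are plain dicts, the `id` and `updated_at` field names are hypothetical, and the incremental partition may contain changed rows as well as new ones. The article does not prescribe this exact procedure.

```python
def merge_full_with_increment(previous_full, increment, key="id", version="updated_at"):
    """Build the latest full snapshot: the newest row per primary key wins."""
    latest = {row[key]: row for row in previous_full}
    for row in increment:
        current = latest.get(row[key])
        # Keep the incoming row if the key is new or its version is not older.
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    # Sort by key so the output order is deterministic.
    return [latest[k] for k in sorted(latest)]


yesterday_full = [
    {"id": 1, "status": "paid", "updated_at": "2024-01-01"},
    {"id": 2, "status": "created", "updated_at": "2024-01-01"},
]
today_increment = [
    {"id": 2, "status": "paid", "updated_at": "2024-01-02"},
    {"id": 3, "status": "created", "updated_at": "2024-01-02"},
]
today_full = merge_full_with_increment(yesterday_full, today_increment)
print([(row["id"], row["status"]) for row in today_full])
```

In a warehouse this would normally be a partition-overwrite job; the in-memory version only shows the merge rule.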
Table Lifecycle Management
Historical data is classified into four grades (P0‑P3) based on importance and recoverability, guiding retention periods. Table types (event streams, event mirrors, dimension tables, merge tables, etc.) are mapped to appropriate layers and retention policies.
Data Warehouse Layer Development Standards
ODS Layer Design
Each source table syncs only once.
Separate full‑init and incremental logic.
Partition by statistical date and time.
Missing target fields are auto‑filled.
Public Dimension Layer Design
Enforce consistency of field names, types, and content across physical tables. Combine highly related fields; split or duplicate dimensions based on usage frequency and importance.
DWD Detail Layer
Store data at ODS granularity with daily partitions. Retention recommendations mirror those of the ODS layer (e.g., 7‑400 days based on access span).
Transactional Fact Table Guidelines
Partition by event date/time.
Include redundant subsets to reduce I/O.
Degenerate dimensions into fact tables to avoid costly joins.
Snapshot Fact Tables
Aggregate events over a fixed period (day, week, month) into a single row; granularity is period‑based rather than per‑transaction.
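The change of grain from per-transaction to per-period can be sketched as a simple group-by. This example assumes hypothetical `dt`, `shop_id`, and `amount` fields and a daily period; the output metric names follow the `_cnt` and `_amt` style used elsewhere in this guide.

```python
from collections import defaultdict
from decimal import Decimal


def build_daily_snapshot(transactions):
    """Collapse transaction rows into one row per (dt, shop_id)."""
    totals = defaultdict(lambda: {"order_cnt": 0, "trade_amt": Decimal("0")})
    for tx in transactions:
        bucket = totals[(tx["dt"], tx["shop_id"])]
        bucket["order_cnt"] += 1
        # Amounts arrive as strings and are summed as Decimal to avoid float drift.
        bucket["trade_amt"] += Decimal(tx["amount"])
    return [
        {"dt": dt, "shop_id": shop_id, **metrics}
        for (dt, shop_id), metrics in sorted(totals.items())
    ]


transactions = [
    {"dt": "20240101", "shop_id": 1, "amount": "10.50"},
    {"dt": "20240101", "shop_id": 1, "amount": "4.50"},
    {"dt": "20240102", "shop_id": 1, "amount": "7.00"},
]
for row in build_daily_snapshot(transactions):
    print(row)
```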
Cumulative Snapshot Fact Tables
Combine multiple business processes for analysis (e.g., purchase flow) and capture interval metrics.
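As an illustration, a cumulative snapshot row for a purchase flow might hold one timestamp per milestone plus the intervals between them. The sketch below assumes three hypothetical milestones (order, pay, ship) and event dicts with `order_id`, `type`, and `time` fields; none of these names come from the source.

```python
from datetime import datetime

MILESTONES = ("order", "pay", "ship")  # hypothetical purchase-flow steps


def seconds_between(start, end):
    """Interval in seconds, or None while either milestone is still missing."""
    if start is None or end is None:
        return None
    return int((end - start).total_seconds())


def build_cumulative_snapshot(events):
    """Return one row per order_id with milestone times and interval metrics."""
    rows = {}
    for event in events:
        if event["type"] not in MILESTONES:
            raise ValueError(f"unknown milestone {event['type']!r}")
        row = rows.setdefault(
            event["order_id"],
            {"order_id": event["order_id"], **{f"{m}_time": None for m in MILESTONES}},
        )
        # Each arriving event fills in its own milestone column on the same row.
        row[f"{event['type']}_time"] = datetime.fromisoformat(event["time"])
    for row in rows.values():
        row["order_to_pay_sec"] = seconds_between(row["order_time"], row["pay_time"])
        row["pay_to_ship_sec"] = seconds_between(row["pay_time"], row["ship_time"])
    return list(rows.values())


events = [
    {"order_id": 1, "type": "order", "time": "2024-01-01T10:00:00"},
    {"order_id": 1, "type": "pay", "time": "2024-01-01T10:05:00"},
    {"order_id": 1, "type": "ship", "time": "2024-01-02T10:05:00"},
    {"order_id": 2, "type": "order", "time": "2024-01-01T11:00:00"},
]
for row in build_cumulative_snapshot(events):
    print(row["order_id"], row["order_to_pay_sec"], row["pay_to_ship_sec"])
```

Rows for orders still in progress keep empty milestone columns and empty intervals, which later runs fill in as the remaining events arrive.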
Data Warehouse Naming Standards
Root Design
Root words (e.g., rack for shelf, rate for ratio) unify table, field, and domain names, improving metadata clarity.
Table Naming Convention
General format: [layer]_[department]_[business_domain]_[subject]_[description]_[cycle|range]
Examples:
Regular table: dwd_sales_trade_amt_i
Intermediate table: mid_table_name_dim
Temporary table: tmp_xxx
Dimension table: dim_xxx
Manual table: dwd_business_manual_xxx
Metric Naming Convention
All lowercase, words separated by underscores.
Avoid SQL keywords; append _col if needed.
Quantity suffix _cnt, amount suffix _price.
Date partition field dt (format yyyymmdd or yyyy‑mm‑dd).
Hour field hh, minute field mi.
Boolean flags prefixed with is_ and must be non‑null.
Metric names combine business modifiers, date modifiers, aggregation modifiers, and base metric roots (e.g., trade_amt, install_poi_cnt).
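Several of the field-naming rules above are mechanical enough to lint. The following is a minimal sketch of such a check; the function name, the messages, and the deliberately tiny keyword set are assumptions for illustration, and a real check would use the full reserved-word list of the target SQL engine.

```python
import re

# Illustrative subset only; substitute the reserved words of your SQL engine.
SQL_KEYWORDS = {"select", "from", "where", "order", "group", "table", "date"}
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def check_field_name(name, nullable=True):
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase words separated by underscores")
    if name in SQL_KEYWORDS:
        problems.append(f"SQL keyword; consider {name}_col")
    if name.startswith("is_") and nullable:
        problems.append("boolean flag must be non-null")
    return problems


for field, nullable in [("trade_amt", True), ("TradeAmt", True), ("order", True), ("is_valid", True)]:
    print(field, check_field_name(field, nullable))
```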
ITPUB
