
Mastering Data Warehouse Standards: Architecture, Layer Design, and Naming Conventions

This comprehensive guide explains data‑warehouse construction standards, covering model architecture principles, public development rules, layer‑by‑layer design specifications, and systematic naming conventions for tables, dimensions, and metrics to ensure consistency, scalability, and reliable data governance.

ITPUB

Data Model Architecture Principles

A reliable data warehouse uses a clear layered structure that shields downstream consumers from changes in the raw data and keeps pipelines from growing overly long. The number of layers depends on business needs; there is no universal “best” number.

Benefits of a well‑designed layering architecture include:

Clear data structure

Traceable data lineage

Reduced duplicate development

Organized data relationships

Isolation of raw‑data impact

ODS Layer (Operational Data Store)

The ODS is the closest to the source system. Data is ingested with minimal cleaning; de‑duplication, noise removal, and anomaly handling are deferred to downstream layers.

DW Layer (Data Warehouse)

The DW builds business‑oriented models from ODS data and is divided into three sub‑layers:

DWD (Detail) – retains ODS granularity, performs cleaning, integration, and standardization; may degenerate dimensions into fact tables to reduce joins.

DWM (Middle) – performs light aggregation on DWD data to create reusable intermediate tables and common metrics.

DWS (Service) – provides a lightly aggregated, wide‑table view for analysis, covering roughly 80% of use cases.

Application Layer (APP)

Provides data for downstream products and analytics, typically stored in systems such as Elasticsearch, PostgreSQL, Redis, Hive, or Druid.

Dimension Layer

High‑cardinality dimensions (e.g., user or product tables) may be stored separately; low‑cardinality dimensions (e.g., enums, dates) are small lookup tables.

Data Warehouse Public Development Standards

Layer Call Hierarchy

Stable business flows follow ODS → DWD → DWS → APP. Exploratory or unstable needs may use ODS → DWD → APP or ODS → DWD → DWM → APP.

Normal flow: ODS → DWD → DWM → DWS → APP. Direct ODS → DWS indicates an incomplete domain.

Avoid DWS tables that reference both a DWD table and DWM tables from the same domain.

Minimize DWM‑generated tables within a domain to protect ETL efficiency.

ODS tables may only be referenced by DWD; direct use in DWM/DWS/APP is prohibited.

Prevent reverse dependencies (e.g., DWM depending on DWS).
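The two strictest rules above (ODS may only feed DWD, and no reverse dependencies) are easy to check mechanically. The following is a minimal sketch, assuming every table name starts with its layer prefix; the function names and the example table names are hypothetical, and dimension or temporary tables are deliberately out of scope.

```python
# Layers ordered from closest-to-source to closest-to-consumer.
LAYER_ORDER = ["ods", "dwd", "dwm", "dws", "app"]


def layer_of(table_name: str) -> str:
    """Return the layer prefix of a table name such as 'dwd_trade_order_i'."""
    prefix = table_name.split("_", 1)[0].lower()
    if prefix not in LAYER_ORDER:
        raise ValueError(f"unknown layer prefix in {table_name!r}")
    return prefix


def check_dependency(downstream: str, upstream: str) -> list[str]:
    """Return the rule violations for `downstream` reading from `upstream`."""
    down, up = layer_of(downstream), layer_of(upstream)
    violations = []
    # ODS tables may only be referenced by DWD.
    if up == "ods" and down != "dwd":
        violations.append(f"{downstream} reads ODS table {upstream} directly")
    # A table must not depend on a layer further downstream than itself.
    if LAYER_ORDER.index(up) > LAYER_ORDER.index(down):
        violations.append(f"{downstream} has a reverse dependency on {upstream}")
    return violations


print(check_dependency("dws_trade_summary", "ods_trade_order"))
```

A check like this could run over the lineage graph before a job is released; the softer rules (for example, minimizing DWM tables) still need human review.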

Data Type Standards

Amount: double or decimal(28,6) with explicit unit.

String: string

ID fields: bigint

Time: string

Status: string

Redundancy Standards

Redundant fields must be used by at least three downstream tables.

Redundancy should not cause significant latency.

Redundant field overlap with existing fields should stay below 60%.

NULL Handling

Dimension fields: set to -1

Metric fields: set to 0
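As an illustration of the NULL-handling rule, here is a minimal sketch that applies the two defaults to a single row. The function name and the way fields are classified (two explicit sets passed by the caller) are assumptions made for the example, not part of the standard.

```python
DIMENSION_DEFAULT = -1  # placeholder for a missing dimension value
METRIC_DEFAULT = 0      # placeholder for a missing metric value


def fill_nulls(row: dict, dimension_fields: set, metric_fields: set) -> dict:
    """Return a copy of `row` with None replaced according to the field's role."""
    cleaned = dict(row)
    for field, value in row.items():
        if value is not None:
            continue
        if field in dimension_fields:
            cleaned[field] = DIMENSION_DEFAULT
        elif field in metric_fields:
            cleaned[field] = METRIC_DEFAULT
        # Fields in neither set are left untouched.
    return cleaned


row = {"city_id": None, "order_cnt": None, "remark": None}
print(fill_nulls(row, {"city_id"}, {"order_cnt"}))
```

Fields that are neither dimensions nor metrics keep their NULL here, since the rule above does not say how to treat them.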

Metric Definition Standards

Metrics must have consistent definitions within a domain to avoid ambiguity. The process includes metric collection, naming, and SQL generation.

Table Processing Standards

Incremental tables record only new rows per partition (usually daily).

Full tables contain the latest snapshot of all rows, reported each day.

Snapshot tables store a full daily snapshot for historical queries.

Merge tables keep the latest version per primary key while preserving history.

ETL temporary tables are retained up to 7 days then dropped.

TT temporary data (e.g., from DbSync) defaults to a 93‑day retention.
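The two retention rules above can be turned into a small cleanup helper. This is a sketch under stated assumptions: partitions are identified by a dt string in yyyymmdd form, the keys etl_tmp and tt_tmp are hypothetical labels, and a partition exactly as old as the window is still kept.

```python
from datetime import date, datetime, timedelta

# Retention windows in days, taken from the rules above.
# The dictionary keys are hypothetical labels, not names from the standard.
RETENTION_DAYS = {"etl_tmp": 7, "tt_tmp": 93}


def expired_partitions(kind: str, partitions: list[str], today: date) -> list[str]:
    """Return the dt partitions (yyyymmdd) that fall outside the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS[kind])
    return [
        dt for dt in partitions
        if datetime.strptime(dt, "%Y%m%d").date() < cutoff
    ]


print(expired_partitions("etl_tmp", ["20240102", "20240103", "20240109"], date(2024, 1, 10)))
```

The helper only lists candidates; actually dropping them is left to whatever scheduler or engine the warehouse runs on.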

Table Lifecycle Management

Historical data is classified into four grades (P0‑P3) based on importance and recoverability, guiding retention periods. Table types (event streams, event mirrors, dimension tables, merge tables, etc.) are mapped to appropriate layers and retention policies.

Data Warehouse Layer Development Standards

ODS Layer Design

Each source table syncs only once.

Separate full‑init and incremental logic.

Partition by statistical date and time.

Missing target fields are auto‑filled.

Public Dimension Layer Design

Enforce consistency of field names, types, and content across physical tables. Combine highly related fields; split or duplicate dimensions based on usage frequency and importance.

DWD Detail Layer

Store data at ODS granularity with daily partitions. Retention recommendations mirror those of the ODS layer (e.g., 7‑400 days based on access span).

Transactional Fact Table Guidelines

Partition by event date/time.

Include redundant subsets to reduce I/O.

Degenerate dimensions into fact tables to avoid costly joins.

Snapshot Fact Tables

Aggregate events over a fixed period (day, week, month) into a single row; granularity is period‑based rather than per‑transaction.
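To make the change of granularity concrete, the sketch below collapses per-transaction events into one row per user per day. The field names (user_id, dt, amount, order_cnt, trade_amt) are illustrative assumptions, and in practice this would usually be a grouped query in the warehouse engine rather than application code.

```python
from collections import defaultdict


def daily_snapshot(events: list[dict]) -> list[dict]:
    """Collapse per-transaction events into one row per (user_id, dt)."""
    totals = defaultdict(lambda: {"order_cnt": 0, "trade_amt": 0.0})
    for event in events:
        key = (event["user_id"], event["dt"])
        totals[key]["order_cnt"] += 1
        totals[key]["trade_amt"] += event["amount"]
    # Sort by key so the output order is deterministic.
    return [
        {"user_id": user_id, "dt": dt, **metrics}
        for (user_id, dt), metrics in sorted(totals.items())
    ]


events = [
    {"user_id": 1, "dt": "20240101", "amount": 10.0},
    {"user_id": 1, "dt": "20240101", "amount": 5.5},
    {"user_id": 2, "dt": "20240101", "amount": 3.0},
]
print(daily_snapshot(events))
```

Three transaction rows become two snapshot rows, one per user and day, which is the period-based grain described above.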

Cumulative Snapshot Fact Tables

Combine multiple business processes for analysis (e.g., purchase flow) and capture interval metrics.
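An interval metric is simply the elapsed time between two milestones on the same row. The sketch below assumes a purchase flow with three hypothetical milestone columns (order_time, pay_time, ship_time) stored as strings, in line with the data type standard; a milestone that has not happened yet is None.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed string layout for time fields


def hours_between(start, end):
    """Return elapsed hours between two timestamps, or None if either is missing."""
    if start is None or end is None:
        return None
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 3600


def add_interval_metrics(row: dict) -> dict:
    """Return a copy of the row with interval metrics between milestones."""
    return {
        **row,
        "order_to_pay_hours": hours_between(row.get("order_time"), row.get("pay_time")),
        "pay_to_ship_hours": hours_between(row.get("pay_time"), row.get("ship_time")),
    }


order = {
    "order_id": 1001,
    "order_time": "2024-01-01 10:00:00",
    "pay_time": "2024-01-01 10:30:00",
    "ship_time": None,  # not shipped yet
}
print(add_interval_metrics(order))
```

Because a cumulative snapshot row is updated as each milestone arrives, the same function can be rerun on the row once ship_time is filled in.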

Data Warehouse Naming Standards

Root Design

Root words (e.g., rack for shelf, rate for ratio) unify table, field, and domain names, improving metadata clarity.

Table Naming Convention

General format:

[layer]_[department]_[business_domain]_[subject]_[description]_[cycle|range]

Examples:

Regular table: dwd_sales_trade_amt_i

Intermediate table: mid_table_name_dim

Temporary table: tmp_xxx

Dimension table: dim_xxx

Manual table: dwd_business_manual_xxx
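A small builder can keep generated table names in line with the general format. This is a minimal sketch: the set of allowed prefixes is collected from the prefixes that appear in this article, the per-segment pattern is an assumption, and not every segment of the format has to be supplied (the regular-table example itself uses fewer).

```python
import re

# Prefixes that appear in this article; extend as needed.
KNOWN_LAYERS = {"ods", "dwd", "dwm", "dws", "app", "dim", "tmp", "mid"}
# Assumed rule: each segment is lowercase alphanumeric and starts with a letter.
SEGMENT = re.compile(r"^[a-z][a-z0-9]*$")


def build_table_name(layer: str, *parts: str, suffix: str) -> str:
    """Join layer, middle segments, and cycle/range suffix with underscores."""
    if layer not in KNOWN_LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    segments = [layer, *parts, suffix]
    for segment in segments:
        if not SEGMENT.match(segment):
            raise ValueError(f"invalid segment: {segment!r}")
    return "_".join(segments)


print(build_table_name("dwd", "sales", "trade", "amt", suffix="i"))
```

Building names from validated segments, rather than typing them by hand, makes it harder for mixed-case or misspelled prefixes to reach the warehouse.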

Metric Naming Convention

All lowercase, words separated by underscores.

Avoid SQL keywords; append _col if needed.

Quantity suffix _cnt, amount suffix _price.

Date partition field dt (format yyyymmdd or yyyy‑mm‑dd).

Hour field hh, minute field mi.

Boolean flags prefixed with is_ and must be non‑null.

Metric names combine business modifiers, date modifiers, aggregation modifiers, and base metric roots (e.g., trade_amt, install_poi_cnt).
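Several of the field-naming rules above can be enforced with a short validator. The sketch below covers the lowercase-with-underscores rule and the _col suffix for SQL keywords; the keyword set is a small illustrative subset, not a complete list for any particular engine, and the function name is hypothetical.

```python
import re

# Lowercase words separated by single underscores, starting with a letter.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Illustrative subset only; a real check would use the engine's keyword list.
SQL_KEYWORDS = {"select", "from", "where", "group", "order", "table", "date"}


def normalize_field_name(name: str) -> str:
    """Validate a field name and append _col if it collides with a SQL keyword."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"field name must be lowercase with underscores: {name!r}")
    if name in SQL_KEYWORDS:
        return f"{name}_col"
    return name


print(normalize_field_name("trade_amt"))
print(normalize_field_name("order"))
```

Rules such as the is_ prefix for boolean flags depend on the column's type, so they would need schema information that this name-only check does not have.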


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
