Big Data 28 min read

Master Data Warehouse Architecture: Layers, Naming Rules, and Lifecycle Tips

This guide outlines a comprehensive data model architecture for data warehouses, detailing layer definitions (ODS, CDM, DWD, DWS, ADS), naming conventions, data type standards, partitioning strategies, redundancy rules, and lifecycle management policies to ensure consistency, performance, and maintainability across big‑data environments.

dbaplus Community

Sep 10, 2023

Master Data Warehouse Architecture: Layers, Naming Rules, and Lifecycle Tips

1. Data Model Architecture Specification

The architecture is divided into five logical layers:

ODS (Operational Data Store) : Mirrors source system data, serves as a preparation zone for the DMP.

CDM (Common Data Model) : Includes DWD (detail) and DWS (summary) layers, responsible for data integration, dimension consistency, and reusable analytical fact tables.

DWD (Data Warehouse Detail) : Stores fine‑grained detail data.

DWS (Data Warehouse Summary) : Stores aggregated summary data.

ADS (Application Data Service) : The application‑level data layer.

2. Data Division and Namespace Convention

Names should combine business abbreviations with layer identifiers, providing clear references for projects, tables, and fields. Recommendations include naming by business, by data domain, and by business process (e.g., user_login for a login event).

3. Data Model Principles

High cohesion & low coupling : Group related data together and separate unrelated data.

Separate core and extension models : Core fields support common business; extensions handle specialized needs without polluting the core.

Encapsulate common logic in lower layers : Avoid exposing shared logic to the application layer.

Balance cost and performance : Limited redundancy can improve query speed, but excessive duplication should be avoided.

Rollback‑safe processing : Re‑running jobs must produce identical results.

Consistency : Identical fields across tables must share the same name.

Clear, understandable naming : Table names should be self‑explanatory for downstream users.

4. Public Standards

4.1 Layer Call Convention

Application layers must call data from the DW public layer, never bypass the middle layer to read directly from ODS. ODS data should be accessed via CDM views, and CDM tasks should be limited to ten layers deep. Typically, one task produces one output table.

4.2 Data Type Specification

ODS inherits source system types (e.g., MySQL conversion rules). CDM adopts ODS types for inherited fields. Derived fields follow these rules:

Monetary and decimal values → DOUBLE String values → STRING ID and integer values → BIGINT Time values → STRING (or DATETIME if required)

Status fields →

STRING

4.3 Public Field Definition

Standard partition fields:

Daily partition: dt(YYYYMMDD) Hourly partition: hh(00-23) Minute partition: mi(00-59) Boolean flag: is_{business} with values Y / N (no nulls)

4.4 Data Redundancy

Redundant fields should be frequently accessed (≥3 downstream SQLs) and must not cause significant refresh latency. Duplicate fields with >60% overlap are discouraged; use joins instead.

4.5 Data Splitting

Horizontal and vertical splitting is based on access hot‑spots and sparsity of values. Core vs. extension fields are split vertically; high‑frequency columns stay together; low‑frequency columns are separated. Horizontal splitting can use secondary partitions to avoid excessive data copying.

4.6 Null‑Value Handling

Summary metrics: replace nulls with zero.

Dimension attributes: use -99 to indicate unknown.

5. Data Storage and Lifecycle Management

Retention policies are based on the maximum access span within three months:

≤4 days → keep 7 days

≤12 days → keep 15 days

≤30 days → keep 33 days

≤90 days → keep 93 days

≤180 days → keep 183 days

≤365 days → keep 368 days

6. CDM Public Dimension Layer Design

6.1 Design Principles

Maintain consistent field names, types, and content across all physical tables. Exceptions are allowed for aliasing when different business roles require distinct names.

6.2 Dimension Combination & Splitting

Combine highly related attributes in one dimension table.

Separate unrelated attributes into miscellaneous dimensions or keep them in fact tables.

Behavioral dimensions are derived from aggregated metrics and can be redundantly stored in dimension tables.

6.3 Naming Convention

{project}.dim{business/pub}{dimension}[_{custom}]

7. CDM Detail Layer Design

7.1 Naming Convention

{project}.dwd{business}{domain}{process}[{custom}]{refresh}{i/f}

where i = incremental, f = full.

7.2 Storage & Lifecycle

Detail tables are fact tables stored with daily partitions. Transactional fact tables are retained permanently; snapshot fact tables follow the same retention rules as in section 5.

7.3 Transactional Fact Table Guidelines

Design around specific business events.

Use event timestamp as partition field.

Redundant subsets reduce I/O for upstream queries.

Degenerate dimensions into the fact table to avoid costly joins.

7.4 Snapshot Fact Tables

Used for state or stock analysis; sampled at regular intervals.

7.5 Cumulative Snapshot Fact Tables

Combine multiple business processes (e.g., purchase order flow) to analyze intervals such as payment‑to‑shipment time.

8. CDM Summary Layer Design

8.1 Naming Convention

{project}.dws{business}{domain}{granularity}[{custom}]{period}{i/f}

8.2 Storage & Lifecycle

Summary tables are fact tables with daily partitions, following the same retention policy as section 5.

9. Table Design Standards

Good table design reduces storage, computation, and maintenance costs. Key goals:

Lower storage cost by minimizing redundant data.

Lower compute cost by enabling partition pruning and reducing unnecessary reads.

Lower maintenance complexity through consistent naming and clear partitioning.

9.1 Design Steps

Identify project space and business process to decide table type.

Define table description, permissions, and owner.

Choose partitioned vs. non‑partitioned based on data volume and integration patterns.

Define fields and partition columns.

Create/convert the table.

Clarify data import scenarios (batch, streaming, incremental).

Set table and partition lifecycle policies.

9.2 Partition Design Rules

Support adding new partitions but not new partition columns.

Maximum of 60 000 partitions per table.

Multi‑level partitions require all level values to be specified on addition.

Do not rename partition columns; only values can change.

9.3 Partition Field Selection

Prefer time‑based fields (day, hour, minute) and high‑cardinality fields used frequently in WHERE, GROUP BY, or DISTINCT. Avoid fields with >10 000 distinct values that are rarely queried.

select … from ** table where id = 123 and ** …;

10. Data Import Channels and Table Design

Supported channels include DataHub, DataX, DTS, Console upload, and SDK INSERT INTO statements. All channels must respect a commit size of roughly 64 MB to avoid small‑file issues.

11. Partition‑to‑Storage Mapping

Each primary partition contains secondary time‑based partitions; each secondary partition stores all columns. Guidelines include limiting the number of partitions, avoiding tiny partitions, and ensuring partition choices facilitate efficient query pruning.

12. General Table & Partition Naming Rules

Unified naming must indicate business type, table purpose (fact, dimension, log, etc.), and entity.

Fields with identical business meaning must share the same data type.

Partition design should prioritize query patterns, avoid skew, and keep partition counts reasonable.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Warehouse naming conventions Partitioning

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.