Data Model Architecture Specification and Layered Design Guidelines

This article provides a comprehensive guide to data model architecture, detailing layered data storage concepts, classification structures, naming conventions, and design principles to help practitioners build consistent, maintainable, and performant data warehouses and analytics platforms.

Big Data Technology & Architecture

This article introduces data model architecture specifications.

Note: The non-functional specifications in this article and its subsequent sections are advisory. The product does not enforce them; they are offered as guidance only.

Data Layer Division

ODS: Operational Data Store, the operational data layer. It mirrors source system data (loaded incrementally or in full) and serves as the data preparation area. Its main job is to ingest base data into MaxCompute while recording the historical changes to that data.

CDM: Common Data Model, the common dimensional model layer, further divided into DWD and DWS. It processes and integrates data, builds consistent dimensions, and produces reusable detailed fact tables and summary metric tables at common granularities for analysis.

DWD: Data Warehouse Detail, the detailed data layer.

DWS: Data Warehouse Summary, the summary data layer.

ADS: Application Data Service, the application data layer.

The specific warehouse layering should be considered in conjunction with business, data, and system scenarios.
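
As a rough illustration of the layering above, the sketch below models the layers as an ordered enum and checks that a table reads only from its own layer or an earlier one. The assumed flow order ODS, DWD, DWS, ADS and the `may_read_from` helper are illustrative assumptions, not rules the specification mandates.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Warehouse layers, ordered from closest-to-source to closest-to-application."""
    ODS = 1  # Operational Data Store
    DWD = 2  # Data Warehouse Detail (part of CDM)
    DWS = 3  # Data Warehouse Summary (part of CDM)
    ADS = 4  # Application Data Service


# CDM is modeled as a grouping of DWD and DWS rather than a separate layer.
CDM_LAYERS = frozenset({Layer.DWD, Layer.DWS})


def may_read_from(consumer: Layer, producer: Layer) -> bool:
    """Assumed rule: a table may read only from its own layer or an earlier one."""
    return producer <= consumer


print(may_read_from(Layer.DWS, Layer.DWD))  # True
print(may_read_from(Layer.DWD, Layer.ADS))  # False
```

A real project may need exceptions to this rule, which is why the layering should still be weighed against the business, data, and system scenario.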

Data Classification Architecture

The ODS layer is divided into three areas: the data preparation area, the offline data area, and the near‑real‑time data area. Once data enters the CDM layer, it is organized into the following layers:

Public Dimension Layer: Establishes enterprise‑wide consistent dimensions based on dimensional modeling principles.

Detailed Fact Layer: Driven by business processes, builds the most granular fact tables; important dimension attributes may be denormalized into wide tables as needed.

Public Summary Fact Layer: Driven by analytical subject objects, builds aggregated metric fact tables based on application and product indicator requirements, often materialized as wide tables.
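
To make the relationship between these three CDM layers more concrete, here is a minimal in-memory sketch. The sample rows, the field names (`seller_id`, `seller_region`, `pay_amount`), and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Public dimension: one consistent definition of a seller (hypothetical data).
dim_seller = {
    "s1": {"seller_region": "east"},
    "s2": {"seller_region": "west"},
}

# Source rows for the business process: one row per order, the finest grain.
orders = [
    {"order_id": "o1", "seller_id": "s1", "pay_amount": 30},
    {"order_id": "o2", "seller_id": "s1", "pay_amount": 20},
    {"order_id": "o3", "seller_id": "s2", "pay_amount": 50},
]


def build_detail_fact(rows, dim):
    """Denormalize an important dimension attribute into each detail row (wide table)."""
    return [{**row, "seller_region": dim[row["seller_id"]]["seller_region"]} for row in rows]


def build_summary_fact(detail_rows):
    """Aggregate the detail fact into a per-region metric table."""
    totals = defaultdict(int)
    for row in detail_rows:
        totals[row["seller_region"]] += row["pay_amount"]
    return dict(totals)


dwd_orders = build_detail_fact(orders, dim_seller)
dws_region_pay = build_summary_fact(dwd_orders)
print(dws_region_pay)  # {'east': 50, 'west': 50}
```

Because the region is copied into the detail rows, the summary step does not need to join back to the dimension, which is the trade-off that denormalizing into wide tables makes.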

Data Processing Flow Architecture

[Figure: Data Processing Flow Diagram]

Data Partitioning and Namespace Conventions

Divide data by business and agree on an English abbreviation for each division. These abbreviations serve as the reference for naming projects, tables, and fields during data development.

By Business: Use primary business abbreviations (e.g., Alibaba’s Taobao → "tb") to guide physical model partitioning and ODS project naming.

By Data Domain: Use CDM layer domain abbreviations (e.g., "trd" for transaction data).

By Business Process: When a data domain comprises multiple processes, name according to the process (e.g., "rfd_ent" for refund process in the transaction domain).
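
One way to apply these abbreviations consistently is to generate names from them. The sketch below is an assumption for illustration: the `layer_business_domain_process` pattern, the `build_table_name` helper, and the lowercase-with-underscores rule are not prescribed by the text. Only the abbreviations `tb`, `trd`, and `rfd_ent` come from the examples above.

```python
import re

# Each name part: starts with a lowercase letter, then lowercase letters or digits,
# optionally followed by more segments joined by single underscores.
_PART = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def build_table_name(layer: str, business: str, domain: str, process: str) -> str:
    """Join validated abbreviations into a table name (hypothetical pattern)."""
    parts = [layer, business, domain, process]
    for part in parts:
        if not _PART.match(part):
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(parts)


print(build_table_name("dwd", "tb", "trd", "rfd_ent"))  # dwd_tb_trd_rfd_ent
```

Rejecting malformed parts at build time keeps ad hoc spellings out of the namespace, whatever concrete pattern a team settles on.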

Data Model

A model reflects and abstracts reality, helping us understand the objective world. Data models define relationships and structures, enabling systematic data retrieval. For example, supermarket product layouts follow consumer purchase habits and traffic flow.

Role of Data Model

Data modeling is the first step after business requirement analysis in a data warehouse project. A good model improves how data is stored, makes retrieval more efficient, and keeps data consistent.

Basic Principles of Model Design

High Cohesion and Low Coupling

Design logical and physical models so that closely related data are grouped together, and data accessed together are stored together, while unrelated data are separated.

Core and Extension Model Separation

Maintain a core model for common business fields and an extension model for specialized or low‑frequency needs, avoiding excessive intrusion of extension fields into the core.

Common Processing Logic Consolidation

Encapsulate shared processing logic once, in the lower layers that data scheduling depends on, rather than leaving it for the application layer to implement. Shared logic should not exist in several places at the same time.

Cost and Performance Balance

Appropriate data redundancy can improve query and refresh performance, but excessive duplication should be avoided.

Data Rollback Capability

Processing logic should be idempotent: with the logic unchanged, running the same job again at a different time must produce the same result, so that data can be re-run safely.
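
A common way to get this property is to overwrite a whole partition on each run instead of appending to it. The sketch below shows the idea with an in-memory dictionary standing in for a table. The `load_partition` helper, the `ds=YYYYMMDD` partition key, and the sample rows are assumptions for illustration.

```python
def load_partition(table: dict, partition: str, source_rows: list) -> None:
    """Overwrite the whole partition so a re-run replaces data instead of appending."""
    table[partition] = [dict(row) for row in source_rows]


source = [
    {"order_id": "o1", "pay_amount": 30},
    {"order_id": "o2", "pay_amount": 20},
]
table = {}

load_partition(table, "ds=20210101", source)
first_run = [dict(row) for row in table["ds=20210101"]]

load_partition(table, "ds=20210101", source)  # re-run for the same partition
assert table["ds=20210101"] == first_run      # same result, no duplicated rows
print(len(table["ds=20210101"]))  # 2
```

An append-based load would have produced four rows after the second run, which is exactly the inconsistency this principle is meant to prevent.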

Consistency

A field with the same meaning must have the same name in every table where it appears.
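
A naming rule like this can be checked mechanically if the team keeps a registry of standard field names. The sketch below assumes such a registry exists. `STANDARD_FIELDS`, its aliases, the table names, and `find_nonstandard_fields` are all hypothetical.

```python
# Hypothetical registry: standard field name -> known aliases that should not be used.
STANDARD_FIELDS = {
    "seller_id": {"sellerid", "slr_id"},
    "pay_amount": {"pay_amt", "payment"},
}


def find_nonstandard_fields(schemas: dict) -> list:
    """Return (table, field, standard_name) for every field that uses a known alias."""
    alias_to_standard = {
        alias: standard
        for standard, aliases in STANDARD_FIELDS.items()
        for alias in aliases
    }
    problems = []
    for table, fields in schemas.items():
        for field in fields:
            if field in alias_to_standard:
                problems.append((table, field, alias_to_standard[field]))
    return problems


schemas = {
    "dwd_tb_trd_ord": ["order_id", "seller_id", "pay_amount"],
    "dws_tb_trd_slr": ["slr_id", "pay_amount"],
}
print(find_nonstandard_fields(schemas))  # [('dws_tb_trd_slr', 'slr_id', 'seller_id')]
```

This only catches aliases that are already in the registry, so it complements review rather than replacing it.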

Clear and Understandable Naming

Table naming conventions should be clear, consistent, and easily understood by downstream users.

Supplementary Notes

A single model cannot satisfy all requirements.

Choose the modeling method that fits the scenario.

Typical design sequence: Conceptual Model → Logical Model → Physical Model.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
