
Mastering Data Model Architecture: Layered Design and Best Practices

This article outlines a comprehensive data model architecture, detailing layered data stores, classification structures, naming conventions, and core modeling principles to guide effective data warehouse design and implementation.

ITFLY8 Architecture Home

This article introduces the data model architecture specifications.

Declaration

The non-functional specifications in this article and subsequent sections are advisory: the product does not enforce them, and they are provided for guidance only.

Data Layer Division

ODS: Operational Data Store, a data preparation area that mirrors source system incremental or full data, records basic data and historical changes, and serves as the entry point to MaxCompute.

CDM: Common Data Model, subdivided into DWD and DWS, responsible for data processing and integration, establishing consistent dimensions, and building reusable detailed fact tables and aggregated public metrics.

DWD: Data Warehouse Detail, the detailed data layer.

DWS: Data Warehouse Summary, the aggregated data layer.

ADS: Application Data Service, the application data layer.

How a particular warehouse is layered should be decided by weighing its business, data, and system scenarios together.
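The layers above can be written down as a small lookup, which is sometimes handy for validating table names in tooling. This is a minimal sketch: the layer names and expansions come from this section, while the helper names `layer_of` and `is_cdm` and the idea that tables carry a lowercase layer prefix such as `dwd_` are assumptions made for illustration.

```python
from enum import Enum


class Layer(Enum):
    """Warehouse layers named in this article, with their expansions."""
    ODS = "Operational Data Store"
    DWD = "Data Warehouse Detail"
    DWS = "Data Warehouse Summary"
    ADS = "Application Data Service"


# DWD and DWS together form the Common Data Model (CDM).
CDM_LAYERS = {Layer.DWD, Layer.DWS}


def layer_of(table_name: str) -> Layer:
    """Infer the layer from a table-name prefix such as 'dwd_'.

    Assumption: tables carry a layer prefix followed by an underscore.
    """
    prefix = table_name.split("_", 1)[0].upper()
    try:
        return Layer[prefix]
    except KeyError:
        raise ValueError(f"unknown layer prefix in {table_name!r}") from None


def is_cdm(table_name: str) -> bool:
    """Return True when the table belongs to the CDM (DWD or DWS)."""
    return layer_of(table_name) in CDM_LAYERS
```

A name without a recognized prefix raises `ValueError`, so a check like this can flag tables that sit outside the agreed layering.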

Data Classification Architecture

In the ODS layer, the classification architecture is divided into three parts: a data preparation area, an offline data area, and a near‑real‑time data area. In the CDM layer, it consists of:

Public Dimension Layer: Establishes enterprise‑wide consistent dimensions based on dimensional modeling principles.

Detailed Fact Layer: Driven by business processes, builds the most granular fact tables; important dimension attributes may be denormalized into wide tables as needed.

Public Summary Fact Layer: Driven by analytical subjects, constructs aggregated metric fact tables based on application and product indicator requirements, using wide‑table techniques.
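The relationship between the three CDM parts can be illustrated with plain Python data structures. This is only a sketch under assumed names: the `dim_seller` dimension, the order rows, and the `region`, `order_cnt`, and `order_amt` fields are hypothetical and do not come from the original specification.

```python
from collections import defaultdict

# Public dimension: one consistent definition of "seller", keyed by seller_id.
dim_seller = {
    1: {"seller_name": "Shop A", "region": "East"},
    2: {"seller_name": "Shop B", "region": "West"},
}

# Source rows for the detailed fact: one row per order (the finest grain).
orders = [
    {"order_id": 101, "seller_id": 1, "amount": 30.0},
    {"order_id": 102, "seller_id": 1, "amount": 20.0},
    {"order_id": 103, "seller_id": 2, "amount": 50.0},
]


def build_detail_fact(orders, dim_seller):
    """Copy an important dimension attribute (region) onto each fact row."""
    return [{**o, "region": dim_seller[o["seller_id"]]["region"]} for o in orders]


def build_summary_fact(detail_rows):
    """Aggregate detail rows into one wide row of metrics per region."""
    totals = defaultdict(lambda: {"order_cnt": 0, "order_amt": 0.0})
    for row in detail_rows:
        totals[row["region"]]["order_cnt"] += 1
        totals[row["region"]]["order_amt"] += row["amount"]
    return dict(totals)


detail = build_detail_fact(orders, dim_seller)
summary = build_summary_fact(detail)
```

Here the detail rows carry the denormalized `region` attribute, and the summary rows are built from the detail rows rather than from the raw orders, so both layers share the same dimension definition.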

Data Processing Flow Architecture

Data Partition and Naming Conventions

Partition data and agree on names according to the business. Names use English abbreviations that combine the business name with the data layer, and they serve as the reference for naming project spaces, tables, and fields during data development.

By Business: Name according to the primary business to guide physical model partitioning and ODS project naming. Example: Alibaba’s “Taobao” can be abbreviated as “tb”.

By Data Domain: Name according to CDM layer data domains for effective data management and table naming. Example: “transaction” can be abbreviated as “trd”.

By Business Process: When a data domain comprises multiple business processes, name according to the process. Example: the “refund” process in the transaction domain can be abbreviated as “rfd_ent”.
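One way to apply these abbreviations consistently is to assemble names from lookup tables instead of typing them by hand. The sketch below reuses the abbreviations from the examples above (`tb`, `trd`, `rfd_ent`). The ordering `<layer>_<business>_<domain>_<process>`, the character rule, and the function name `table_name` are assumptions for illustration, not a pattern prescribed by the original specification.

```python
import re

# Abbreviations taken from the examples in this section.
BUSINESS = {"taobao": "tb"}
DATA_DOMAIN = {"transaction": "trd"}
BUSINESS_PROCESS = {"refund": "rfd_ent"}

# Assumed rule: lowercase letters, digits, underscores; starts with a letter.
_VALID = re.compile(r"^[a-z][a-z0-9_]*$")


def table_name(layer: str, business: str, domain: str, process: str) -> str:
    """Join the abbreviations as <layer>_<business>_<domain>_<process>.

    The ordering is a hypothetical pattern used only for illustration.
    Unknown business, domain, or process names raise KeyError.
    """
    parts = [
        layer.lower(),
        BUSINESS[business],
        DATA_DOMAIN[domain],
        BUSINESS_PROCESS[process],
    ]
    name = "_".join(parts)
    if not _VALID.match(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name
```

Keeping the abbreviations in one place means a new business or process is registered once and every table name derived from it stays consistent.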

Data Model

A data model reflects and abstracts real‑world entities, helping us understand the objective world. It defines how data is structured and related so that it can be stored and retrieved systematically, much as a supermarket arranges its products according to consumer habits and traffic flow.

Data modeling is the first step after business requirement analysis when building a data warehouse. A good model improves the efficiency of data storage and retrieval and helps ensure data consistency.

Model Design Principles

High Cohesion and Low Coupling: Group closely related data and fields together, separating data that are rarely accessed together.

Core and Extension Model Separation: Core models contain fields for common business needs; extension models hold specialized or low‑frequency fields without contaminating the core.

Common Processing Logic Consolidation: Encapsulate shared logic in the lower layers that data scheduling depends on. Do not leave it to the application layer to implement, and do not let the same logic exist in several places.

Cost‑Performance Balance: Moderate data redundancy can improve query and refresh performance, but excessive duplication should be avoided.

Data Rollback Capability: With the processing logic unchanged, running a job multiple times at different moments must produce the same result.

Consistency: Fields with the same meaning must use the same name across different tables.

Clear Naming: Table names should be clear, consistent, and easily understood by downstream users.
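The Data Rollback Capability principle is often met by having each run overwrite the partition it is responsible for instead of appending to it. The following is a minimal sketch of that idea using in-memory dictionaries. The `ds` partition key, the row fields, and the function name `run_daily_job` are hypothetical, and a real warehouse would express the same thing as a partition overwrite in its own SQL dialect.

```python
from collections import defaultdict


def run_daily_job(target: dict, source_rows: list, ds: str) -> None:
    """Recompute partition `ds` from source and overwrite it in `target`.

    `target` maps a partition key (date string) to its rows. Replacing the
    whole partition, rather than appending, keeps reruns repeatable.
    """
    totals = defaultdict(float)
    for row in source_rows:
        if row["ds"] == ds:
            totals[row["seller_id"]] += row["amount"]
    # Sort so the output order does not depend on input order.
    target[ds] = [
        {"seller_id": seller, "order_amt": amt}
        for seller, amt in sorted(totals.items())
    ]


source = [
    {"ds": "2024-01-01", "seller_id": 1, "amount": 30.0},
    {"ds": "2024-01-01", "seller_id": 2, "amount": 50.0},
    {"ds": "2024-01-02", "seller_id": 1, "amount": 10.0},
]
warehouse = {}
run_daily_job(warehouse, source, "2024-01-01")
first = list(warehouse["2024-01-01"])
run_daily_job(warehouse, source, "2024-01-01")  # rerun later: same result
```

Had the job appended rows instead, the second run would have doubled the partition, and the result would depend on how many times the job had been executed.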

Supplementary Notes

A single model cannot satisfy all requirements.

Select the appropriate modeling approach based on the scenario.

Typical design sequence: Conceptual Model → Logical Model → Physical Model.
