
Mastering Data Model Architecture: Layered Design & Naming Best Practices

This article presents a comprehensive guide to data model architecture, detailing layered data store definitions, classification structures, processing flow, naming conventions, and core design principles to help engineers build scalable, maintainable data warehouses.

IT Architects Alliance

Introduction

This article introduces specifications for data model architecture. The non-functional guidelines it describes are advisory only; they are not mandatory product features.

Data Layer Division

ODS (Operational Data Store) : Aligns closely with source system increments or full loads, serving as a data preparation area that records base data and historical changes, primarily feeding data into MaxCompute.

CDM (Common Data Model) : The public dimension model layer, further split into DWD and DWS. It consolidates data, builds consistent dimensions, and creates reusable detailed fact tables and aggregated public‑grain metric tables.

DWD (Data Warehouse Detail) : Detailed data layer.

DWS (Data Warehouse Summary) : Summary data layer.

ADS (Application Data Service) : Application data layer.

The specific warehouse layering should be determined by the business, data, and system scenarios at hand.
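
The layer division above can be sketched as a small lookup structure. This is only an illustration: the `Layer` enum, the `ORDER` list, and the rule that a layer reads only from itself or from earlier layers are assumptions made for the example, not requirements stated by the specification.

```python
from enum import Enum


class Layer(Enum):
    ODS = "ods"  # Operational Data Store
    DWD = "dwd"  # Data Warehouse Detail (part of CDM)
    DWS = "dws"  # Data Warehouse Summary (part of CDM)
    ADS = "ads"  # Application Data Service


# The two sub-layers that make up the Common Data Model.
CDM = {Layer.DWD, Layer.DWS}

# Assumed processing order, from source-aligned data to application data.
ORDER = [Layer.ODS, Layer.DWD, Layer.DWS, Layer.ADS]


def may_read_from(consumer: Layer, producer: Layer) -> bool:
    """Return True if `consumer` sits at or after `producer` in ORDER."""
    return ORDER.index(consumer) >= ORDER.index(producer)
```

A check such as `may_read_from(Layer.DWS, Layer.DWD)` could be run over job definitions to flag tables that read from a later layer, if a team chooses to enforce that direction.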

Data Classification Architecture

The ODS layer is divided into three parts: data preparation zone, offline data zone, and near‑real‑time data zone. After entering the CDM layer, the architecture consists of:

Public Dimension Layer : Establishes enterprise‑wide consistent dimensions based on dimensional modeling principles.

Detailed Fact Layer : Driven by business processes, builds the finest‑grain fact tables; important dimension attributes may be denormalized into wide tables.

Public Summary Fact Layer : Driven by analytical subjects, creates aggregated metric fact tables using wide‑table techniques.
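
To make the three CDM parts concrete, here is a minimal in-memory sketch. The seller dimension, the order rows, and the function names `widen` and `summarize_by_region` are hypothetical and chosen only for illustration; a real warehouse would express the same steps in SQL over tables.

```python
from collections import defaultdict

# Public dimension: one consistent description per seller (hypothetical data).
dim_seller = {
    "s1": {"seller_name": "Alpha", "region": "east"},
    "s2": {"seller_name": "Beta", "region": "west"},
}

# Detailed fact rows at the finest grain: one row per order.
fact_orders = [
    {"order_id": 1, "seller_id": "s1", "amount": 30.0},
    {"order_id": 2, "seller_id": "s1", "amount": 20.0},
    {"order_id": 3, "seller_id": "s2", "amount": 15.0},
]


def widen(fact_rows, dimension):
    """Denormalize the `region` dimension attribute into each detail row."""
    return [
        {**row, "region": dimension[row["seller_id"]]["region"]}
        for row in fact_rows
    ]


def summarize_by_region(wide_rows):
    """Aggregate wide detail rows into one summary record per region."""
    summary = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for row in wide_rows:
        bucket = summary[row["region"]]
        bucket["order_count"] += 1
        bucket["total_amount"] += row["amount"]
    return dict(summary)


wide_orders = widen(fact_orders, dim_seller)
region_summary = summarize_by_region(wide_orders)
```

Here `wide_orders` plays the role of a detailed fact table with a denormalized dimension attribute, and `region_summary` plays the role of a public summary table built on top of it.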

Data Processing Flow Architecture

Data Partitioning and Naming Conventions

Names should reflect business, data domain, and business process contexts, using clear English abbreviations to guide project, table, and field naming.

By Business : Use business‑level abbreviations (e.g., Alibaba’s Taobao → "tb").

By Data Domain : Use domain‑level abbreviations (e.g., "transaction" → "trd").

By Business Process : When a domain contains multiple processes, name according to the process (e.g., refund process in transaction domain → "rfd_ent").
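
The three naming dimensions can be combined in a small helper. The abbreviations come from the examples above; the `layer_business_domain_process` ordering, the function name `build_table_name`, and the validation pattern are assumptions for this sketch, since the text does not prescribe an exact table name format.

```python
import re

# Abbreviations taken from the examples in the text; extend as needed.
BUSINESS_ABBR = {"taobao": "tb"}
DOMAIN_ABBR = {"transaction": "trd"}
PROCESS_ABBR = {"refund": "rfd_ent"}

# Assumed rule: lowercase words separated by single underscores.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def build_table_name(layer, business, domain, process=None):
    """Join layer, business, domain, and optional process abbreviations."""
    parts = [layer.lower(), BUSINESS_ABBR[business], DOMAIN_ABBR[domain]]
    if process is not None:
        parts.append(PROCESS_ABBR[process])
    name = "_".join(parts)
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name
```

Keeping the abbreviations in shared dictionaries means an unknown business, domain, or process raises a `KeyError` instead of silently producing an ad hoc name.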

Data Model Overview

A data model abstracts reality to help understand the objective world. It defines relationships and structures, enabling systematic data retrieval. Good models improve storage efficiency, query performance, and data consistency.

Core Design Principles

High Cohesion & Low Coupling

Group related data with similar granularity into the same logical or physical model, and separate data that are rarely accessed together.

Separate Core and Extension Models

Core models contain the fields that serve common business needs; extension models hold personalized or low-frequency fields. Extension fields should not intrude on the core model or compromise its simplicity.
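
One way to picture the split is a core record plus an optional extension record joined by key. The class names, field names, and the rule that core values win on a name clash are hypothetical choices for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class OrderCore:
    """Fields needed by most consumers."""
    order_id: int
    buyer_id: int
    amount: float


@dataclass
class OrderExtension:
    """Personalized or rarely used attributes, stored separately."""
    order_id: int
    attributes: dict = field(default_factory=dict)


def with_extension(core, extensions):
    """Return core fields plus extension attributes, if any exist.

    Extension attributes never overwrite a core field of the same name.
    """
    merged = dict(vars(core))
    ext = extensions.get(core.order_id)
    if ext is not None:
        for key, value in ext.attributes.items():
            merged.setdefault(key, value)
    return merged
```

Consumers that only need common fields read `OrderCore` alone; only the few that need extra attributes pay for the lookup in `extensions`.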

Common Processing Logic Consolidation

Encapsulate shared logic in the underlying data scheduling layer, avoiding exposure to the application layer and preventing duplication.

Cost‑Performance Balance

Moderate data redundancy can improve query and refresh performance, but excessive duplication should be avoided.

Data Rollback Capability

Processing logic must be deterministic: running the same job repeatedly, or at different times, must yield identical results.
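
A common way to get this property is to pass the business date in explicitly and overwrite the target partition instead of appending to it. The sketch below assumes a dictionary standing in for a partitioned table; `rebuild_partition`, `biz_date`, and the row layout are illustrative names, not part of the specification.

```python
def rebuild_partition(store, source_rows, biz_date):
    """Recompute the summary for `biz_date` and overwrite its partition.

    The result depends only on the inputs and the explicit date, never on
    the current clock, so reruns produce the same partition content.
    """
    rows = [r for r in source_rows if r["biz_date"] == biz_date]
    total = sum(r["amount"] for r in rows)
    store[biz_date] = {"order_count": len(rows), "total_amount": total}
    return store[biz_date]
```

Because the partition is replaced rather than extended, rerunning the job after a failure or a backfill does not double count rows.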

Consistency

Fields with the same meaning must use the same name in every table.
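
A naming check of this kind can be automated if a team maintains a list of canonical field names. The registry below, its aliases, and the function `find_inconsistent_fields` are hypothetical; they only illustrate one possible shape for such a check.

```python
# Hypothetical registry: canonical field name -> aliases that should not be used.
CANONICAL_FIELDS = {
    "seller_id": {"sellerid", "seller_no", "slr_id"},
    "buyer_id": {"buyerid", "byr_id"},
}


def find_inconsistent_fields(schemas):
    """Return (table, column, canonical_name) for each column using an alias.

    `schemas` maps a table name to its list of column names.
    """
    alias_to_canonical = {
        alias: canonical
        for canonical, aliases in CANONICAL_FIELDS.items()
        for alias in aliases
    }
    issues = []
    for table, columns in schemas.items():
        for column in columns:
            canonical = alias_to_canonical.get(column.lower())
            if canonical is not None:
                issues.append((table, column, canonical))
    return issues
```

Running this over table schemas before release would surface columns such as `slr_id` that should be renamed to the shared `seller_id`.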

Clear, Understandable Naming

Table names should be consistent, intuitive, and easy for downstream users to comprehend.

Supplementary Notes

A single model cannot satisfy all requirements; choose modeling approaches wisely.

Typical design sequence: Conceptual Model → Logical Model → Physical Model.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
