Mastering Data Model Architecture: Layered Design & Naming Best Practices
This article presents a comprehensive guide to data model architecture, detailing layered data store definitions, classification structures, processing flow, naming conventions, and core design principles to help engineers build scalable, maintainable data warehouses.
Introduction
This article introduces data model architecture specifications. These specifications are non-functional, advisory recommendations; they are not mandatory requirements enforced by the product.
Data Layer Division
ODS (Operational Data Store) : Stores raw, unprocessed data loaded from source systems either incrementally or in full, keeping the same structure as the source. It serves as the warehouse's data preparation area: its main job is to ingest base data into MaxCompute while recording that data's historical changes.
CDM (Common Data Model) : The common dimensional model layer, further split into DWD and DWS. It integrates and consolidates data, builds consistent dimensions, and creates reusable detailed fact tables and aggregated metric tables at common granularities.
DWD (Data Warehouse Detail) : Detailed data layer.
DWS (Data Warehouse Summary) : Summary data layer.
ADS (Application Data Service) : Application data layer.
The exact layering of a given warehouse should be decided by its business, data, and system context.
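One practical consequence of this layering is that data flows in one direction: ODS feeds DWD, DWD feeds DWS, and DWS feeds ADS. The sketch below checks a table lineage map for dependencies that point "upward". It is a minimal illustration only: the rule that a table may read from its own layer or lower layers is an assumption, and inferring the layer from a table-name prefix such as "dwd_" is a hypothetical convention, as are all table names used.

```python
# Minimal sketch: flag lineage edges where a table reads from a higher layer.
# Assumptions (not from the article): layer is encoded as a table-name prefix,
# and a table may only depend on tables in its own layer or lower layers.
from typing import Dict, List, Tuple

# Layer order as described in the article: ODS -> DWD -> DWS -> ADS.
LAYER_RANK: Dict[str, int] = {"ods": 0, "dwd": 1, "dws": 2, "ads": 3}


def layer_of(table: str) -> str:
    """Infer the layer from a hypothetical prefix such as 'dwd_'."""
    prefix = table.split("_", 1)[0].lower()
    if prefix not in LAYER_RANK:
        raise ValueError(f"unknown layer prefix in table name: {table!r}")
    return prefix


def upward_dependencies(lineage: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (target, source) pairs where the source sits in a higher layer."""
    violations = []
    for target, sources in lineage.items():
        for source in sources:
            if LAYER_RANK[layer_of(source)] > LAYER_RANK[layer_of(target)]:
                violations.append((target, source))
    return violations


# Hypothetical lineage: each target table maps to the tables it reads from.
lineage = {
    "dwd_trd_order_di": ["ods_trd_order"],
    "dws_trd_buyer_1d": ["dwd_trd_order_di"],
    "ads_trd_dashboard": ["dws_trd_buyer_1d"],
    "dwd_trd_refund_di": ["dws_trd_buyer_1d"],  # DWD reading DWS: flagged
}
print(upward_dependencies(lineage))
# [('dwd_trd_refund_di', 'dws_trd_buyer_1d')]
```

A check like this could run in a scheduling or code-review pipeline; whether ADS tables may also read DWD directly is a policy choice, and this sketch allows it.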
Data Classification Architecture
The ODS layer is divided into three parts: the data preparation zone, the offline data zone, and the near‑real‑time data zone. Within the CDM layer, the architecture consists of:
Public Dimension Layer : Establishes enterprise‑wide consistent dimensions based on dimensional modeling principles.
Detailed Fact Layer : Driven by business processes, builds the finest‑grain fact tables; important dimension attributes may be denormalized into wide tables.
Public Summary Fact Layer : Driven by analytical subjects, creates aggregated metric fact tables using wide‑table techniques.
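To make the three CDM parts concrete, the sketch below uses made-up in-memory rows: a public dimension table of shops, a detailed fact table kept at order grain with the shop's city denormalized into it (the "wide table" idea), and a summary fact table aggregated per city. All table, field, and value names are hypothetical; a real warehouse would express these steps in SQL on the warehouse engine.

```python
# Minimal sketch of the three CDM parts with hypothetical in-memory data.
from collections import defaultdict
from typing import Dict, List

# Public dimension layer: one consistent row per shop.
dim_shop: Dict[int, Dict[str, str]] = {
    1: {"shop_name": "Shop A", "city": "Hangzhou"},
    2: {"shop_name": "Shop B", "city": "Shanghai"},
}

# Source rows at the finest grain: one row per order.
orders: List[Dict] = [
    {"order_id": 101, "shop_id": 1, "amount": 30.0},
    {"order_id": 102, "shop_id": 2, "amount": 12.5},
    {"order_id": 103, "shop_id": 1, "amount": 7.5},
]


def build_detail_fact(orders: List[Dict], dim_shop: Dict[int, Dict[str, str]]) -> List[Dict]:
    """Detailed fact layer: keep order grain and copy in the shop's city."""
    return [{**o, "city": dim_shop[o["shop_id"]]["city"]} for o in orders]


def build_summary_fact(detail: List[Dict]) -> Dict[str, Dict]:
    """Public summary layer: order count and total amount per city."""
    summary = defaultdict(lambda: {"order_cnt": 0, "amount_sum": 0.0})
    for row in detail:
        summary[row["city"]]["order_cnt"] += 1
        summary[row["city"]]["amount_sum"] += row["amount"]
    return dict(summary)


detail = build_detail_fact(orders, dim_shop)
print(build_summary_fact(detail))
# {'Hangzhou': {'order_cnt': 2, 'amount_sum': 37.5}, 'Shanghai': {'order_cnt': 1, 'amount_sum': 12.5}}
```

Denormalizing the city into the detail table means the summary step needs no further join, which is the trade-off the wide-table approach makes: some redundancy in exchange for simpler, faster downstream queries.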
Data Processing Flow Architecture
Data Division and Naming Conventions
Names should reflect business, data domain, and business process contexts, using clear English abbreviations to guide project, table, and field naming.
By Business : Use business‑level abbreviations (e.g., Alibaba’s Taobao → "tb").
By Data Domain : Use domain‑level abbreviations (e.g., "transaction" → "trd").
By Business Process : When a data domain contains multiple business processes, name tables after the process (e.g., the refund process in the transaction domain → "rfd_ent").
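The abbreviation rules above can be applied mechanically. In the sketch below, the abbreviations "tb", "trd", and "rfd_ent" come from the article, but the overall pattern {layer}_{business}_{domain}[_{process}]_{suffix} and the suffix "di" are hypothetical choices made for illustration; a team would substitute its own agreed pattern.

```python
# Minimal sketch of abbreviation-based table naming.
# The pattern {layer}_{business}_{domain}[_{process}]_{suffix} is hypothetical.
import re
from typing import Optional

BUSINESS_ABBR = {"taobao": "tb"}                     # from the article
DOMAIN_ABBR = {"transaction": "trd"}                 # from the article
PROCESS_ABBR = {("transaction", "refund"): "rfd_ent"}  # from the article

# Lowercase letters, digits, and underscores, starting with a letter.
VALID_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def table_name(layer: str, business: str, domain: str, suffix: str,
               process: Optional[str] = None) -> str:
    """Compose a table name; unknown abbreviations raise KeyError."""
    parts = [layer.lower(), BUSINESS_ABBR[business], DOMAIN_ABBR[domain]]
    if process is not None:
        parts.append(PROCESS_ABBR[(domain, process)])
    parts.append(suffix)
    name = "_".join(parts)
    if not VALID_NAME.match(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name


print(table_name("dwd", "taobao", "transaction", "di", process="refund"))
# dwd_tb_trd_rfd_ent_di
```

Keeping the abbreviation maps in one place means a new business or domain gets a single agreed abbreviation instead of ad hoc spellings in each table.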
Data Model Overview
A data model is an abstraction of real-world entities and their relationships. It defines how data is structured and related, so that data can be stored and retrieved systematically. A well-designed model improves storage efficiency, query performance, and data consistency.
Core Design Principles
High Cohesion & Low Coupling
Put data that are related in business terms and share the same granularity into the same logical or physical model. Store data that are frequently accessed together in one place, and separate data that are rarely accessed together.
Separate Core and Extension Models
Core models contain the fields that serve common business needs; extension models hold personalized or rarely used fields. Extension fields should not intrude on the core model so much that they undermine its simplicity and maintainability.
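One common way to realize the core/extension split is a separate extension table that shares the core table's primary key. The sketch below uses Python's built-in sqlite3 module; the table names, columns, and the placeholder ISBN and author values are all hypothetical.

```python
# Minimal sketch: core model plus an extension model sharing the same key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_item (            -- core model: fields every consumer needs
        item_id   INTEGER PRIMARY KEY,
        item_name TEXT NOT NULL,
        category  TEXT NOT NULL
    );
    CREATE TABLE dim_item_ext_book (   -- extension model: book-only fields
        item_id INTEGER PRIMARY KEY REFERENCES dim_item(item_id),
        isbn    TEXT,
        author  TEXT
    );
    INSERT INTO dim_item VALUES (1, 'Data Modeling 101', 'book'), (2, 'Desk Lamp', 'home');
    INSERT INTO dim_item_ext_book VALUES (1, '978-0000000000', 'A. Author');
    """
)

# Most queries read only the core table; consumers that need extension
# attributes opt in with a LEFT JOIN, so items without an extension row remain.
rows = conn.execute(
    """
    SELECT c.item_id, c.item_name, e.isbn
    FROM dim_item AS c
    LEFT JOIN dim_item_ext_book AS e ON e.item_id = c.item_id
    ORDER BY c.item_id
    """
).fetchall()
print(rows)
# [(1, 'Data Modeling 101', '978-0000000000'), (2, 'Desk Lamp', None)]
```

Adding a new category's attributes then means adding another extension table rather than widening the core table with mostly empty columns.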
Common Processing Logic Consolidation
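The more widely shared a piece of processing logic is, the lower in the data scheduling dependency chain it should be encapsulated and implemented. Do not leave common logic to be implemented in the application layer, and do not let the same logic exist in multiple places.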
Cost‑Performance Balance
Moderate data redundancy can improve query and refresh performance, but excessive duplication should be avoided.
Data Rollback Capability
As long as the processing logic is unchanged, rerunning a job for the same data at different times must produce the same result. This is what makes it safe to roll back and rerun historical data.
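A common way to get rerunnable jobs is to rebuild a whole partition from its inputs instead of appending to it. The sketch below emulates partition-overwrite semantics in sqlite3 by deleting the target partition and inserting the recomputed rows in one transaction. The table names and the ds (date) column are hypothetical, and warehouse engines typically offer a native overwrite statement for this instead.

```python
# Minimal sketch of an idempotent daily load using delete-then-insert
# on one partition, inside a single transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ods_order (order_id INTEGER, amount REAL, ds TEXT);
    CREATE TABLE dws_order_1d (ds TEXT, order_cnt INTEGER, amount_sum REAL);
    INSERT INTO ods_order VALUES (1, 10.0, '20240101'), (2, 5.0, '20240101'),
                                 (3, 8.0, '20240102');
    """
)


def load_daily_summary(conn: sqlite3.Connection, ds: str) -> None:
    """Rebuild one partition from its inputs; rerunning yields the same rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM dws_order_1d WHERE ds = ?", (ds,))
        conn.execute(
            """
            INSERT INTO dws_order_1d (ds, order_cnt, amount_sum)
            SELECT ds, COUNT(*), SUM(amount) FROM ods_order
            WHERE ds = ? GROUP BY ds
            """,
            (ds,),
        )


for _ in range(3):  # simulate the same day being rerun several times
    load_daily_summary(conn, "20240101")

print(conn.execute("SELECT * FROM dws_order_1d").fetchall())
# [('20240101', 2, 15.0)]
```

Note that the business date is passed in as a parameter rather than read from the system clock; deriving it from the current time would make a rerun on a later day produce a different result.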
Consistency
Fields with the same meaning must have the same name in every table, using the name defined in the naming standard.
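Field-name consistency is easier to enforce with a shared dictionary of standard names. The sketch below maps known aliases to one standard name and reports columns that use an alias. The dictionary contents, table names, and column names are all made up for illustration.

```python
# Minimal sketch: report columns that use an alias instead of the standard name.
from typing import Dict, List, Tuple

# Hypothetical standard field dictionary: alias -> standard name.
STANDARD_FIELDS: Dict[str, str] = {
    "buyer_id": "buyer_id",
    "user_id": "buyer_id",        # same meaning, non-standard name
    "byr_id": "buyer_id",
    "pay_amt": "pay_amt",
    "payment_amount": "pay_amt",
}


def nonstandard_columns(schemas: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (table, column, standard_name) for each aliased column."""
    issues = []
    for table, columns in schemas.items():
        for col in columns:
            standard = STANDARD_FIELDS.get(col)  # columns not in the dictionary are skipped
            if standard is not None and standard != col:
                issues.append((table, col, standard))
    return issues


schemas = {
    "dwd_trd_order_di": ["order_id", "buyer_id", "pay_amt"],
    "dws_trd_buyer_1d": ["user_id", "payment_amount"],
}
print(nonstandard_columns(schemas))
# [('dws_trd_buyer_1d', 'user_id', 'buyer_id'), ('dws_trd_buyer_1d', 'payment_amount', 'pay_amt')]
```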
Clear, Understandable Naming
Table names should be consistent, intuitive, and easy for downstream users to comprehend.
Supplementary Notes
No single model can satisfy every requirement; choose the modeling approach that fits each scenario.
Typical design sequence: Conceptual Model → Logical Model → Physical Model.
IT Architects Alliance
