Data Warehouse Modeling: Concepts, Methods, and Implementation
This article explains what data models are, why model refactoring is necessary, compares normalized and dimensional data warehouse modeling approaches, and details a three‑step implementation process—including information research, model design, and model deployment—while highlighting best‑practice naming conventions and practical examples.
1 What is a Data Model
According to Baidu Encyclopedia, a Data Model is an abstraction of data characteristics that describes a system's static features, dynamic behavior, and constraints, providing an abstract framework for database information representation and manipulation.
A data warehouse model refers to the method of organizing and storing data, emphasizing reasonable storage from business, access, and usage perspectives.
In Dama2.0, data modeling is defined as the process of discovering, analyzing, and determining data requirements and representing these requirements precisely with a data model; it is a crucial component of data management.
Data warehouse modeling essentially builds a storage model to structurally store valuable business data, aiming for lower storage cost, clearer data structures, and faster, higher‑quality support for business analysis.
2 Why Model Refactoring is Needed
Data duplication : Early rapid development often leads to siloed, tightly coupled implementations without proper modeling, wasting storage and compute resources.
Missing business processes : Rapid business evolution can outpace the data model, leaving it unable to meet current analytical needs.
Inconsistent data standards : Different contributors build models based on personal understanding, causing divergent metric definitions and field naming across tables.
Data non‑traceability : Lack of snapshot backups makes it impossible to compare data across days, hindering auditability of important data points.
3 Data Warehouse Modeling Methods
The two mainstream methods are normalized (relational) modeling and dimensional modeling.
Normalized Modeling
Inmon proposes a top‑down approach that uses entities and relationships to describe the enterprise business architecture, adhering to 3NF. Unlike OLTP 3NF, this version focuses on subject‑oriented abstraction for integration and consistency governance.
Dimensional Modeling
Kimball’s bottom‑up approach builds models based on analytical decision needs, enabling rapid analysis and high‑performance large‑scale queries.
Normalized modeling requires deep business and data understanding and a long implementation cycle, whereas dimensional modeling is analysis‑driven, offering flexibility, speed, and strong performance for evolving business scenarios like those of the Zhengcai Cloud platform.
4 Model Refactoring Implementation
The overall workflow consists of three steps: information research (business grooming), model design, and model implementation.
4.1 Information Research
Identify existing data models to ensure the new model is compatible with current usage scenarios and gather user data requirements.
Business research deepens understanding of business and data, clarifying construction scope and boundaries.
Collecting business and analyst requirements reveals current and future model usage scenarios, guiding methodology and design.
Data sniffing surveys source systems to capture detailed business data, preventing anomalies caused by evolving business models.
4.2 Model Design
Data Warehouse Layers The Zhengcai Cloud data warehouse is divided into four layers: ODS (source layer), DWD & DIM (model layer), DWS (summary layer), and ADS (application layer). The model layer is the core, handling data cleaning, common logic, standardization, and dimension degeneration to organize data and shield downstream applications from upstream changes.
Warehouse Naming Conventions Standardized naming ensures consistency across the warehouse, covering null handling, unit unification, enum mapping, field formats, storage rules, script standards, table and field naming, and task naming. ods layer: ods_{source_db}_{source_table}_{partition} dim layer: dim_{business_line}_{subject}_{dimension}_{partition} dwd layer: dwd_{business_line}_{subject}_{process}_{partition} dws layer: dws_{business_line}_{subject}_{dimension}_{process}_{partition} Guidelines include lowercase letters, underscore separation, avoiding SQL keywords, suffixes like _cnt for counts and _time for timestamps, and common abbreviations.
Modeling Process Select Business Process : Analyze the full lifecycle to identify key steps and determine the related fact tables (e.g., order transaction, contract signing). Define Granularity : Specify the level of detail for each fact row (e.g., one order per row, one order‑item per row). Confirm Dimensions : Choose dimensions that describe the business context (e.g., region, organization, time, method). Confirm Facts : Identify measurable metrics (e.g., payment amount, freight cost). Redundant Dimensions : Add high‑frequency downstream dimensions directly to fact tables to reduce joins (e.g., supplier store, brand).
4.3 Model Implementation
Model Development: Materialize the physical model, consider performance and complexity, and iteratively refine based on feedback.
Task Deployment: Configure task dependencies strictly and add data‑quality monitoring to ensure reliable output.
Metadata Maintenance: Enable rapid onboarding for business users and promote the model.
Replace Old Model: Compare new and old data, resolve inconsistencies, ensure downstream systems have switched, and retire the legacy model.
5 Summary
By applying dimensional modeling and warehouse layering theory, the reconstructed model—using a star‑detail schema with multi‑dimensional aggregates—makes the data warehouse richer, clearer, and able to meet both rapid standard queries and ad‑hoc analytical demands, thereby increasing its business value. Data models have no single solution; the optimal one aligns with the company’s analytical scenarios and evolves with business changes.
Recruitment
The Zhengcai Cloud technical team (Zero) is a passionate, creative, and execution‑focused group based in scenic Hangzhou, comprising over 500 engineers from Alibaba, Huawei, NetEase, and top universities. Beyond daily development, the team explores cloud‑native, blockchain, AI, low‑code platforms, middleware, big data, material systems, engineering platforms, performance, and visualization, contributing to many open‑source projects.
If you want to drive change, join a team that values ideas, collaboration, and impact. Send your resume to [email protected] for consideration.
政采云技术
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.