Designing a Reusable and Standardized Data Warehouse Model for a Data Middle Platform
The article explains how to evaluate and improve data warehouse design by measuring completeness, reusability, and standardization. It proposes concrete metrics such as the cross-layer reference rate and the model reuse coefficient, outlines a step-by-step process (ODS-layer control, domain division, dimension unification, fact-table integration, model development, and migration), and introduces the EasyDesign tool for managing these standards systematically.
Many companies see friction between analysts and data engineers. Without reusable data models, analysts have to clean, process, and compute metrics from raw data themselves, which leads to resource-intensive SQL and delayed deliveries.
The root cause is a non‑reusable data model; each new requirement forces developers to recompute from raw data, creating a "silo" architecture.
A good data model should be reusable, complete, and standardized. The article introduces three quantitative indicators:
Cross-layer reference rate: the proportion of ODS tables referenced directly by the DWS/ADS/DM layers; a lower rate indicates better layer isolation.
Model reuse coefficient: the average number of downstream models produced from each upstream model; a higher value reflects better reuse.
Summary-query proportion: the share of all queries that hit the DWS/ADS/DM layers; a higher value means the aggregation layers are more mature.
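To make the three indicators concrete, here is a minimal Python sketch that computes them from table lineage and query logs. It is not the article's implementation: it assumes a table's layer can be read from its name prefix, that lineage is available as (upstream, downstream) edges, and that each query is recorded as the list of tables it read. The function names and sample tables are hypothetical.

```python
# Minimal sketch of the three quality metrics, computed from table lineage.
# Hypothetical assumptions: a table's layer is the prefix before its first
# underscore (ods_, dwd_, dws_, ads_, dm_, dim_), lineage is a list of
# (upstream, downstream) edges, and each query is the list of tables it read.
from collections import defaultdict

SUMMARY_LAYERS = {"dws", "ads", "dm"}


def layer_of(table: str) -> str:
    """Return the layer prefix of a table name, e.g. 'ods' for 'ods_crm_user'."""
    return table.split("_", 1)[0]


def cross_layer_reference_rate(edges, ods_tables):
    """Share of the given ODS tables that a DWS/ADS/DM table reads directly."""
    if not ods_tables:
        return 0.0
    referenced = {
        up for up, down in edges
        if up in ods_tables and layer_of(down) in SUMMARY_LAYERS
    }
    return len(referenced) / len(ods_tables)


def model_reuse_coefficient(edges):
    """Average number of direct downstream models per model that is read.

    ODS tables are excluded on the assumption that they are raw sources
    rather than reusable models.
    """
    downstream = defaultdict(set)
    for up, down in edges:
        if layer_of(up) != "ods":
            downstream[up].add(down)
    if not downstream:
        return 0.0
    return sum(len(d) for d in downstream.values()) / len(downstream)


def summary_query_proportion(queries):
    """Share of queries that read at least one DWS/ADS/DM table."""
    if not queries:
        return 0.0
    hits = sum(
        1 for tables in queries
        if any(layer_of(t) in SUMMARY_LAYERS for t in tables)
    )
    return hits / len(queries)


if __name__ == "__main__":
    edges = [
        ("ods_crm_user", "dwd_user_info"),
        ("ods_crm_order", "dwd_trade_order"),
        ("ods_crm_order", "ads_daily_gmv"),  # ODS read straight from ADS
        ("dwd_trade_order", "dws_trade_user_1d"),
        ("dwd_trade_order", "dws_trade_shop_1d"),
        ("dwd_user_info", "dws_trade_user_1d"),
    ]
    ods = {"ods_crm_user", "ods_crm_order"}
    queries = [["dws_trade_user_1d"], ["ods_crm_order"], ["ads_daily_gmv", "dim_user"]]
    print(cross_layer_reference_rate(edges, ods))       # 0.5
    print(model_reuse_coefficient(edges))               # 1.5
    print(round(summary_query_proportion(queries), 3))  # 0.667
```

In the sample lineage, one of the two ODS tables is read directly by an ADS table, so the cross-layer reference rate is 0.5. Which tables and queries belong in each denominator is a policy choice; in practice the inputs would come from a metadata or lineage service rather than hand-written lists.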
Based on these metrics, the article proposes a six‑step methodology to transform siloed mini‑warehouses into a shared data middle platform:
Take over the ODS layer: control the source data, enforce naming conventions (e.g., ods_<system>_<table>), and ensure a one-to-one mapping with source tables.
Define business domains: abstract business processes into domains and build a bus matrix listing the analysis dimensions for each domain.
Build consistent dimensions: create global dimension tables (e.g., dim_<domain>_<description>) and split attributes into separate tables by usage frequency or production time.
Integrate fact tables: keep the statistical granularity consistent within each fact table; merge tables that share the same granularity and domain, and keep tables with incompatible granularity separate.
Model development: enforce task dependencies, clean up temporary tables, align task names with table names, set appropriate data lifecycles, and compress DWD tables (e.g., with LZO).
Application migration: verify data parity before migrating applications, then retire the old tables.
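The naming rules from the ODS, dimension, and model-development steps (ods_<system>_<table>, dim_<domain>_<description>, task names matching table names) lend themselves to mechanical checks. The sketch below is one hypothetical way to do it in Python; the function names, the lowercase snake_case assumption, and the plain-dict inputs (which a real setup would pull from a metadata store) are illustrative, not part of the article.

```python
# Minimal sketch of naming and mapping checks for the conventions above.
# Hypothetical assumptions: names are lowercase snake_case, and the ODS->source
# mapping, known systems/domains, and task->output-table map would come from a
# metadata store; here they are plain dicts and sets.
import re

ODS_NAME = re.compile(r"^ods_(?P<system>[a-z0-9]+)_(?P<table>[a-z0-9_]+)$")
DIM_NAME = re.compile(r"^dim_(?P<domain>[a-z0-9]+)_(?P<desc>[a-z0-9_]+)$")


def check_ods_mapping(ods_to_source: dict[str, str], known_systems: set[str]) -> list[str]:
    """List violations of ods_<system>_<table> naming and one-to-one mapping.

    Dict keys already guarantee one source per ODS table; this also flags a
    source table that is loaded into more than one ODS table.
    """
    problems = []
    owner: dict[str, str] = {}
    for ods_table, source in ods_to_source.items():
        m = ODS_NAME.match(ods_table)
        if m is None:
            problems.append(f"{ods_table}: does not match ods_<system>_<table>")
        elif m.group("system") not in known_systems:
            problems.append(f"{ods_table}: unknown source system {m.group('system')!r}")
        if source in owner:
            problems.append(f"{ods_table}: source {source} already mapped by {owner[source]}")
        else:
            owner[source] = ods_table
    return problems


def is_valid_dim_name(table: str, known_domains: set[str]) -> bool:
    """True if the name follows dim_<domain>_<description> with a known domain."""
    m = DIM_NAME.match(table)
    return m is not None and m.group("domain") in known_domains


def misaligned_tasks(task_to_table: dict[str, str]) -> list[str]:
    """Tasks whose name differs from the table they write (exact-match rule assumed)."""
    return sorted(task for task, table in task_to_table.items() if task != table)


if __name__ == "__main__":
    print(check_ods_mapping(
        {"ods_crm_user": "crm.user", "ods_crm_user_bak": "crm.user", "crm_order": "crm.order"},
        known_systems={"crm"},
    ))
    print(is_valid_dim_name("dim_user_profile", {"user"}))  # True
    print(misaligned_tasks({"dws_trade_user_1d": "dws_trade_user_1d", "job_42": "dwd_trade_order"}))
```

Checks like these are most useful when run automatically before a table or task is registered, so violations are blocked rather than cleaned up later.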
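The bus matrix from the domain-division step can be kept as a simple mapping from (domain, business process) to the dimensions that process is analysed by. The Python sketch below is illustrative only: the domains, processes, and dimensions are hypothetical, and treating "used by two or more domains" as the signal for a shared, global dimension table is an assumption about how one might read the matrix.

```python
# Minimal sketch of a bus matrix: business processes (grouped by domain) as
# rows, analysis dimensions as columns. All names are hypothetical examples.
BUS_MATRIX = {
    ("trade", "place_order"): {"date", "user", "shop", "product"},
    ("trade", "pay_order"): {"date", "user", "shop"},
    ("traffic", "page_view"): {"date", "user", "page"},
}


def render(matrix: dict[tuple[str, str], set[str]]) -> str:
    """Render the matrix as text, marking each process/dimension pair with 'x'."""
    dims = sorted(set().union(*matrix.values()))
    lines = ["domain/process".ljust(22) + " ".join(d.ljust(8) for d in dims)]
    for domain, process in sorted(matrix):
        used = matrix[(domain, process)]
        cells = " ".join(("x" if d in used else ".").ljust(8) for d in dims)
        lines.append(f"{domain}/{process}".ljust(22) + cells)
    return "\n".join(lines)


def shared_dimensions(matrix: dict[tuple[str, str], set[str]], min_domains: int = 2) -> list[str]:
    """Dimensions used by at least `min_domains` domains.

    These are natural candidates for global dimension tables that every
    domain joins to, instead of per-domain copies.
    """
    domains_by_dim: dict[str, set[str]] = {}
    for (domain, _process), used in matrix.items():
        for dim in used:
            domains_by_dim.setdefault(dim, set()).add(domain)
    return sorted(d for d, doms in domains_by_dim.items() if len(doms) >= min_domains)


if __name__ == "__main__":
    print(render(BUS_MATRIX))
    print(shared_dimensions(BUS_MATRIX))  # ['date', 'user']
```

Here "date" and "user" appear in both the trade and traffic domains, so building them once as consistent dimensions avoids each domain maintaining its own diverging copy.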
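The fact-table integration rule (merge tables that share a domain and granularity, keep incompatible granularity apart) can be sketched as a grouping step. In this hypothetical Python example, a table's granularity is modelled as the set of key columns that identify one row; the `FactTable` type and sample tables are illustrative, and a real merge decision would also weigh refresh timing and ownership.

```python
# Minimal sketch of planning fact-table integration.
# Hypothetical assumption: each fact table's metadata records its domain and
# its grain as the set of key columns that identify one row.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FactTable:
    name: str
    domain: str
    grain: frozenset[str]  # key columns defining one row


def merge_plan(facts: list[FactTable]) -> tuple[list[list[str]], list[str]]:
    """Group tables sharing domain and grain; tables alone in a group stay separate."""
    groups: dict[tuple[str, frozenset[str]], list[str]] = defaultdict(list)
    for f in facts:
        groups[(f.domain, f.grain)].append(f.name)
    merge = sorted(sorted(names) for names in groups.values() if len(names) > 1)
    keep = sorted(names[0] for names in groups.values() if len(names) == 1)
    return merge, keep


if __name__ == "__main__":
    facts = [
        FactTable("dwd_trade_order_create", "trade", frozenset({"order_id"})),
        FactTable("dwd_trade_order_pay", "trade", frozenset({"order_id"})),
        FactTable("dwd_trade_order_item", "trade", frozenset({"order_id", "sku_id"})),
        FactTable("dwd_traffic_pv", "traffic", frozenset({"event_id"})),
    ]
    merge, keep = merge_plan(facts)
    print("merge:", merge)
    print("keep separate:", keep)
```

The order-level create and pay tables land in one merge group, while the order-item table stays separate because its grain (order plus SKU) is finer and mixing the two would break the one-grain-per-table rule.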
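For the migration step, a basic data-parity check compares row counts and row-level differences between the old table and its rebuilt replacement before applications are switched over. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse engine; the table names, the *_old/*_new suffixes, and the pass criterion are assumptions for illustration. A production check would typically also allow tolerances for floating-point metrics.

```python
# Minimal sketch of a pre-migration parity check between an old application
# table and its rebuilt replacement. sqlite3 stands in for the warehouse
# engine; the table names and the *_old/*_new suffixes are hypothetical.
import sqlite3


def parity_report(conn: sqlite3.Connection, old: str, new: str, columns: list[str]) -> dict[str, int]:
    """Compare two tables on the given columns.

    EXCEPT uses set semantics, so duplicate rows are caught only by the row
    counts. Table and column names are interpolated directly: pass trusted
    identifiers only.
    """
    cols = ", ".join(columns)

    def scalar(sql: str) -> int:
        return conn.execute(sql).fetchone()[0]

    return {
        "old_rows": scalar(f"SELECT COUNT(*) FROM {old}"),
        "new_rows": scalar(f"SELECT COUNT(*) FROM {new}"),
        "only_in_old": scalar(
            f"SELECT COUNT(*) FROM (SELECT {cols} FROM {old} EXCEPT SELECT {cols} FROM {new})"
        ),
        "only_in_new": scalar(
            f"SELECT COUNT(*) FROM (SELECT {cols} FROM {new} EXCEPT SELECT {cols} FROM {old})"
        ),
    }


def is_at_parity(report: dict[str, int]) -> bool:
    """Switch applications over only if nothing differs."""
    return (
        report["old_rows"] == report["new_rows"]
        and report["only_in_old"] == 0
        and report["only_in_new"] == 0
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ads_gmv_old (dt TEXT, gmv REAL);
        CREATE TABLE ads_gmv_new (dt TEXT, gmv REAL);
        INSERT INTO ads_gmv_old VALUES ('2024-01-01', 100.0), ('2024-01-02', 120.0);
        INSERT INTO ads_gmv_new VALUES ('2024-01-01', 100.0), ('2024-01-02', 125.0);
        """
    )
    report = parity_report(conn, "ads_gmv_old", "ads_gmv_new", ["dt", "gmv"])
    print(report, is_at_parity(report))
```

Only after the report shows no differences over an agreed observation window would the old table be retired.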
The process is supported by the EasyDesign tool, which leverages a metadata center to manage domains, processes, layers, dimensions, measures, and approval workflows, providing a systematic way to enforce the above standards.
In summary, completeness, reusability, and standardization form a metric system to assess data warehouse quality; consistent dimension design and proper fact‑table granularity are essential; and a gradual, iterative migration approach, backed by dedicated teams and tools, leads to a robust, shared data platform.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
