Mastering Data Modeling: From Raw Data to Insightful Warehouses
This article walks through the fundamentals of data modeling, explaining what data is, the DIKW framework, why modeling matters, and detailing the end‑to‑end process from conceptual design through logical and physical layers, including DIM, DWD, DWS, and ADM tables with practical tips and naming conventions.
What Is Data?
Data is an abstract representation of the quantity, attributes, location, and relationships of objective things; essentially anything that can be recorded.
The DIKW framework transforms data into information, knowledge, and wisdom through processing and contextualization.
Why Data Modeling?
Modeling provides structured, standardized storage, improves query efficiency, and reduces storage/computation costs.
Overall Data Modeling Process
Conceptual Modeling Design
Convert business domain knowledge into graphical representations to guide logical modeling. It starts with requirement research, including business research (bottom‑up system understanding) and requirement analysis (top‑down analyst and operations needs). Then perform data domain partitioning and build a bus matrix to define modules and their relationships.
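A bus matrix can be sketched as a mapping from business processes to the dimensions each one uses. The process and dimension names below are hypothetical, chosen only to illustrate the shape; they do not come from the original article.

```python
# Hypothetical business processes (rows) and dimensions (columns).
BUS_MATRIX = {
    "publish_post": {"user", "date", "topic"},
    "comment_post": {"user", "date", "post"},
    "like_post": {"user", "date", "post"},
}


def conformed_dimensions(matrix):
    """Return dimensions used by at least two business processes."""
    counts = {}
    for dims in matrix.values():
        for dim in dims:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(d for d, n in counts.items() if n >= 2)


def render(matrix):
    """Render the matrix as CSV: one row per process, one column per dimension."""
    dims = sorted(set().union(*matrix.values()))
    lines = ["process," + ",".join(dims)]
    for process, used in matrix.items():
        marks = ["x" if d in used else "" for d in dims]
        lines.append(process + "," + ",".join(marks))
    return "\n".join(lines)


print(render(BUS_MATRIX))
print(conformed_dimensions(BUS_MATRIX))
```

Dimensions shared by several processes are the natural candidates for consistent dimension tables, which is why the matrix is built before logical modeling starts.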
Logical Modeling Design
Define dimensions and metrics. Distinguish atomic metrics (indivisible business measures) from derived metrics (aggregated or calculated from atomic metrics). Follow a step‑by‑step process: select business processes, declare grain, determine dimensions, identify relevant facts, and consider redundant dimensions.
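The distinction between atomic and derived metrics can be illustrated with a minimal sketch. It assumes one common reading, in which a derived metric is an atomic metric restricted by a business filter; the class names, column names, and sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AtomicMetric:
    """An indivisible business measure, here a plain sum over one column."""
    name: str
    column: str

    def compute(self, rows):
        return sum(row[self.column] for row in rows)


@dataclass(frozen=True)
class DerivedMetric:
    """An atomic metric narrowed by a filter (assumed definition)."""
    name: str
    atomic: AtomicMetric
    predicate: Callable[[dict], bool]

    def compute(self, rows):
        return self.atomic.compute(r for r in rows if self.predicate(r))


# Hypothetical fact rows at order grain.
rows = [
    {"order_id": 1, "pay_amount": 100, "channel": "app"},
    {"order_id": 2, "pay_amount": 50, "channel": "web"},
    {"order_id": 3, "pay_amount": 25, "channel": "app"},
]

pay_amount = AtomicMetric("pay_amount", "pay_amount")
app_pay_amount = DerivedMetric(
    "app_pay_amount", pay_amount, lambda r: r["channel"] == "app"
)

total = pay_amount.compute(rows)
app_total = app_pay_amount.compute(rows)
```

Defining the derived metric in terms of the atomic one keeps the calculation logic in a single place, so a change to the atomic definition propagates to every metric built on it.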
Building Consistent Dimension Tables (DIM Layer)
Step 1: Choose business object.
Step 2: Define dimension attributes.
Step 3: Define related dimensions.
Step 4: Add redundant dimensions when needed.
Tips include ensuring a unique primary key (enforced by DQC), applying horizontal/vertical splitting for large or heterogeneous attributes, and naming conventions such as
{project_name}.dim_{business_key}[_{data_unit}]_{dim_abbr}[_{custom_tag}].
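The naming pattern and the primary-key tip above can be sketched as two small helpers. This is an illustration only: the function names, argument values, and sample rows are hypothetical, and a real DQC rule would normally run inside the warehouse rather than in Python.

```python
from collections import Counter


def dim_table_name(project_name, business_key, dim_abbr,
                   data_unit=None, custom_tag=None):
    """Build a name following
    {project_name}.dim_{business_key}[_{data_unit}]_{dim_abbr}[_{custom_tag}].
    Optional segments are skipped when not provided."""
    parts = ["dim", business_key]
    if data_unit:
        parts.append(data_unit)
    parts.append(dim_abbr)
    if custom_tag:
        parts.append(custom_tag)
    return f"{project_name}." + "_".join(parts)


def duplicate_keys(rows, key):
    """Return key values that appear more than once (a uniqueness check)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical dimension rows; user_id 2 is duplicated on purpose.
users = [
    {"user_id": 1, "city": "Hangzhou"},
    {"user_id": 2, "city": "Beijing"},
    {"user_id": 2, "city": "Shanghai"},
]

name = dim_table_name("demo", "community", "user")
dupes = duplicate_keys(users, "user_id")
```

A non-empty result from `duplicate_keys` means the dimension table violates the unique-key rule and should fail the quality check before downstream tables join against it.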
Building Detailed Fact Tables (DWD Layer)
Fact tables fall into three types: transaction, periodic snapshot, and accumulating (cumulative) snapshot tables. Design steps: choose the business process, declare the grain, determine dimensions, select facts, and handle redundant dimensions. The DWD layer should be stable, preserving raw details and minimizing transformation.
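To make the difference between the first two fact table types concrete, the sketch below derives a periodic snapshot (one row per user per day, holding an end-of-day balance) from transaction rows (one row per event). The points-ledger data and all names are hypothetical.

```python
from collections import defaultdict

# Transaction fact rows: one row per event, grain = single points change.
transactions = [
    {"day": "2024-01-01", "user_id": "u1", "points": 10},
    {"day": "2024-01-01", "user_id": "u1", "points": 5},
    {"day": "2024-01-02", "user_id": "u1", "points": -3},
    {"day": "2024-01-02", "user_id": "u2", "points": 7},
]


def periodic_snapshot(transactions, days):
    """Return one row per known user per day with the end-of-day balance."""
    by_day = defaultdict(lambda: defaultdict(int))
    for t in transactions:
        by_day[t["day"]][t["user_id"]] += t["points"]

    balance = defaultdict(int)
    snapshot = []
    for day in days:  # days must be given in chronological order
        for user_id, delta in by_day[day].items():
            balance[user_id] += delta
        # A snapshot row is written even on days without any transaction.
        for user_id in sorted(balance):
            snapshot.append(
                {"day": day, "user_id": user_id, "balance": balance[user_id]}
            )
    return snapshot


snapshot = periodic_snapshot(
    transactions, ["2024-01-01", "2024-01-02", "2024-01-03"]
)
```

The transaction table only has rows when something happens, while the snapshot has a row for every period, which is what makes state-at-a-point-in-time queries cheap.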
Naming example:
{project_name}.dwd_{business_key}[_{data_unit}]_{process_abbr}[_{custom_tag}]_{refresh_cycle}{partition_flag}.
Building Aggregated Fact Tables (DWS & ADM Layers)
Design aggregation tables by defining statistical grain, metrics, and physical tables, while adding redundant dimensions for performance. Follow naming patterns like
{project_name}.dws_{business_key}[_{data_unit}][_{grain}][_{process}][_{tag}]_{suffix} and {project_name}.adm_{business_key}[_{unit}][_{tag}]_{suffix}. Avoid aggregating dimensions together and prefer incremental over full recomputation.
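The preference for incremental over full recomputation can be sketched as follows: rather than re-aggregating all history, only the newest partition is aggregated and merged into the previous totals. The data and function names are hypothetical, and the approach shown applies only to additive metrics such as sums and counts; distinct counts cannot be merged this way.

```python
def aggregate(rows):
    """Full aggregation: total views per topic over the given rows."""
    totals = {}
    for row in rows:
        key = row["topic"]
        totals[key] = totals.get(key, 0) + row["views"]
    return totals


def merge_increment(previous_totals, new_partition_rows):
    """Incremental aggregation: add the new partition to existing totals."""
    merged = dict(previous_totals)
    for key, value in aggregate(new_partition_rows).items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical daily partitions of a detail table.
day1 = [{"topic": "sql", "views": 3}, {"topic": "etl", "views": 2}]
day2 = [{"topic": "sql", "views": 4}, {"topic": "dim", "views": 1}]

full = aggregate(day1 + day2)                       # scans all history
incremental = merge_increment(aggregate(day1), day2)  # scans only day2
```

Both paths produce the same totals, but the incremental one reads a single partition per run, so its cost stays flat as history grows.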
Physical Modeling Design
After logical design, proceed to code development (SQL, data quality checks, testing) and deployment (ETL job generation, monitoring, DQC configuration).
Data Model Validation and Service
Validate models through testing and ensure they support business insights, such as community data services that help identify market opportunities, track user behavior, and drive product optimization.
Case Study: Community Business Data Service
The author describes taking over a community product, analyzing market trends, and building a data service framework that includes ER diagrams, business architecture, and a bus matrix, illustrated with several images.
Indicator System and OSM Decomposition
Illustrated with diagrams showing atomic and derived metrics, their definitions, and how they feed into aggregated tables.
Dimension Table Design
Shows best practices for naming, splitting, and handling redundant attributes.
Additional Note
At the end, a brief mention of using ChatGLM and LangChain to build conversational models is included, but it is not central to the data modeling discussion.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
