An Introduction to Dimensional Modeling in Data Warehousing
This article provides a comprehensive overview of data warehouse concepts, compares classic warehouse models, explains dimensional modeling fundamentals such as fact and dimension tables, demonstrates a practical e‑commerce scenario with schema design and SQL query examples, and discusses real‑world trade‑offs.
The author shares personal notes on data warehousing, covering both conceptual content from books and practical insights from work, acknowledging possible errors.
Data warehouses encompass architecture, modeling, and methodology, including Hadoop, Spark, Hive, dimensional modeling, and supporting systems like scheduling, metadata, ETL, and visualization.
The article outlines the agenda: classic warehouse models, detailed dimensional modeling concepts, an e‑commerce case study applying the theory, and real‑world compromises.
Four classic warehouse models are introduced: the ER (3NF) model, dimensional modeling (star and snowflake schemas), Data Vault, and Anchor modeling, each with brief descriptions and references.
Dimensional modeling, advocated by Ralph Kimball, focuses on building models that serve analytical needs, offering fast query performance and supporting large‑scale complex queries. Core components include fact tables (recording measurable events) and dimension tables (holding descriptive attributes).
Example fact table illustration shows an order table representing purchase events, while dimension tables such as user, merchant, and time provide context.
A practical e‑commerce scenario is described, identifying entities like buyer, merchant, product, time, and amount, leading to a star schema design with separate fact and dimension tables.
The proposed dimensional model offers low data redundancy, clear structure, and OLAP friendliness, but incurs higher query complexity and potential data inconsistency. In contrast, a wide “big table” design is intuitive and easy to query but suffers from massive redundancy and low reusability.
Implementation example: a SQL query to calculate the total price of LV products purchased by male users in Beijing in 2016.
SELECT SUM(order.money) FROM order, product, date, address, user WHERE date.year = '2016' AND user.sex = 'male' AND address.province = '帝都' AND product.name = 'LV';Conclusion emphasizes that while dimensional modeling has many advantages, practical constraints often lead to compromises such as using wide tables despite redundancy.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.