Understanding Slowly Changing Dimensions (SCD) in Data Warehousing
The article explains the concept of Slowly Changing Dimensions (SCD) in data warehouses, illustrates why tracking dimension changes is essential for accurate historical analysis, and details Kimball's five primary SCD types (0‑4) with their implementation strategies and trade‑offs.
The author, known for strong opinions on SQL development, opens by contrasting two kinds of systems: operational databases reflect changes as they happen, whereas analytical warehouses must preserve historical states so that reports stay accurate over time.
Slowly Changing Dimensions (SCD)
A slowly changing dimension is one whose attributes change infrequently and unpredictably over time. For example, a salesperson may move between regional offices. The operational system simply overwrites the current office, but the warehouse must retain the previous one so that sales made before the move are still credited to the original region.
Data warehouses hold largely static historical data, see few deletions, and grow through periodic loads. Because history is retained rather than overwritten, designers must decide, attribute by attribute, which changes need to be tracked and which can simply be overwritten or ignored.
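Kimball’s methodology defines a family of SCD handling techniques (later editions of The Data Warehouse Toolkit extend it with hybrid types 5 through 7). This article focuses on the five basic types, 0‑4, which are the most widely used.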
Type 0 – Preserve Original Value
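The attribute is loaded once and never updated; the original value is always used for grouping. This suits attributes that are genuinely fixed, such as a date of birth, but is rarely recommended for attributes that can legitimately change, because every later update is discarded.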
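A minimal sketch of Type 0 handling, assuming the dimension is held as a Python dict keyed by a hypothetical natural key (salesperson_id) and that original_office is an illustrative attribute name. The only rule is that a value is written on first load and later source values are ignored.

```python
def load_type0(dim: dict[str, dict], salesperson_id: str, office: str) -> None:
    """Type 0: record the attribute on first load and never update it afterwards."""
    if salesperson_id not in dim:
        # First time this member is seen: store the original value.
        dim[salesperson_id] = {"original_office": office}
    # Otherwise do nothing: changed values from the source are ignored.


dim: dict[str, dict] = {}
load_type0(dim, "S1", "Beijing")
load_type0(dim, "S1", "Shanghai")  # the transfer arrives, but Type 0 keeps the original
print(dim["S1"]["original_office"])  # Beijing
```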
Type 1 – Overwrite Update
The attribute is overwritten in place so the dimension matches the operational system. The old value is lost, so facts recorded before the change are reported under the new value (e.g., sales made before a salesperson’s transfer are incorrectly attributed to the new office).
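The reporting problem with Type 1 is easiest to see with a toy fact table. This sketch assumes facts reference the dimension by a hypothetical natural key (salesperson_id) and that office names are illustrative; after the overwrite, a sale made in Beijing is reported under Shanghai.

```python
from collections import defaultdict


def load_type1(dim: dict[str, dict], salesperson_id: str, office: str) -> None:
    """Type 1: overwrite the attribute in place; the previous value is lost."""
    dim[salesperson_id] = {"office": office}


def sales_by_office(facts: list[tuple[str, int]], dim: dict[str, dict]) -> dict[str, int]:
    """Total sales per office, resolving every fact against the *current* dimension row."""
    totals: defaultdict[str, int] = defaultdict(int)
    for salesperson_id, amount in facts:
        totals[dim[salesperson_id]["office"]] += amount
    return dict(totals)


dim: dict[str, dict] = {}
facts: list[tuple[str, int]] = []
load_type1(dim, "S1", "Beijing")
facts.append(("S1", 100))           # sale made while based in Beijing
load_type1(dim, "S1", "Shanghai")   # transfer overwrites the office
facts.append(("S1", 50))            # sale made in Shanghai
print(sales_by_office(facts, dim))  # {'Shanghai': 150}
```

Both sales end up under Shanghai, which is exactly the historical misattribution described above.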
Type 2 – Add New Row (Effective/Expiration Dates)
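Two columns, an effective date and an expiration date, are added (often alongside a current-row flag). When a change occurs, the current row is expired by setting its expiration date, and a new row carrying the new attribute values is inserted. Because the natural (business) key now repeats across versions, each version receives its own surrogate key, and fact rows reference the surrogate key that was in effect when the fact occurred. Full history is preserved at the cost of table growth.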
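A minimal Type 2 sketch, assuming an in-memory list of rows rather than a real warehouse table. The field names, the 9999-12-31 open-ended marker, and the sequential surrogate key are illustrative choices, not part of the source. It uses a half-open convention where a row’s expiration date equals its successor’s effective date; some teams instead set it to the day before.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

OPEN_END = date(9999, 12, 31)  # hypothetical "no expiration yet" marker


@dataclass
class DimRow:
    surrogate_key: int
    salesperson_id: str        # natural key; repeats across versions
    office: str
    effective_date: date
    expiration_date: date
    is_current: bool


def apply_type2(dim: list[DimRow], salesperson_id: str, office: str, as_of: date) -> int:
    """Expire the current version and insert a new one; return the current surrogate key."""
    current: Optional[DimRow] = next(
        (r for r in dim if r.salesperson_id == salesperson_id and r.is_current), None
    )
    if current is not None:
        if current.office == office:
            return current.surrogate_key  # nothing changed, no new version
        current.expiration_date = as_of   # close the old version
        current.is_current = False
    new_key = len(dim) + 1  # sequential key; only valid because rows are never deleted here
    dim.append(DimRow(new_key, salesperson_id, office, as_of, OPEN_END, True))
    return new_key


def key_as_of(dim: list[DimRow], salesperson_id: str, on: date) -> int:
    """Surrogate key of the version in effect on a date (half-open interval)."""
    for r in dim:
        if r.salesperson_id == salesperson_id and r.effective_date <= on < r.expiration_date:
            return r.surrogate_key
    raise KeyError((salesperson_id, on))


dim: list[DimRow] = []
apply_type2(dim, "S1", "Beijing", date(2023, 1, 1))
apply_type2(dim, "S1", "Shanghai", date(2023, 7, 1))

# Each fact stores the surrogate key valid on its transaction date.
facts = [(key_as_of(dim, "S1", date(2023, 3, 15)), 100),
         (key_as_of(dim, "S1", date(2023, 9, 1)), 50)]
office_by_key = {r.surrogate_key: r.office for r in dim}
totals: dict[str, int] = {}
for key, amount in facts:
    office = office_by_key[key]
    totals[office] = totals.get(office, 0) + amount
print(totals)  # {'Beijing': 100, 'Shanghai': 50}
```

Unlike the Type 1 case, the pre-transfer sale stays with Beijing because the fact points at the version that was current at the time.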
Type 3 – Add New Attribute Column
A new column stores the prior value (e.g., pre_location) while the original column is overwritten with the current value. Only one level of history survives (the value before the most recent change), and tracking several attributes this way adds a column for each of them.
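A minimal Type 3 sketch, reusing the article’s pre_location column name; the current-value column name location and the dict-based dimension are assumptions for illustration. After two moves, the first office is gone because only one prior value is kept.

```python
def apply_type3(dim: dict[str, dict], salesperson_id: str, office: str) -> None:
    """Type 3: shift the current value into pre_location, then overwrite location."""
    row = dim.get(salesperson_id)
    if row is None:
        dim[salesperson_id] = {"location": office, "pre_location": None}
    elif row["location"] != office:
        # Only one prior value is kept; anything older is overwritten here.
        row["pre_location"] = row["location"]
        row["location"] = office


dim: dict[str, dict] = {}
apply_type3(dim, "S1", "Beijing")
apply_type3(dim, "S1", "Shanghai")
apply_type3(dim, "S1", "Shenzhen")
print(dim["S1"])  # {'location': 'Shenzhen', 'pre_location': 'Shanghai'}
```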
Type 4 – Create Mini‑Dimension Table
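For attributes that change frequently in a large dimension, a separate mini‑dimension table is created to isolate those changes. The volatile attributes, often grouped into bands such as age or income ranges, move into the mini‑dimension with their own surrogate key, and each fact row references both the main dimension and the mini‑dimension. The main dimension therefore does not need a new Type 2 row every time one of those attributes changes.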
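A minimal mini-dimension sketch, assuming a hypothetical customer dimension whose volatile demographics (age and income band) are split out. The class name, banding rule, and attribute names are illustrative. Each distinct band combination gets one mini-dimension row, and facts carry both keys, so the main dimension keeps a single row for the customer.

```python
def age_band(age: int) -> str:
    """Band a volatile numeric attribute so the mini-dimension stays small."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


class MiniDimension:
    """Hypothetical demographics mini-dimension: one row per distinct band combination."""

    def __init__(self) -> None:
        self.rows: dict[int, tuple[str, str]] = {}
        self._key_by_profile: dict[tuple[str, str], int] = {}

    def key_for(self, age: int, income_band: str) -> int:
        """Return the surrogate key for this profile, creating a row if it is new."""
        profile = (age_band(age), income_band)
        if profile not in self._key_by_profile:
            key = len(self.rows) + 1
            self._key_by_profile[profile] = key
            self.rows[key] = profile
        return self._key_by_profile[profile]


# The main dimension keeps only the stable attributes.
customer_dim = {1: {"customer_id": "C1", "name": "Alice"}}
demographics = MiniDimension()
facts: list[tuple[int, int, int]] = []  # (customer_key, demographics_key, amount)

facts.append((1, demographics.key_for(34, "50k-75k"), 120))
facts.append((1, demographics.key_for(36, "75k-100k"), 80))  # income band changed
facts.append((1, demographics.key_for(38, "75k-100k"), 60))  # same profile, key reused

print(len(customer_dim), len(demographics.rows))  # 1 2
print([f[1] for f in facts])  # [1, 2, 2]
```

Banding is what keeps the mini-dimension compact: many customers and many changes map onto a small, reusable set of profile rows.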
The article concludes with a comparative diagram from Kimball’s "The Data Warehouse Toolkit" that summarizes the advantages and drawbacks of each SCD type.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
