Understanding the Origins, Significance, and Construction of Data Warehouses
This article explains the historical background of databases and data warehouses, outlines why data warehouses are essential for modern enterprises, and provides a step‑by‑step guide to building a data warehouse using Kimball’s dimensional modeling approach.
With the rapid growth of the Internet, enterprises have built their own portals, servers, and user bases, and they store their transactional data in enterprise‑level databases that support low‑latency CRUD operations.
1. Background of Database Creation
Databases fall into two broad types: relational (e.g., MySQL, Oracle, SQL Server) and non‑relational (e.g., Redis, HBase, MongoDB). They serve as the backbone of most application architectures.
2. Background of Data Warehouse Creation
The concept of "big data" and the idea of "letting the data speak" have existed for a long time, but limited hardware and immature processing frameworks delayed their practical use. The advent of Hadoop and its ecosystem enabled cost‑effective horizontal scaling and efficient data management, and data warehouses built on that foundation became the way to standardize and exploit massive data.
3. Significance of Building a Data Warehouse
Enterprises build data warehouses and data marts primarily to provide strong data support for upper‑layer analytical applications. The key evaluation criteria include:
Performance: Fast query response and reduced I/O by abstracting common logic into reusable data models.
Cost: Lower compute and storage costs through controlled redundancy (e.g., degenerate dimensions, wide tables).
Efficiency: Improved user experience by exposing an abstracted ADS (Application Data Store) layer.
Quality: Consistent statistical definitions and fewer calculation errors.
4. How to Build a Data Warehouse
A scientific data‑warehouse model requires solid theoretical support; this article follows Kimball’s modeling methodology. The high‑level steps are:
Requirement research → Business research → Domain segmentation → Metric system construction → DIM layer processing → ODS layer → DWD layer → DWS layer → ADS layer (data marts).
DIM Layer
The dimension layer creates dimension tables, the core of a data warehouse, by extracting and processing data from various business line databases.
Identify primary dimension tables
Identify secondary dimension tables
Define dimension attributes
Normalize and denormalize as needed
Handle special and fact dimensions
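The steps above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical product dimension (primary) and category dimension (secondary); the sample rows, column names, and the `build_product_dim` helper are invented for the example and do not come from the original article. It denormalizes the category attributes into the product dimension and assigns a warehouse-owned surrogate key.

```python
from dataclasses import dataclass

# Hypothetical source rows from a business-line database (assumed for illustration).
PRODUCTS = [
    {"product_id": "P100", "name": "Laptop", "category_id": "C1"},
    {"product_id": "P200", "name": "Desk", "category_id": "C2"},
    {"product_id": "P300", "name": "Cable", "category_id": None},
]
CATEGORIES = {
    "C1": {"category_name": "Electronics"},
    "C2": {"category_name": "Furniture"},
}


@dataclass(frozen=True)
class ProductDim:
    product_key: int     # surrogate key owned by the warehouse
    product_id: str      # natural key from the source system
    product_name: str    # dimension attribute
    category_id: str
    category_name: str   # denormalized from the secondary dimension


def build_product_dim(products, categories):
    """Flatten the secondary (category) dimension into the primary one."""
    rows = []
    for key, p in enumerate(products, start=1):
        # Rows without a category get an explicit placeholder instead of NULL.
        cat_id = p["category_id"] or "UNKNOWN"
        cat = categories.get(cat_id, {"category_name": "Unknown"})
        rows.append(
            ProductDim(key, p["product_id"], p["name"], cat_id, cat["category_name"])
        )
    return rows


dim_product = build_product_dim(PRODUCTS, CATEGORIES)
```

Keeping the category name on the product row means downstream queries can group by category without an extra join, at the price of some redundancy.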
ODS Layer
The Operational Data Store (ODS) extracts data from production systems such as MySQL, HBase, Oracle, or SQL Server. It must ensure data reliability and avoid issues like missing data, inaccurate data, inconsistent naming, type mismatches, unit inconsistencies, missing comments, and unclear table names.
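As a rough sketch of what such reliability checks might look like, the snippet below validates a handful of hypothetical raw order rows for missing keys, type mismatches, and inconsistent units. The row layout, the `yuan`/`fen` units, and the `validate_order` and `load_ods` helpers are assumptions made for the example, not part of any particular tool.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical raw rows as they might arrive from a production system.
RAW_ORDERS = [
    {"order_id": "1001", "amount": "19.90", "unit": "yuan"},
    {"order_id": "1002", "amount": "2500", "unit": "fen"},   # amount given in cents
    {"order_id": None, "amount": "5.00", "unit": "yuan"},    # missing key
    {"order_id": "1004", "amount": "abc", "unit": "yuan"},   # type mismatch
]

# Conversion factors to one agreed unit (assumed: everything is stored in yuan).
UNIT_TO_YUAN = {"yuan": Decimal("1"), "fen": Decimal("0.01")}


def validate_order(row):
    """Return (clean_row, None) on success or (None, reason) on failure."""
    if not row.get("order_id"):
        return None, "missing order_id"
    try:
        amount = Decimal(row.get("amount"))
    except (InvalidOperation, TypeError):
        return None, "amount is not numeric"
    factor = UNIT_TO_YUAN.get(row.get("unit"))
    if factor is None:
        return None, "unknown unit"
    return {"order_id": row["order_id"], "amount_yuan": amount * factor}, None


def load_ods(rows):
    """Split incoming rows into clean rows and rejected rows with a reason."""
    clean, rejected = [], []
    for row in rows:
        ok, reason = validate_order(row)
        if ok is not None:
            clean.append(ok)
        else:
            rejected.append((row, reason))
    return clean, rejected


ods_orders, ods_rejects = load_ods(RAW_ORDERS)
```

Rejected rows are kept together with the reason rather than silently dropped, so missing or inaccurate data can be traced back to the source system.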
DWD Layer
The Data Warehouse Detail (DWD) layer contains wide tables that integrate multiple business processes, including transaction fact tables and accumulating snapshot fact tables (sometimes called cumulative snapshots), which facilitate complex metric calculations.
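To make the two fact-table styles concrete, here is a small sketch under assumed data: the `EVENTS` list plays the role of a transaction-type fact table (one row per event), and `build_accumulating_snapshot` folds it into a cumulative (accumulating) snapshot with one wide row per order. The event names, columns, and helper are hypothetical.

```python
# Hypothetical event rows spanning several business processes
# (ordering, payment, shipping); one row per event.
EVENTS = [
    {"order_id": "1001", "event": "created", "date": "2024-01-01", "amount": 120.0},
    {"order_id": "1001", "event": "paid", "date": "2024-01-02"},
    {"order_id": "1002", "event": "created", "date": "2024-01-02", "amount": 80.0},
    {"order_id": "1001", "event": "shipped", "date": "2024-01-04"},
]

MILESTONES = ("created", "paid", "shipped")


def build_accumulating_snapshot(events):
    """One wide row per order; milestone date columns fill in as events arrive."""
    snapshot = {}
    for e in events:
        row = snapshot.setdefault(
            e["order_id"],
            {
                "order_id": e["order_id"],
                "amount": None,
                **{f"{m}_date": None for m in MILESTONES},
            },
        )
        # Record the date of the milestone this event represents.
        row[f"{e['event']}_date"] = e["date"]
        if "amount" in e:
            row["amount"] = e["amount"]
    return snapshot


dwd_order_snapshot = build_accumulating_snapshot(EVENTS)
```

With all milestones on one row, a metric such as "days from payment to shipment" becomes a simple column difference instead of a self-join over the event table.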
DWS Layer
The Data Warehouse Summary (DWS) layer aggregates data for periodic metrics (e.g., monthly sales) by creating snapshot fact tables for various time windows such as hourly, daily, weekly, or monthly.
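The aggregation described here can be sketched as follows. The detail rows, the window names, and the `summarize` helper are assumptions for illustration; a real warehouse would typically express the same roll-up in SQL over the DWD tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical DWD-level detail rows: one row per paid order.
DETAIL = [
    {"order_date": date(2024, 1, 30), "amount": 100},
    {"order_date": date(2024, 1, 30), "amount": 50},
    {"order_date": date(2024, 1, 31), "amount": 70},
    {"order_date": date(2024, 2, 1), "amount": 30},
]

# Each time window maps a date to the label of the bucket it falls into.
WINDOWS = {
    "daily": lambda d: d.isoformat(),
    "weekly": lambda d: "{}-W{:02d}".format(*d.isocalendar()[:2]),  # ISO week
    "monthly": lambda d: d.strftime("%Y-%m"),
}


def summarize(rows, window):
    """Sum sales amounts per bucket of the chosen time window."""
    bucket = WINDOWS[window]
    totals = defaultdict(int)
    for r in rows:
        totals[bucket(r["order_date"])] += r["amount"]
    return dict(totals)


dws_daily = summarize(DETAIL, "daily")
dws_weekly = summarize(DETAIL, "weekly")
dws_monthly = summarize(DETAIL, "monthly")
```

Because every window is computed from the same detail rows with the same definition of "sales", the daily, weekly, and monthly figures stay consistent with one another.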
ADS Layer
The Application Data Store (ADS) layer provides the highest level of abstraction, delivering ready‑to‑use data directly to business applications without exposing underlying complexities, and can be customized into separate data marts for different business lines.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
