Databases 12 min read

Comprehensive Overview of Data Warehousing, ETL, OLAP, and Data Cube Operations

This article provides a thorough introduction to data warehousing, covering warehouse creation, the ETL process, OLAP/BI tools, data cube concepts, common OLAP operations, and the three main OLAP architectural models (MOLAP, ROLAP, HOLAP).

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Comprehensive Overview of Data Warehousing, ETL, OLAP, and Data Cube Operations

Preface

Data warehouses are the core of analytical systems, and a complete warehouse solution also involves ETL engineering, online analytical processing (OLAP) tools, and business intelligence (BI) applications. This article gives a high‑level overview, especially of OLAP, to help readers understand the whole ecosystem.

Creating a Data Warehouse

Like databases, a warehouse is created by executing DDL statements. Historically most warehouses were built on RDBMS, but modern open‑source distributed tools such as Hadoop Hive and Spark SQL separate modeling from implementation. Modeling is often done with dedicated ER/dimension tools, while the actual tables are created in Hive/Spark SQL. Commercial platforms (e.g., Alibaba Cloud DataWorks) are also mentioned for their ease of use.

ETL: Extract, Transform, Load

The ETL stage is usually the most time‑consuming part of warehouse development. It extracts data from source systems, transforms it to match the warehouse schema, and loads it into dimension and fact tables.

1. Extract – Identify which source data are needed for the warehouse based on business topics.

2. Transform – Convert the extracted data structure and perform data cleaning to ensure quality.

3. Load – Insert the cleaned data into the warehouse, supporting both initial bulk loads and incremental refreshes. With modern distributed platforms, ETL often becomes ELT, where the source system only performs minimal cleaning and the heavy transformation is done inside the warehouse.

OLAP/BI Tools

After a warehouse is built, users can query it with SQL, but writing SQL for multidimensional analysis is cumbersome. OLAP tools simplify analysis of dimensional models, while BI tools visualize OLAP results. The relationship between OLAP tools and the warehouse differs for normalized versus dimensional warehouses, as illustrated by the following diagrams:

Data Cube

A data cube extends a two‑dimensional table into N dimensions, enabling multi‑angle analysis. The article explains the five basic cube operations: slice, dice, pivot, roll‑up, and drill‑down, and provides SQL examples for each.

Slice and Dice

# Slice
SELECT Locates.地区, Products.分类, SUM(数量)
FROM Sales, Dates, Products, Locates
WHERE Dates.季度 = 2
  AND Sales.Date_key = Dates.Date_key
  AND Sales.Locate_key = Locates.Locate_key
  AND Sales.Product_key = Products.Product_key
GROUP BY Locates.地区, Products.分类

# Dice
SELECT Locates.地区, Products.分类, SUM(数量)
FROM Sales, Dates, Products, Locates
WHERE (Dates.季度 = 2 OR Dates.季度 = 3)
  AND (Locates.地区 = '江苏' OR Locates.地区 = '上海')
  AND Sales.Date_key = Dates.Date_key
  AND Sales.Locate_key = Locates.Locate_key
  AND Sales.Product_key = Products.Product_key
GROUP BY Dates.季度, Locates.地区, Products.分类

Pivot – Changes the display orientation of the result set, essentially reordering selected columns.

Roll‑up and Drill‑down

# Roll‑up
SELECT Locates.地区, Products.分类, SUM(数量)
FROM Sales, Products, Locates
WHERE Sales.Locate_key = Locates.Locate_key
  AND Sales.Product_key = Products.Product_key
GROUP BY Locates.地区, Products.分类

# Drill‑down
SELECT Locates.地区, Dates.季度, Products.分类, SUM(数量)
FROM Sales, Dates, Products, Locates
WHERE Sales.Date_key = Dates.Date_key
  AND Sales.Locate_key = Locates.Locate_key
  AND Sales.Product_key = Products.Product_key
GROUP BY Dates.季度, Dates.月份, Locates.地区, Products.分类

OLAP Architectural Models

MOLAP (Multidimensional OLAP) – Generates a physical multidimensional cube; queries are fast but cube updates are slow.

ROLAP (Relational OLAP) – Uses star schemas and relational tables to simulate a cube; queries are translated to SQL and involve joins, so they are slower than MOLAP.

HOLAP (Hybrid OLAP) – Combines MOLAP and ROLAP, routing performance‑critical queries to the MOLAP engine and others to ROLAP.

Conclusion

The data‑warehouse ecosystem is extensive, involving modeling, ETL, architecture, and front‑end visualization teams. For aspiring data scientists, mastering the fundamentals of data storage, visualization, and modeling provides the highest return on investment.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLData WarehouseOLAPETLdata cube
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.