Big Data 10 min read

Cut OLAP Cube Storage Explosions: Proven Dimensionality Reduction Tricks with Rubik

This article explains why raw OLAP Cubes consume exponential storage, then details six practical dimensionality‑reduction methods—Aggregation Group, Mandatory Dimension, Joint Dimension, Derived Dimension, Hierarchy Dimension, and Partial Cube—showing how each can dramatically shrink materialized tables while preserving query performance.

StarRing Big Data Open Lab
StarRing Big Data Open Lab
StarRing Big Data Open Lab
Cut OLAP Cube Storage Explosions: Proven Dimensionality Reduction Tricks with Rubik

Original Cube Space Problem

In an unoptimized OLAP Cube, every field or expression in a dimension table becomes a dimension, so a Cube with n dimensions generates 2^n possible dimension combinations, each materialized as a separate table. When dimensions increase to n+m, the number of tables grows to 2^(n+m), leading to exponential storage explosion.

Therefore, reducing the number of dimension combinations while still satisfying most OLAP query needs is crucial.

Rubik‑Supported OLAP Cube Dimensionality‑Reduction Methods

Aggregation Group

When the number of dimensions exceeds ten, group them by usage scenarios. For example, split 50 dimensions into two sub‑Cubes of 20 and 30 dimensions based on business domains.

Result: the number of materialized tables drops from 2^(m+n) to 2^m + 2^n, an exponential reduction.

Mandatory Dimension

A mandatory dimension appears in every query’s WHERE or GROUP BY clause. Declaring a dimension as mandatory discards any combination that omits it.

Example: in a banking Cube, the currency field must always be specified; setting it as mandatory halves the Cube size.

Each mandatory dimension reduces the table count by half; n mandatory dimensions shrink 2^m to 2^(m‑n).

Joint Dimension

Combine two or more dimensions into a single joint dimension when they are always queried together or have low cardinality.

Always‑together dimensions, e.g., name and id, can be merged into {name,id} without loss.

Low‑cardinality dimensions, e.g., sex and class, can be merged, accepting a slight performance trade‑off.

If n dimensions become joint, the combination count changes from 2^m to 2^(m‑n+1).

Derived Dimension

When a column (Basic Column) can derive other columns (Derived Columns), treat the Basic Column as a dimension and omit the derived ones.

Example: if an ID can determine name and sex, GROUP BY id alone is equivalent to GROUP BY id,name,sex, reducing the combination space.

If there is 1 Basic Column and m Derived Columns, the number of combinations drops from 2^m to 1.

Hierarchy Dimension

A hierarchy groups dimensions with a parent‑child relationship (e.g., province‑city‑county). Only combinations that respect the hierarchy are kept.

For a Cube with m dimensions forming a single hierarchy, the table count reduces from 2^m to m, without affecting query efficiency if queries follow the hierarchy.

Partial Cube

If a dimension combination contains two medium‑cardinality dimensions (cardinality 1,000–10,000) or one high‑cardinality dimension (cardinality >100,000), the combination provides little aggregation value and should be discarded, saving space and computation.

OLAP Cube Construction Steps Summary

1. Group all dimensions into Aggregation Groups based on business scenarios, creating multiple Cube instances if needed.

2. For each group, analyze whether mandatory dimensions exist, whether a hierarchy can be built, and whether joint or derived dimensions are applicable.

3. If the number of dimensions remains large after hierarchy reduction, consider Partial Cube to drop low‑value combinations.

Following these steps maximizes Cube storage efficiency while preserving optimization benefits.

Data WarehousingOLAPdimensionality reductionCube OptimizationRubik
StarRing Big Data Open Lab
Written by

StarRing Big Data Open Lab

Focused on big data technology research, exploring the Big Data era | [email protected]

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.