Big Data 10 min read

How to Optimize OLAP Cubes with Rubik: Dimensional Reduction Strategies Explained

This article walks through Rubik's OLAP cube reduction techniques—including aggregation groups, required, combined, derived, hierarchical, and partial cubes—by designing and implementing buyers and suppliers cubes with six tables, demonstrating performance gains through pre‑computed queries and SQL examples.

StarRing Big Data Open Lab
StarRing Big Data Open Lab
StarRing Big Data Open Lab
How to Optimize OLAP Cubes with Rubik: Dimensional Reduction Strategies Explained

Optimization Strategy Review

In the previous article we introduced several cube reduction strategies to lower data expansion, reduce storage pressure, and shorten cube build time; this section revisits those strategies.

Tables Used

The demonstration employs six tables: trade_fact (fact table) and five dimension tables— buyers_dim , time_dim , area_dim , currency_dim , and suppliers_dim .

Requirement Analysis

Analysts need to analyze sales and supplier situations, including regional sales, consumption by gender/age/region, top suppliers by volume, and each supplier's performance over time.

Cube Model Design

Two cubes are created: buyers_cube for buyer‑centric analysis and suppliers_cube for supplier‑centric analysis. The buyers cube incorporates derived dimensions (id_no as base column yielding name, gender, age) and hierarchical dimensions (country‑province‑city). Currency is set as a required dimension. The suppliers cube uses a hierarchical time dimension (year‑month‑day) and a combined dimension for name and owner.

Implementing the Cube Design in Rubik

To build buyers_cube , first create dimension cube_buyers_dim, define the derived dimension by setting id_no as the base column, and create a hierarchical dimension with three levels: country, province, city. Then create a required dimension for currency and instantiate the cube linking trade_fact with the dimensions.

For suppliers_cube , create a hierarchical dimension year‑month‑day (similar to the buyers cube) and a combined dimension for name and owner.

Optimization Effects

After materializing the cubes, queries can read pre‑computed results, achieving order‑of‑magnitude speedups. Example analyses include total consumption per user, regional consumption, and supplier contribution over time.

SELECT id_no, name, currency, SUM(value) FROM buyers_dim a JOIN trade_fact b ON a.buyer_id = b.buyer_id GROUP BY id_no, name, currency;
SELECT country, province, city, currency, SUM(value) FROM buyers_dim a JOIN area_dim c ON a.area_id = c.area_id JOIN trade_fact b ON a.buyer_id = b.buyer_id GROUP BY country, province, city, currency;
SELECT name, owner, year, month, currency, SUM(value) FROM suppliers_dim a JOIN trade_fact b ON a.supplier_id = b.supplier_id GROUP BY name, owner, year, month, currency;

Summary of Optimization Operations

Mapping of strategies to Rubik actions: aggregation groups → separate cubes; required dimensions → required levels during instantiation; combined dimensions → same‑level attributes with optional symbiotic levels; derived dimensions → base column setting; hierarchical dimensions → level hierarchy; partial cube → mutually exclusive levels.

Data WarehouseOLAPCubeRubikDimensional Reduction
StarRing Big Data Open Lab
Written by

StarRing Big Data Open Lab

Focused on big data technology research, exploring the Big Data era | [email protected]

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.