How to Pick Real-Time Dimension & Result Tables for Cloud‑Native Big Data

This article examines the evolution of big‑data architectures toward cloud‑native, real‑time processing, and provides a detailed comparison of dimension‑table and result‑table options—including MySQL, Redis, and Alibaba Cloud Tablestore—along with their performance, cost, and scalability characteristics for Flink SQL workloads.

Alibaba Cloud Developer

Sep 15, 2021

Preface

Traditional big‑data technologies trace back to Google’s GFS, MapReduce, and Bigtable papers, whose designs were reimplemented in open source as HDFS, Hadoop MapReduce, and HBase. Early deployments relied on on‑premises Hadoop clusters for offline batch processing and large‑scale storage.

With growing data volume, diverse use cases, and the rise of cloud services, classic architectures can no longer meet requirements. Modern big‑data systems now emphasize three trends: massive scale, real‑time processing, and cloud‑native design.

Real‑Time Computing in Big‑Data Architecture

Typical real‑time scenarios include:

Real‑time data warehouses for PV/UV, transaction, and sales statistics.

Real‑time recommendation engines powered by AI.

Streaming ETL for data synchronization and preprocessing.

Real‑time fraud detection and monitoring in finance.

Apache Flink has become the de facto engine for low‑latency stream processing, and its SQL front end hides the underlying execution details.

A Flink SQL job works with three kinds of tables:

Source tables – e.g., Kafka, MQ, or CDC streams.

Result tables – destinations such as MySQL or HBase.

Dimension tables – stores of static or slowly changing reference data (e.g., MySQL, Redis).

Below is a typical Flink SQL job that calculates GMV (gross merchandise value) per minute and category by joining a Kafka source with a Redis dimension table and writing the result to MySQL:

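-- Source table: user order events from Kafka
CREATE TEMPORARY TABLE user_action_source (
  `timestamp` BIGINT,
  `user_id` BIGINT,
  `item_id` BIGINT,
  `price` DOUBLE,
  -- Event-time attribute (assumes `timestamp` holds epoch milliseconds)
  ts AS TO_TIMESTAMP(FROM_UNIXTIME(`timestamp` / 1000)),
  -- Redis keys are strings, so expose the item ID as a STRING join key
  item_key AS CAST(`item_id` AS STRING),
  -- Processing-time attribute required by the dimension (lookup) join
  proc_time AS PROCTIME(),
  -- Assumed tolerance of 5 seconds for out-of-order events
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = '<your_topic>',
  'properties.bootstrap.servers' = 'your_kafka_server:9092',
  'properties.group.id' = '<your_consumer_group>',
  'format' = 'json',
  'scan.startup.mode' = 'latest-offset'
);

-- Dimension table: item details in Redis
CREATE TEMPORARY TABLE item_detail_dim (
  id STRING,
  category STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'redis',
  'host' = '<your_redis_host>',
  'port' = '<your_redis_port>',
  'password' = '<your_redis_password>',
  'dbNum' = '<your_db_num>'
);

-- Result table: GMV per minute and category in MySQL
CREATE TEMPORARY TABLE gmv_output (
  time_minute STRING,
  category STRING,
  gmv DOUBLE,
  PRIMARY KEY (time_minute, category) NOT ENFORCED
) WITH (
  'connector' = 'rds',
  'url' = '<your_jdbc_mysql_url_with_database>',
  'tableName' = '<your_table>',
  'userName' = '<your_mysql_database_username>',
  'password' = '<your_mysql_password>'
);

INSERT INTO gmv_output
SELECT
  -- Format the window start as text to match the STRING result column
  DATE_FORMAT(TUMBLE_START(s.ts, INTERVAL '1' MINUTE), 'yyyy-MM-dd HH:mm') AS time_minute,
  d.category,
  -- price comes from the order stream, not from the dimension table
  SUM(s.price) AS gmv
FROM user_action_source AS s
JOIN item_detail_dim FOR SYSTEM_TIME AS OF s.proc_time AS d
  ON s.item_key = d.id
GROUP BY TUMBLE(s.ts, INTERVAL '1' MINUTE), d.category;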

Dimension Table Design

Dimension tables must satisfy high‑throughput, low‑latency reads, tight integration with the compute engine, and elastic compute resources. Three common solutions are evaluated:

MySQL – widely used, but storage is hard to scale elastically, cost rises steeply at scale, and read performance degrades under very high QPS.

Redis – offers millisecond‑level reads and easy horizontal scaling, yet memory cost becomes prohibitive for large dimension datasets.

Tablestore (Alibaba Cloud) – a cloud‑native NoSQL store with storage‑compute separation, serverless operation, high‑throughput writes, multi‑model indexing, and built‑in CDC (Tunnel Service) for seamless Flink integration.

Cost‑performance comparisons across four workload patterns (high/low storage × high/low compute) show Tablestore delivering the best balance for large‑scale, high‑throughput scenarios, while Redis excels in low‑storage, high‑compute cases.
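
One common way to soften MySQL's read-throughput limits as a dimension table is a lookup cache inside the connector. The following is a minimal sketch, assuming the open-source Flink JDBC connector and its lookup.cache.max-rows and lookup.cache.ttl options (newer Flink releases expose equivalent lookup.partial-cache.* options). The table names orders_demo and item_dim_mysql and the cache values are hypothetical and do not come from the original article.

```sql
-- Hypothetical order stream; datagen emits random rows for local experiments.
CREATE TEMPORARY TABLE orders_demo (
  item_id BIGINT,
  price DOUBLE,
  proc_time AS PROCTIME()  -- processing-time attribute used by the lookup join
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10'
);

-- MySQL dimension table read through the open-source JDBC connector.
CREATE TEMPORARY TABLE item_dim_mysql (
  id BIGINT,
  category STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = '<your_jdbc_mysql_url_with_database>',
  'table-name' = '<your_table>',
  'username' = '<your_mysql_database_username>',
  'password' = '<your_mysql_password>',
  'lookup.cache.max-rows' = '100000',  -- illustrative cache size
  'lookup.cache.ttl' = '10min'         -- illustrative time-to-live
);

-- Each order row looks up its item; repeated keys are served from the cache.
SELECT o.item_id, d.category, o.price
FROM orders_demo AS o
JOIN item_dim_mysql FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.item_id = d.id;
```

The trade-off is freshness: a cached row can be stale for up to the configured TTL, so this suits slowly changing reference data rather than rapidly updated attributes.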

Result Table Design

Result tables store the output of real‑time jobs and must support:

Petabyte‑scale storage.

Rich query and aggregation capabilities (OLAP, search, MPP).

High‑throughput writes.

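Data derivation via CDC (change data capture), so that downstream systems can consume changes to the results.

A cloud‑native architecture with storage‑compute separation.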
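
To make the query requirement concrete, here is a sketch of the kind of aggregation a result table is expected to serve. It assumes a result table shaped like gmv_output from the earlier example, with time_minute stored as 'yyyy-MM-dd HH:mm' text; the table name, the text format, and the date range are illustrative assumptions.

```sql
-- Roll per-minute GMV up to hourly totals for one day.
SELECT
  SUBSTRING(time_minute, 1, 13) AS time_hour,  -- keeps 'yyyy-MM-dd HH'
  SUM(gmv) AS gmv_hour
FROM gmv_output
WHERE time_minute >= '2021-09-15 00:00'
  AND time_minute <  '2021-09-16 00:00'
GROUP BY SUBSTRING(time_minute, 1, 13)
ORDER BY time_hour;
```

A store without secondary indexes or aggregation support would have to answer this with a full scan or push the work to a separate engine, which is the gap the candidates below are compared on.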

Candidate technologies include:

MySQL – familiar but limited by storage cost, query capability, and write throughput.

HBase – scalable, high‑throughput writes, but weak query support, no CDC, and high operational overhead.

HBase + Elasticsearch – adds search capabilities but doubles infrastructure and still lacks CDC.

Tablestore – provides high‑throughput writes, multi‑type indexing, CDC, and a serverless, cost‑effective cloud‑native model.

For a scenario that stores billions of e‑commerce orders (≈1 TB), sustains 1,000 writes/sec, and serves occasional analytical queries, Tablestore meets the required performance at the lowest estimated monthly cost among the four candidates.
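
As a rough sanity check on this workload, the average row size and the sustained write bandwidth follow from the stated figures. The sketch below assumes decimal units (1 TB = 10^12 bytes) and treats the order count N as a parameter, since the article only says "billions"; N = 10^9 is used purely for illustration.

```latex
\bar{s} = \frac{S}{N} = \frac{10^{12}\,\text{B}}{10^{9}} = 10^{3}\,\text{B} \approx 1\,\text{KB per order}
\qquad
B = w \cdot \bar{s} = 1000\,\text{s}^{-1} \times 10^{3}\,\text{B} = 10^{6}\,\text{B/s} = 1\,\text{MB/s}
```

Under these assumptions the sustained ingest is only about 1 MB/s, which suggests that storage cost and query capability weigh more heavily than raw write bandwidth in this comparison.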

Conclusion

The article presents a comprehensive analysis of real‑time dimension and result table choices within a cloud‑native big‑data stack. Alibaba Cloud Tablestore stands out for its elasticity, low cost, and tight Flink integration, making it a strong candidate for most large‑scale streaming workloads.
