
How Hologres Dynamic Table Accelerates Billion‑Row Data Refreshes

The article explains how Hologres Dynamic Table, a cloud‑native materialized‑view‑like feature, supports full and incremental refresh modes, enables minute‑level data freshness for billion‑row price tables, and provides join, aggregation, and partition capabilities while outlining its architecture, limitations, and real‑world performance gains.

Alibaba Cloud Big Data AI Platform

Background

The Taobao‑Tmall price platform handles billions of rows that are updated at high frequency during major sales events. Operators need low‑latency dashboards that can be filtered across dimensions such as product and store.

Why Dynamic Table?

Traditional SQL views recompute on every query, which is costly for large, frequently changing datasets. Materialized views store results but require periodic full refreshes to stay fresh. Hologres Dynamic Table combines the benefits of both: it stores pre‑computed results and can refresh either fully (INSERT OVERWRITE) or incrementally (micro‑batch state table) according to business needs.
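
To make the trade‑off concrete, here is a minimal sketch that uses Python's built‑in sqlite3 module rather than Hologres. The orders table, its rows, and the sales_snapshot name are illustrative assumptions, and the snapshot table is only a stand‑in for a stored, pre‑computed result. The view sees new data immediately because it is recomputed on every read, while the stored result stays stale until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES ('east', 100, 'completed'), ('west', 50, 'completed');

    -- A view stores only the query text; it is re-evaluated on every read.
    CREATE VIEW sales_summary AS
    SELECT region, SUM(amount) AS total_sales
    FROM orders WHERE status = 'completed' GROUP BY region;

    -- A stored snapshot (stand-in for a materialized result) holds the rows themselves.
    CREATE TABLE sales_snapshot AS SELECT * FROM sales_summary;
""")

def totals(table):
    """Return {region: total_sales} read from the given view or table."""
    return dict(conn.execute(f"SELECT region, total_sales FROM {table}"))

# A new order arrives after the snapshot was taken.
conn.execute("INSERT INTO orders VALUES ('east', 25, 'completed')")

view_result = totals("sales_summary")     # recomputed: includes the new order
stale_result = totals("sales_snapshot")   # stored: still the old totals

# A full refresh rebuilds the stored result from scratch.
conn.executescript("""
    DELETE FROM sales_snapshot;
    INSERT INTO sales_snapshot SELECT * FROM sales_summary;
""")
fresh_result = totals("sales_snapshot")
```

The full refresh at the end rereads every base row, which is the cost that incremental refresh is meant to avoid on large tables.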

Refresh Modes

Full Refresh: a scheduled INSERT OVERWRITE of the entire table.

Incremental Refresh: processes only delta data. A column‑store state table holds intermediate results (similar to Flink state). Incremental data is aggregated in micro‑batches, merged with the state table, and finally bulk‑loaded into the dynamic table.
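
The difference between the two modes can be sketched in plain Python. This is a conceptual illustration, not Hologres internals: the IncrementalRefresher class, its state and result dictionaries, and the sample rows are all hypothetical, and the sketch handles inserts only (updates and deletes in the delta would need additional handling).

```python
from collections import defaultdict

def full_refresh(base_rows):
    """Recompute the whole result from every base row (INSERT OVERWRITE style)."""
    totals = defaultdict(float)
    for region, amount in base_rows:
        totals[region] += amount
    return dict(totals)

class IncrementalRefresher:
    """Keeps per-group partial aggregates so each refresh touches only the delta."""

    def __init__(self):
        self.state = {}    # stand-in for the state table: region -> running sum
        self.result = {}   # stand-in for the dynamic table contents

    def refresh(self, delta_rows):
        # 1. Aggregate the micro-batch on its own.
        batch = defaultdict(float)
        for region, amount in delta_rows:
            batch[region] += amount
        # 2. Merge the batch aggregates into the state.
        for region, subtotal in batch.items():
            self.state[region] = self.state.get(region, 0.0) + subtotal
        # 3. Write only the affected groups into the result.
        self.result.update({region: self.state[region] for region in batch})
        return self.result

base = [("east", 100.0), ("west", 50.0)]
delta = [("east", 25.0), ("north", 10.0)]

refresher = IncrementalRefresher()
refresher.refresh(base)                   # initial load
incremental = refresher.refresh(delta)    # later micro-batch: only two rows processed
full = full_refresh(base + delta)         # same answer, but rereads every row
```

Both paths produce the same totals; the incremental path reaches them by reading only the two delta rows on the second refresh.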

Key Capabilities (Hologres V3.1)

Refresh trigger: timed or manual; minimum configurable interval 1 minute.

Incremental mechanisms: Binlog (CDC) or Stream (file‑level, higher throughput).

Supported base table types: internal, dynamic, Paimon external, ODPS external, DLF external.

Full join, aggregation functions, and index configuration are supported.

Physical and logical partitioning are supported.

Window functions, IN‑subquery, and query rewrite are not supported in incremental mode.

Resource isolation: local or serverless (up to 4096 cores).

Limitations

Stream mode requires the base table to be column‑store.

A single stream job cannot consume multiple partitions of a partitioned upstream table simultaneously.

Refresh mode can be switched from incremental to full, but not from full to incremental.
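
The capabilities and limitations listed above lend themselves to a pre‑flight check before a table definition is submitted. The following is a hedged sketch: RefreshPlan, validate, can_switch_mode, and the feature labels are hypothetical names invented for illustration, not part of any Hologres API, and the multi‑partition stream restriction is not modeled.

```python
from dataclasses import dataclass

MIN_INTERVAL_MINUTES = 1   # minimum configurable refresh interval
UNSUPPORTED_INCREMENTAL = {"window_function", "in_subquery", "query_rewrite"}

@dataclass
class RefreshPlan:
    mode: str                        # "full" or "incremental"
    interval_minutes: int
    mechanism: str = "binlog"        # "binlog" or "stream"; used only for incremental
    base_is_column_store: bool = True
    query_features: frozenset = frozenset()   # hypothetical labels for query constructs

def validate(plan):
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    if plan.interval_minutes < MIN_INTERVAL_MINUTES:
        problems.append("refresh interval is below the 1-minute minimum")
    if plan.mode == "incremental":
        blocked = sorted(UNSUPPORTED_INCREMENTAL & plan.query_features)
        if blocked:
            problems.append("not supported in incremental mode: " + ", ".join(blocked))
        if plan.mechanism == "stream" and not plan.base_is_column_store:
            problems.append("stream mode requires a column-store base table")
    return problems

def can_switch_mode(current, target):
    """Incremental -> full is allowed; full -> incremental is not."""
    return current == target or (current, target) == ("incremental", "full")

good_plan = RefreshPlan(mode="incremental", interval_minutes=1)
bad_plan = RefreshPlan(
    mode="incremental",
    interval_minutes=0,
    mechanism="stream",
    base_is_column_store=False,
    query_features=frozenset({"window_function"}),
)
good_problems = validate(good_plan)
bad_problems = validate(bad_plan)
```

Because a full‑refresh table cannot later be switched to incremental, a check like this is most useful before the table is first created.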

Solution Architecture

The team built a pipeline that translates business‑level filter rules into Dynamic Table DQL and generates the corresponding DDL. The process includes:

Defining metric columns as entities in a metric system.

Providing a generic filter component that adapts to different business scenarios.

Storing default configurations (refresh interval, mode, join conditions, etc.) per scenario.

Generating Dynamic Table DDL from DSL expressions.

Implementing a refresh‑status monitor to detect incomplete or empty refreshes.

Supplying data via Flink jobs or paginated queries after the first successful refresh.
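
The rule‑to‑query translation step can be sketched as follows. This is an assumption‑laden illustration, not the team's implementation: build_query, quote_literal, the tuple shapes for metrics and filters, and the column whitelist are all hypothetical. Only the query part is generated here; the surrounding Dynamic Table DDL is left out because the article does not give its exact options.

```python
ALLOWED_OPS = {"=", "!=", ">", ">=", "<", "<="}
ALLOWED_AGGS = {"SUM", "COUNT", "MIN", "MAX", "AVG"}

def quote_literal(value):
    """Render a Python value as a SQL literal, doubling single quotes in strings."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def build_query(table, dimensions, metrics, filters, known_columns):
    """Build a grouped SELECT from dimensions, metric specs, and filter rules.

    metrics: list of (aggregate, column, alias)
    filters: list of (column, operator, value)
    """
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table}")
    # Every referenced column must be on the whitelist, so no raw text reaches the SQL.
    for col in list(dimensions) + [m[1] for m in metrics] + [f[0] for f in filters]:
        if col not in known_columns:
            raise ValueError(f"unknown column: {col}")

    select_parts = list(dimensions)
    for agg, col, alias in metrics:
        if agg not in ALLOWED_AGGS or not alias.isidentifier():
            raise ValueError(f"invalid metric: {agg}({col}) AS {alias}")
        select_parts.append(f"{agg}({col}) AS {alias}")

    where_parts = []
    for col, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        where_parts.append(f"{col} {op} {quote_literal(value)}")

    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if where_parts:
        sql += " WHERE " + " AND ".join(where_parts)
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql

query = build_query(
    table="orders",
    dimensions=["region"],
    metrics=[("SUM", "amount", "total_sales")],
    filters=[("status", "=", "completed")],
    known_columns={"region", "amount", "status"},
)
```

Whitelisting columns, operators, and aggregates keeps user‑supplied filter rules from injecting arbitrary SQL into the generated statement.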

Near‑Real‑Time Reporting

By layering ODS → DWD → DWS → ADS with Dynamic Table, the team achieved incremental refreshes every minute. Serverless resources isolate the workload, reducing contention with other tasks.
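
With several layers refreshing every minute, a refresh that never finishes or that writes no rows in one layer can go unnoticed downstream, which is the kind of problem a refresh‑status monitor is meant to catch. Below is a minimal sketch of such a check. RefreshRecord, check_refresh, the table names, and the two‑minute lag threshold are hypothetical, and the "stale" status is an extra assumption beyond the incomplete and empty cases the article mentions. How refresh metadata is actually obtained from Hologres is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class RefreshRecord:
    table: str
    finished_at: Optional[datetime]   # None means the refresh has not completed
    row_count: int                    # rows present after the refresh

def check_refresh(record, now, max_lag=timedelta(minutes=2)):
    """Classify one refresh as 'incomplete', 'empty', 'stale', or 'ok'."""
    if record.finished_at is None:
        return "incomplete"
    if record.row_count == 0:
        return "empty"
    if now - record.finished_at > max_lag:
        return "stale"
    return "ok"

# A fixed timestamp keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0)
layers = [
    RefreshRecord("dwd_price", now - timedelta(seconds=30), 1000),
    RefreshRecord("dws_price", None, 0),
    RefreshRecord("ads_price", now - timedelta(minutes=10), 500),
    RefreshRecord("ads_empty", now, 0),
]
statuses = {record.table: check_refresh(record, now) for record in layers}
```

A caller could hold back downstream reads, such as paginated queries, until every upstream layer reports "ok".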

Results

Dynamic Tables on billion‑row base tables are created and initially refreshed within seconds.

Data latency dropped from hours to minutes for workloads of about 10k RPS.

Minute‑level data can be compared across periods, supporting rapid decision‑making during high‑traffic events.

Code Samples

-- Plain query: recomputed in full on every execution
SELECT region, SUM(amount) AS total_sales
FROM orders
WHERE status = 'completed'
GROUP BY region;

-- Create view
CREATE VIEW sales_summary AS
SELECT region, SUM(amount) AS total_sales
FROM orders
WHERE status = 'completed'
GROUP BY region;

-- Query view
SELECT * FROM sales_summary;