Why Hologres Dynamic Table Beats Traditional Full Refresh for Real‑Time Data Warehousing
The article explains how Hologres Dynamic Table uses a stateful incremental refresh model to efficiently handle massive historical data with tiny daily updates, dramatically reducing latency and resource consumption compared with conventional full‑refresh pipelines across several real‑world join and aggregation scenarios.
Enterprises are increasingly moving toward unified and real‑time data warehouses, where the core challenge is processing "massive historical + tiny new" data efficiently. Traditional full‑refresh approaches become prohibitively slow and costly when scanning billions of rows for each refresh.
Dynamic Table: a declarative incremental refresh paradigm
Dynamic Table is a declarative data‑processing architecture that automatically maintains materialized results of one or more base tables. It supports two refresh modes:
Full refresh : each run recomputes the entire result set, similar to INSERT OVERWRITE.
Incremental refresh : each run processes only newly inserted rows, merging them with persisted intermediate state.
Hologres, Alibaba Cloud’s high‑performance real‑time warehouse, implements Dynamic Table with a stateful incremental computation model . During the initial full build, Hologres materializes and persists key intermediate results (e.g., join outputs, aggregation buffers). Subsequent incremental runs only need to read the small set of new rows and the persisted state, avoiding full scans of the historical data.
Why stateful incremental refresh is faster
The stateful model stores a compact State table that holds aggregated values or join intermediates. When new data arrives, Hologres merges it with the State table, applying runtime‑filter optimizations to read only the necessary rows. This reduces both I/O and CPU usage, especially for queries with COUNT(DISTINCT), SUM, or multi‑table joins.
Stateless incremental engines recompute the entire query plan for each refresh, often generating dozens of execution nodes and repeatedly scanning large partitions, which leads to higher latency and resource consumption, particularly when the workload includes updates or deletions (retractions).
Performance evaluation
Three representative workloads were tested on Hologres Serverless (15 CU) against a competing product with similar specifications.
Scenario 1 – Single‑table aggregation (COUNT DISTINCT + SUM)
SQL used:
CREATE DYNAMIC TABLE DT_ORDER_DETAIL_AGG WITH (
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
freshness = '1 minutes'
) AS
SELECT
PRODUCT_ID,
COUNT(DISTINCT USER_ID) AS UV,
SUM(LINE_AMOUNT) AS SUM_LINE_AMOUNT,
MAX(QUANTITY) AS MAX_QUANTITY
FROM ORDER_DETAIL
GROUP BY PRODUCT_ID;Key results (seconds):
Full refresh (10 M rows): Hologres 4.6 s vs competitor 3.9 s.
First incremental run: Hologres 0.59 s vs competitor 1.5 s.
Second incremental run: Hologres 0.49 s vs competitor 2.7 s.
Third incremental run: Hologres 0.45 s vs competitor 2.9 s.
Storage: source 10 M rows (252 MB); result 1 MB; state 93 MB.
Scenario 2 – Two‑table join (orders + order_items)
SQL used:
CREATE DYNAMIC TABLE DT_ORDER_DETAIL WITH (
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
freshness = '1 minutes'
) AS
SELECT
o.ORDER_ID,
o.ORDER_DATE,
o.ORDER_STATUS,
oi.ORDER_ITEM_ID,
oi.PRODUCT_ID,
oi.QUANTITY,
oi.UNIT_PRICE,
oi.LINE_AMOUNT
FROM ORDERS o
JOIN ORDER_ITEMS oi ON o.ORDER_ID = oi.ORDER_ID;Key results (seconds):
Full refresh: Hologres 9.2 s vs competitor 6.29 s.
First incremental run: Hologres 0.68 s vs competitor 2.2 s.
Second incremental run: Hologres 0.50 s vs competitor 1.1 s.
Third incremental run: Hologres 0.51 s vs competitor 1.9 s.
Storage: source tables 5 M + 10 M rows (370 MB); result 1 MB; state 228 MB.
Scenario 3 – Five‑table complex join (orders, users, products, order_items, payments)
SQL used:
CREATE DYNAMIC TABLE DT_ORDER_DETAIL WITH (
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
freshness = '1 minutes'
) AS
SELECT
o.ORDER_ID,
o.ORDER_DATE,
o.ORDER_STATUS,
u.USER_ID,
u.USER_NAME,
u.EMAIL,
u.STATUS AS USER_STATUS,
oi.ORDER_ITEM_ID,
oi.PRODUCT_ID,
p.PRODUCT_NAME,
p.CATEGORY,
p.PRICE AS PRODUCT_PRICE,
oi.QUANTITY,
oi.UNIT_PRICE,
oi.LINE_AMOUNT,
pay.PAYMENT_ID,
pay.PAY_AMOUNT,
pay.PAY_METHOD,
pay.PAY_TIME
FROM ORDERS o
JOIN USERS u ON o.USER_ID = u.USER_ID
JOIN ORDER_ITEMS oi ON o.ORDER_ID = oi.ORDER_ID
JOIN PRODUCTS p ON oi.PRODUCT_ID = p.PRODUCT_ID
LEFT JOIN PAYMENTS pay ON o.ORDER_ID = pay.ORDER_ID;Key results (seconds):
Full refresh: Hologres 23 s vs competitor 30 s.
First incremental run: Hologres 0.68 s vs competitor 3.9 s.
Second incremental run: Hologres 0.51 s vs competitor 2.8 s.
Third incremental run: Hologres 0.51 s vs competitor 1.8 s.
Storage: source tables total ≈ 20 M rows (594 MB); result 1.2 GB; state 1.1 GB.
Why the stateful approach wins
Stateless incremental refresh suffers from:
Multiple joins per refresh (≈ N × (N‑1) joins), causing large execution graphs.
Heavy union operations where only one branch processes incremental rows while the others still scan full partitions.
Complex retraction handling that forces additional ordering logic.
Stateful refresh reduces join count to 2 × (N‑1), pushes filters down to the compact State tables, and simplifies retraction by adjusting output order within the same layer.
Storage overhead and mitigation
The extra storage for State tables is proportional to the size of the source and result tables. In partitioned tables, only active partitions retain state; inactive partitions are automatically cleaned up. For non‑partitioned tables, TTL can be configured to drop stale state.
Conclusion
Hologres Dynamic Table’s stateful incremental computation resolves the “incremental ≠ efficient” problem of traditional pipelines. It shortens refresh latency, cuts CPU/IO usage, and simplifies data‑pipeline maintenance, enabling a truly "write‑once‑SQL, auto‑update" experience for real‑time dashboards, user‑profile wide tables, and risk‑control feature pipelines.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
