Dynamic Table: A Next‑Generation Data Processing Architecture Powered by Incremental Computing
The article examines the limitations of traditional batch and stream processing, explains how Hologres Dynamic Table combines declarative freshness settings with stateful incremental computation to bridge the gap between low‑cost batch jobs and low‑latency streaming, and presents benchmark results and real‑world case studies.
Challenges of Traditional Data Processing
Data volumes have exploded and business requirements for data timeliness have become stricter. Traditional offline processing (e.g., MapReduce, Spark, BSP) offers high throughput and low cost but delivers day‑ or hour‑level latency, which cannot satisfy real‑time needs. Stream processing provides millisecond latency but incurs high cost, complex operations, and issues such as retraction handling, high resource idle rates, and limited SQL capabilities. The Lambda architecture attempts a compromise but introduces data redundancy, semantic inconsistency, and high development and O&M overhead.
Consequently, a large performance and cost gap exists between high‑latency batch jobs and low‑latency stream jobs, creating a market need for a solution that can flexibly operate from minute‑level to hour‑level latency with controllable cost.
Hologres Dynamic Table Architecture
Hologres introduces Dynamic Table, a declarative data‑processing framework. Users declare a freshness attribute (e.g., freshness = '1 minute'), and the system automatically schedules data flow, processing, and refresh. A Dynamic Table is a physical table in Hologres that can be indexed like a regular table, unifying data processing with high‑performance query serving.
Core Syntax and Refresh Modes
The table definition requires two key parts:
Refresh attribute: defines data freshness, e.g., freshness = '1 minute'.
Core business logic: a standard SQL query that describes the transformation.
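Putting the two parts together, a Dynamic Table definition might look like the sketch below. This is illustrative only: the table and column names are hypothetical, and the exact clause ordering and property names should be checked against the Hologres documentation.

```sql
-- Sketch of a Dynamic Table definition (names and clause order are
-- illustrative; consult the Hologres docs for the exact syntax).
CREATE DYNAMIC TABLE order_stats
WITH (
    freshness = '1 minute'   -- declarative freshness target
)
AS
SELECT shop_id,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS total_amount
FROM orders
GROUP BY shop_id;
```

Because the same SQL body works for both refresh modes, a team can validate the logic with full refresh first and later switch the table to incremental refresh without rewriting the query.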
Dynamic Table supports both incremental refresh and full refresh with identical SQL logic. Developers can start with full refresh for rapid validation and then switch to incremental refresh to gain lower latency and cost, embodying the principle “One SQL, One Data, One Engine”.
Stateful Incremental Computation
Incremental computation is achieved by detecting changes in the base tables and processing only the delta. Hologres 3.1+ adds a Stream mode that derives file‑level change logs directly from high‑frequency snapshots via Snapshot Diff, generating increments without the storage overhead of retaining a full‑history binlog.
The engine employs an efficient diff algorithm that collapses millions of intermediate updates into just the before‑and‑after records, dramatically reducing downstream I/O. Columnar storage further improves I/O efficiency by reading only the columns involved in the delta.
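Conceptually, the delta between two snapshots can be expressed as a pair of anti‑joins: rows present or changed in the new snapshot become after‑images, and rows removed or changed become before‑images (retractions). The sketch below is only a logical illustration of Snapshot Diff; the engine performs this comparison internally at the file level, and the table and column names are hypothetical.

```sql
-- Illustration of Snapshot Diff semantics (not the engine's internals).
-- After-images: rows inserted or updated since the previous snapshot.
SELECT n.*
FROM snapshot_new n
LEFT JOIN snapshot_old o USING (pk)
WHERE o.pk IS NULL
   OR n.val IS DISTINCT FROM o.val;

-- Before-images: rows deleted or updated, emitted as retractions.
SELECT o.*
FROM snapshot_old o
LEFT JOIN snapshot_new n USING (pk)
WHERE n.pk IS NULL
   OR o.val IS DISTINCT FROM n.val;
```

Collapsing a key's intermediate history into just these two records is what keeps downstream I/O small even when a key is updated millions of times between refreshes.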
For multi‑table joins, Hologres retains intermediate state, reducing join complexity from N*(N‑1) operations to roughly 3*(N‑1); for example, a five‑table join drops from 20 re‑join operations per update to about 12. Persistent state also enables custom indexing on join keys, improving join performance and simplifying rollback logic.
Resource scheduling is serverless: no resident resources are allocated until incremental data is detected. The system automatically derives the required compute resources based on the size of the increment, achieving higher elasticity than traditional stream engines.
Performance Evaluation
Benchmark tests show that Hologres incremental computation outperforms leading international vendors in single‑table aggregation and multi‑table join scenarios. In the Nexmark Q23 benchmark, Hologres achieves a CU throughput 3‑6× higher than a mainstream stream engine.
Real‑World Use Cases
Scenario 1 – Minute‑Level Real‑Time Aggregation Dashboard : For an e‑commerce platform, a Dynamic Table with freshness = '1 minute' and a SQL that groups by product category automatically ingests lake‑level increments and updates the dashboard every minute without any ETL orchestration.
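A minimal sketch of such a dashboard table is shown below. The source table, column names, and time bucketing are assumptions for illustration; only the freshness = '1 minute' attribute and the group‑by‑category logic come from the scenario description.

```sql
-- Hypothetical minute-level aggregation feeding a dashboard.
CREATE DYNAMIC TABLE category_gmv_1min
WITH (
    freshness = '1 minute'
)
AS
SELECT product_category,
       date_trunc('minute', order_time) AS window_start,
       SUM(amount) AS gmv,
       COUNT(*)    AS order_cnt
FROM orders
GROUP BY product_category,
         date_trunc('minute', order_time);
```

The dashboard then queries category_gmv_1min like any ordinary table; no separate ETL pipeline or orchestration job is required.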
Scenario 2 – Logical Partition for Unified Real‑Time and Offline Views : Users specify an active partition window (e.g., the last 2 days) that uses incremental refresh, while historical partitions remain static and can be refreshed manually with full refresh, providing a single view for both real‑time analysis and historical correction.
Scenario 3 – High‑Performance UV Calculation : By leveraging Roaring Bitmap and the RB_BUILD_AGG() function, incremental updates build bitmap states; queries use RB_CARDINALITY(RB_OR_AGG(...)) to compute exact UV values for any period with minimal latency and cost.
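The two stages of the UV pattern can be sketched as follows. The table and column names are hypothetical, and RB_BUILD_AGG typically requires an integer user identifier; the bitmap functions themselves are the ones named in the scenario.

```sql
-- Stage 1: incrementally maintain one user-id bitmap per day.
CREATE DYNAMIC TABLE daily_uv_bitmap
WITH (
    freshness = '1 minute'
)
AS
SELECT ds,
       RB_BUILD_AGG(user_id) AS uid_bitmap  -- user_id assumed integer
FROM user_events
GROUP BY ds;

-- Stage 2: exact UV over an arbitrary date range by OR-ing bitmaps.
SELECT RB_CARDINALITY(RB_OR_AGG(uid_bitmap)) AS uv
FROM daily_uv_bitmap
WHERE ds BETWEEN '2024-06-01' AND '2024-06-07';
```

Because the expensive deduplication state lives in the pre‑built bitmaps, the range query at read time only ORs a handful of bitmaps and counts bits, which is what keeps both latency and cost low.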
Retail Company Case: A snack‑store chain processing 100 billion transaction rows per day reduced data freshness from days to minutes. With Dynamic Table, a 40 billion‑row inventory table involving 7 joins refreshes in 20 seconds, and a 200 billion‑row transaction table involving 2 joins refreshes in 2 seconds. Overall compute cost dropped by 50%, while pure‑SQL development reduced development and O&M effort.
Conclusion
Hologres Dynamic Table fills the gap between batch and stream processing by accelerating ETL, delivering minute‑level latency at controllable cost, and simplifying development with a unified SQL engine. In the AI era, its ability to process unstructured data from OSS and invoke AI functions further expands its applicability to emerging scenarios.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.