
How Hologres Dynamic Table Redefines Data Processing with Incremental Computing

The article analyzes the limitations of traditional batch and stream processing, introduces Hologres Dynamic Table as a declarative, incremental‑compute framework that bridges the gap between low‑cost batch jobs and low‑latency streaming, and validates its performance with benchmarks and real‑world case studies.


Background: Limits of Existing Data Processing Architectures

Data processing has evolved from offline batch jobs that prioritize high throughput and low cost (built on HDFS with MapReduce, Spark, or the BSP model) to stream systems that achieve millisecond latency at the expense of high resource consumption, complex retraction logic, and costly message queues such as Kafka. The Lambda architecture attempts to combine both but introduces data redundancy, semantic inconsistency, and high development and operations overhead.

Hologres Dynamic Table Overview

Hologres Dynamic Table is a declarative data‑processing framework: users declare a target freshness via a table property (e.g., freshness = '1 minute'), and the system automatically handles data movement, computation, and scheduling. The table behaves like a regular Hologres table, supporting indexes for fast query performance while unifying data processing and query execution.

Core Syntax

Refresh property: defines data freshness, e.g., freshness = '1 minute'.

Business logic: a standard SQL statement that describes the transformation.

The entire workflow requires no manual ETL orchestration; data flows from ODS to DWD/DWS layers and finally to BI reports automatically.
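As a sketch of this declarative workflow (table and column names are illustrative, and the exact DDL keywords may differ across Hologres versions):

```sql
-- Sketch only: assumes the CREATE DYNAMIC TABLE syntax and freshness
-- property described in the article; ods_orders and dws_order_stats
-- are illustrative names.
CREATE DYNAMIC TABLE dws_order_stats
WITH (freshness = '1 minute')        -- declared freshness target
AS
SELECT order_date,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS gmv
FROM ods_orders
GROUP BY order_date;
```

Once the table is declared, the system schedules refreshes to meet the one‑minute freshness target; no separate ETL jobs are orchestrated by hand.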

Incremental Refresh Capabilities

Dynamic Table supports both full and incremental refresh modes with identical SQL logic. Developers can validate with full refresh and then switch to incremental refresh for lower latency and cost, embodying the “One SQL, One Data, One Engine” principle and eliminating the multi‑engine complexity of Lambda.
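The mode switch might be expressed as follows; note that the refresh_mode property name below is a hypothetical placeholder, since the article states only that full and incremental refresh share identical SQL:

```sql
-- Hypothetical sketch: 'refresh_mode' is an assumed property name.
-- The SELECT body is identical in both modes, per the article.
CREATE DYNAMIC TABLE dws_daily_gmv
WITH (freshness = '1 minute', refresh_mode = 'full')  -- validate with full refresh
AS
SELECT ds, SUM(amount) AS gmv
FROM ods_orders
GROUP BY ds;

-- Later, switch to incremental refresh without changing the query logic:
ALTER DYNAMIC TABLE dws_daily_gmv SET (refresh_mode = 'incremental');
```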

Integration with AI and Unstructured Data

The engine natively processes unstructured data stored in OSS, invoking AI functions (e.g., text parsing, vectorization) and leveraging Hologres’s full‑text and vector indexes for end‑to‑end retrieval.
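A hypothetical end‑to‑end sketch of that pipeline; the function names ai_parse and ai_embed are placeholders, since the article names the capabilities (text parsing, vectorization) but not the exact function signatures:

```sql
-- Hypothetical sketch: ai_parse/ai_embed are placeholder names for the
-- AI functions described; the source table and columns are illustrative.
CREATE DYNAMIC TABLE doc_embeddings
WITH (freshness = '1 minute')
AS
SELECT uri,
       ai_parse(content) AS parsed_text,  -- text parsing of OSS-stored documents
       ai_embed(content) AS embedding     -- vectorization for the vector index
FROM oss_documents;
```

The resulting columns can then be served through Hologres's full‑text and vector indexes for end‑to‑end retrieval.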

Incremental Computing Techniques

Incremental computation works by detecting changes in base tables and only processing those deltas. Hologres 3.1+ introduces a Stream mode that reads file‑level change logs, generating snapshots and snapshot diffs without extra storage overhead.

No extra storage: snapshots replace traditional binlog duplication.

Efficient diff algorithm: filters out redundant intermediate updates; e.g., 1,000,000 rapid updates to the same key emit only two records (the pre‑update and post‑update values).

Column‑store advantage: scans only changed columns, further reducing I/O.

Stateful Incremental Join Optimization

Instead of stateless joins, which require on the order of N·(N−1) join operations, Hologres retains intermediate state for each side of a join, reducing the work to roughly 3·(N−1) joins. Persistent state enables custom storage (e.g., indexed join keys) and simplifies retraction handling.

Resource Model

The system uses a serverless, on‑demand model: no resident resources are allocated until incremental data is detected. Computation resources are auto‑scaled based on data volume, achieving higher utilization than traditional always‑on stream engines.

Performance Evaluation

Benchmarks show Hologres's incremental engine outperforms leading international solutions in single‑table aggregation and multi‑table join scenarios. In the Nexmark Q23 benchmark, Hologres achieves 3–6× higher throughput per compute unit (CU) than a mainstream stream engine.

Real‑World Case Studies

Minute‑level real‑time aggregation screen: by creating a Dynamic Table with freshness = '1 minute' and a SQL statement that groups by product category, the system automatically ingests lake data and updates the aggregation every minute without ETL coding.

Logical partition for unified real‑time and offline analysis: users specify active partitions (e.g., the last 2 days) for incremental refresh and keep historical partitions static, enabling a single view for both fresh analytics and historical corrections.
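This might look like the following; the partition‑refresh property name is a hypothetical placeholder, since the article describes the behavior (only recent partitions refresh incrementally) but not the exact knob:

```sql
-- Hypothetical sketch: 'active_partition_range' is an assumed property
-- name; the article states only that active partitions (e.g., the last
-- 2 days) refresh incrementally while historical partitions stay static.
CREATE DYNAMIC TABLE dws_sales_by_day
PARTITION BY LIST (ds)
WITH (
  freshness = '1 minute',
  active_partition_range = '2 days'   -- assumed: only recent partitions refresh
)
AS
SELECT ds, shop_id, SUM(amount) AS gmv
FROM ods_sales
GROUP BY ds, shop_id;
```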

High‑throughput UV calculation: uses Roaring Bitmaps via RB_BUILD_AGG() and RB_CARDINALITY(RB_OR_AGG(...)) to incrementally aggregate distinct user IDs, delivering fast, low‑cost UV metrics for long‑duration windows.
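A sketch using the bitmap functions named in the article (table and column names are illustrative):

```sql
-- Step 1: incrementally build a per-day Roaring Bitmap of user IDs.
-- ods_events and dws_uv_bitmap are illustrative names.
CREATE DYNAMIC TABLE dws_uv_bitmap
WITH (freshness = '1 minute')
AS
SELECT ds, RB_BUILD_AGG(user_id) AS uv_bitmap
FROM ods_events
GROUP BY ds;

-- Step 2: merge daily bitmaps and count distinct users over a long window.
SELECT RB_CARDINALITY(RB_OR_AGG(uv_bitmap)) AS uv_30d
FROM dws_uv_bitmap
WHERE ds >= CURRENT_DATE - INTERVAL '30 days';
```

Because bitmaps merge cheaply, the expensive distinct‑count work is done incrementally per day rather than rescanning the full window on every query.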

Retail data‑lake modernization: a large snack retail chain reduced query latency from hours to seconds, refreshing a 40 billion‑row inventory table in 20 seconds and a 200 billion‑row transaction table in 2 seconds, while cutting compute cost by roughly 50%.

Conclusion

Hologres Dynamic Table fills the gap between batch and stream processing, accelerating ETL from day‑level to minute‑level latency with controllable cost, simplifying real‑time data pipelines, and providing a solid foundation for AI‑driven unstructured data workflows.

Tags: data processing, performance benchmark, Hologres, Dynamic Table, cloud data warehouse, incremental computing
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
