Incremental Computation in Big Data: Flink Materialized Table and Paimon
This article explains how Flink 1.20's Materialized Table, combined with Paimon's changelog storage, enables incremental computation that unifies batch and streaming workloads, delivering minute-level latency at lower cost. A materialized-table example illustrates the approach; support is currently streaming-only, with batch extensions planned.
Since the era of Google's "three carriages" papers (GFS, MapReduce, and Bigtable), big-data processing has evolved from offline Hive+Spark pipelines to real-time Flink, yet low-cost near-real-time processing remains challenging.
Incremental computation (also known as Incremental View Maintenance in databases) is gaining attention to unify batch and stream processing. Flink 1.20 introduces Materialized Table (MT) to combine both modes, while Paimon provides Changelog storage.
The article first explains incremental computation concepts, then presents a Flink‑Paimon case study, highlighting current capabilities and limitations.
Key reasons for the renewed interest:
Desire for a unified batch-stream architecture: differing computation models hinder code reuse.
The trade-off between cost and data freshness: pure streaming incurs high resource costs, while batch processing adds latency.
Incremental computation can balance cost and latency by adjusting how frequently increments are computed.
Flink's model is based on changelog-driven incremental computation, which records state changes as a stream of row kinds (+I, -U, +U, -D). In practice, state-TTL assumptions and rollback operations can affect result accuracy.
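The row kinds a query produces can be inspected directly in Flink SQL. A minimal sketch, assuming a hypothetical table t(k, v) already registered in the session:

```sql
-- Show the changelog modes (+I, -U, +U, -D) each operator emits for a query;
-- `t` is a hypothetical table with columns k and v, not from this article.
EXPLAIN CHANGELOG_MODE
SELECT k, SUM(v) AS total FROM t GROUP BY k;
```

The plan output annotates each operator with the row kinds it may produce, which is useful for checking whether a downstream sink must handle retractions.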
Implementation differences:
Databases capture changes via internal mechanisms (e.g., PostgreSQL AFTER triggers) and rewrite queries for incremental results.
Big‑data engines use a generic Changelog model (e.g., Paimon’s Changelog Producer) and let the engine handle incremental logic.
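The database-side approach can be sketched with a PostgreSQL AFTER trigger that captures row changes into a delta table; all table and function names below are illustrative, not from the article:

```sql
-- Hypothetical delta table recording changes to an `orders` table.
CREATE TABLE orders_delta (op TEXT, order_id TEXT, tms_company TEXT);

CREATE OR REPLACE FUNCTION capture_delta() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO orders_delta VALUES ('+I', NEW.order_id, NEW.tms_company);
  ELSIF TG_OP = 'UPDATE' THEN
    -- Record the retraction of the old row and the new image, mirroring -U/+U.
    INSERT INTO orders_delta VALUES ('-U', OLD.order_id, OLD.tms_company),
                                    ('+U', NEW.order_id, NEW.tms_company);
  END IF;
  RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER orders_capture AFTER INSERT OR UPDATE ON orders
  FOR EACH ROW EXECUTE FUNCTION capture_delta();
```

A maintenance job can then rewrite the view query to consume only orders_delta, which is the query-rewriting step the database performs internally.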
A concrete Flink‑MT + Paimon example creates a catalog, source table, and a materialized table that aggregates order counts per logistics company. The SQL statements are:
SET 'sql-client.execution.result-mode' = 'tableau';
SET 'table.exec.sink.upsert-materialize' = 'NONE';
SET 'execution.runtime-mode' = 'streaming';
CREATE CATALOG paimon WITH ('type'='paimon','warehouse'='file:/tmp/paimon');
CREATE TABLE tms_source (
order_id STRING PRIMARY KEY NOT ENFORCED,
tms_company STRING NOT NULL
) WITH (
'connector'='paimon',
'path'='file:/tmp/paimon/default.db/tms_source',
'changelog-producer'='lookup',
'scan.remove-normalize'='true'
);
CREATE MATERIALIZED TABLE continues_tms_res (
CONSTRAINT `pk_tms_company` PRIMARY KEY (tms_company) NOT ENFORCED
) WITH (
'format'='debezium-json',
'scan.remove-normalize'='true',
'changelog-producer'='lookup',
'merge-engine'='aggregation',
'fields.order_cnt.aggregate-function'='sum'
) FRESHNESS = INTERVAL '30' SECOND
AS SELECT tms_company, 1 AS order_cnt FROM tms_source;
INSERT INTO tms_source VALUES ('001','ZhongTong');
INSERT INTO tms_source VALUES ('002','ZhongTong');
INSERT INTO tms_source VALUES ('001','YuanTong');
Querying the source and materialized tables shows changelog records (+I, -U, +U) and the aggregated order counts, demonstrating incremental updates without full recomputation.
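In a SQL client session, the result can be observed with plain queries (a sketch; the exact tableau output depends on the session and on when the materialized table refreshes):

```sql
-- Read the changelog-backed source; in streaming mode the client prints
-- the +I/-U/+U records as they arrive (the third insert updates row '001').
SELECT * FROM tms_source;

-- Read the aggregated materialized table; per-company counts are
-- maintained incrementally rather than recomputed from scratch.
SELECT * FROM continues_tms_res;
```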
Although Flink's batch mode does not yet support changelog processing, the streaming mode demonstrates that incremental computation can deliver low-cost near-real-time analytics. Future work may extend this to batch scheduling.
Conclusion: Incremental computation bridges the gap between batch and streaming, offering minute‑level latency at reasonable cost, but pure real‑time scenarios still require dedicated architectures.
DaTaobao Tech
Official account of DaTaobao Technology