
How Paimon + Flink Enables Low‑Cost Real‑Time State Storage for Complex Streaming Jobs

This article explains how Apache Paimon can be used as a real‑time state store for Flink, detailing its low‑cost, scalable storage, lookup‑join design, table schema, bucket configuration, memory tuning, and practical use cases such as handling refund‑adjusted order tags and cumulative metrics.

DaTaobao Tech

Background

Apache Paimon is a unified streaming‑batch data lake storage format that integrates tightly with Flink, offering near‑real‑time updates (1‑5 minutes) and high‑performance upserts on primary‑key tables. It stores data on OSS/HDFS using an LSM‑tree layout, providing better real‑time update capability than other lake formats.
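As a concrete starting point, a Paimon catalog can be registered in Flink SQL roughly as follows. This is a sketch: the catalog name matches the DDL later in this article, but the warehouse path is a placeholder, not the original setup.

```sql
-- Register a Paimon catalog in Flink SQL (warehouse path is a placeholder).
CREATE CATALOG `paimon-catalog` WITH (
    'type' = 'paimon',
    'warehouse' = 'oss://my-bucket/paimon-warehouse'
);

USE CATALOG `paimon-catalog`;
```

All tables created under this catalog are stored as Paimon tables on OSS/HDFS and are visible to other engines (Hive, Spark, Trino) that mount the same warehouse.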

Why Use Paimon for Flink State

Low cost & scalability: Paimon storage costs are roughly one‑ninth of alternatives like Hologres, and it can be shared across Hive, Spark, Trino, etc., eliminating data silos.

Real‑time performance: Designed for Flink, Paimon supports high‑throughput upserts with latency as low as one minute, outperforming Hudi or Iceberg in Flink‑centric workloads.

Use Case: Excluding Refund‑Adjusted Orders

Business needs a real‑time tag that reflects the most recent successful order time for a user‑seller pair, excluding orders that were later refunded. Traditional Flink state cannot handle long‑interval joins between order and refund streams due to state size and time‑window constraints.

Solution Architecture

Store historical order timestamps in a Paimon dimension table. When a refund event arrives, perform a lookup join against the Paimon table, filter out refunded orders, and emit the latest non‑refunded order time. This removes the dependency on Flink’s internal state, allowing stateless restarts and handling arbitrarily long intervals.
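The lookup-join step can be sketched in Flink SQL. The refund stream name, its columns, and the processing-time attribute below are illustrative assumptions; the dimension table is the one defined later in this article.

```sql
-- Hypothetical refund stream (refund_stream) enriched against the Paimon
-- dimension table. r.proc_time is assumed to be a processing-time attribute.
SELECT
    r.refund_order_id,
    d.test_order_dict   -- parsed downstream to find the latest non-refunded order
FROM refund_stream AS r
JOIN `paimon-catalog`.`paimon-db`.test_dws_pay_refund
    FOR SYSTEM_TIME AS OF r.proc_time AS d
ON  r.buyer_id  = d.test_buyer_id
AND r.seller_id = d.test_seller_id;
```

Note that the join condition covers the full primary/bucket key (buyer + seller), which is what keeps the lookup initialization and latency manageable.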

Advantages

No state dependency, saving memory and enabling stateless job restarts.

The tag can still be corrected even if a refund occurs a year after the original order.

Limitations

Paimon tables have a data‑freshness limit (typically 1‑5 minutes).

Only the most recent 10 orders per user‑seller pair are retained; if all ten are refunded, further rollback is impossible.

Paimon Dimension Table Design

CREATE TABLE `paimon-catalog`.`paimon-db`.test_dws_pay_refund (
    test_buyer_id VARCHAR,
    test_seller_id VARCHAR,
    test_order_dict VARCHAR, -- format: ${order_id}:${pay_timestamp}_${is_refund_flag}
    PRIMARY KEY (test_buyer_id, test_seller_id) NOT ENFORCED
) WITH (
    'merge-engine' = 'deduplicate',
    'changelog-producer' = 'none',
    'bucket' = '1000',
    'bucket-key' = 'test_buyer_id,test_seller_id',
    'delete-file.thread-num' = '32'
);

Key points:

Non‑partitioned primary‑key table (buyer + seller).

Deduplication merge engine keeps the latest record per key.

Bucket count set to 1000 to keep each bucket around 2 GB, matching Flink shuffle parallelism.

Development Experience & Tuning

Bucket Number Issues

Using too few buckets (e.g., 400) caused the LookupJoin operator to stall, because each TaskManager had to load multiple large buckets. Increasing the count to 1000 reduced load time to about 3 minutes.
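For reference, changing the bucket count of a fixed-bucket Paimon table is an offline operation: update the table option, then rewrite the data in batch mode so existing files are redistributed. A sketch of the standard rescale procedure (table name as in the DDL above), not the team's exact commands:

```sql
-- 1. Update the bucket count on the table metadata.
ALTER TABLE `paimon-catalog`.`paimon-db`.test_dws_pay_refund
    SET ('bucket' = '1000');

-- 2. Rewrite existing data under the new layout (batch execution mode);
--    the new bucket count only applies to rewritten files.
INSERT OVERWRITE `paimon-catalog`.`paimon-db`.test_dws_pay_refund
SELECT * FROM `paimon-catalog`.`paimon-db`.test_dws_pay_refund;
```

Streaming writers should be paused during the overwrite so they do not commit against the old bucket layout.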

Writer Memory Problems

Heartbeat‑timeout errors occurred due to insufficient TaskManager heap memory. The writer uses three kinds of buffers:

write-buffer-size (default 256 MB) for in‑memory sorting before flush.

orc.write.batch-size (default 1024 rows) for batching records during ORC conversion.

One write buffer per modified bucket, so touching many buckets at once multiplies memory usage.

Solution: increase TaskManager memory from 4 GB to 8 GB and set taskmanager.memory.managed.size=1m. The Paimon writer allocates its buffers on the JVM heap rather than in Flink managed memory, so shrinking managed memory to its minimum frees that budget for the heap.
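The memory changes above map to a flink-conf.yaml fragment along these lines (a sketch using the values from the text; exact keys can vary by deployment mode):

```yaml
# Raise total TaskManager memory from 4 GB to 8 GB.
taskmanager.memory.process.size: 8g

# The Paimon writer buffers live on the JVM heap, not in Flink managed
# memory, so shrink managed memory to its minimum to free room for heap.
taskmanager.memory.managed.size: 1m
```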

Partition Expiration & Write‑Only Mode

Non‑partitioned tables cannot use automatic partition expiration. To clean up old data, switch the table to write-only='true', delete expired rows via batch DELETE, then revert to write-only='false'. Directly running DELETE in a streaming job fails because multiple writers may conflict on the same bucket.
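The cleanup flow can be sketched as three steps, assuming a batch execution environment that supports DELETE on Paimon primary-key tables. The WHERE condition below is purely illustrative (it treats the trailing refund flag in test_order_dict as the expiry criterion); the original article does not specify the exact predicate.

```sql
-- 1. Switch the table to write-only so the streaming writer stops
--    compacting while the batch job runs.
ALTER TABLE `paimon-catalog`.`paimon-db`.test_dws_pay_refund
    SET ('write-only' = 'true');

-- 2. Delete expired rows in a batch job. Illustrative condition only:
--    test_order_dict is formatted ${order_id}:${pay_timestamp}_${is_refund_flag},
--    so SPLIT_INDEX extracts the refund flag after the underscore.
DELETE FROM `paimon-catalog`.`paimon-db`.test_dws_pay_refund
WHERE SPLIT_INDEX(test_order_dict, '_', 1) = '1';

-- 3. Restore normal writes.
ALTER TABLE `paimon-catalog`.`paimon-db`.test_dws_pay_refund
    SET ('write-only' = 'false');
```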

Use Case: Cumulative Real‑Time Tags

For long‑period cumulative metrics (e.g., total orders or GMV over the last N days), traditional windowed aggregation is unsuitable due to state size limits. The solution stores daily aggregates in a Paimon table and updates them via a DAU‑driven stream, enabling fast lookups without large windows.

CREATE TABLE `paimon-catalog`.`paimon-db`.`test_dws_pay_cate1_result` (
    `test_buyer_id` BIGINT NOT NULL,
    `test_cate1_id` STRING NOT NULL, -- category level 1
    `ds` STRING NOT NULL,            -- partition date
    `test_order_cnt` BIGINT,
    `test_gmv` DOUBLE,
    PRIMARY KEY (test_buyer_id, test_cate1_id, ds) NOT ENFORCED
) WITH (
    'bucket' = '40',
    'bucket-key' = 'test_buyer_id',
    'changelog-producer' = 'lookup',
    'fields.test_order_cnt.aggregate-function' = 'sum',
    'fields.test_gmv.aggregate-function' = 'sum',
    'merge-engine' = 'aggregation',
    'partition.expiration-time' = '7d',
    'partition.timestamp-formatter' = 'yyyyMMdd',
    'partition.timestamp-pattern' = '$ds'
);

This table partitions by day, keeps only the last 7 days, and uses aggregation merge engine to pre‑sum order count and GMV.
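To turn the daily aggregates into a cumulative tag, a downstream job sums across whatever daily partitions are still retained. A hedged sketch of such a batch query (output column names are illustrative):

```sql
-- Total order count and GMV per buyer/category over the retained
-- partitions (at most the last 7 days, given the expiration setting).
SELECT
    test_buyer_id,
    test_cate1_id,
    SUM(test_order_cnt) AS order_cnt_7d,
    SUM(test_gmv)       AS gmv_7d
FROM `paimon-catalog`.`paimon-db`.`test_dws_pay_cate1_result`
GROUP BY test_buyer_id, test_cate1_id;
```

Because the aggregation merge engine has already pre-summed each (buyer, category, day) key at write time, this query only folds a handful of daily rows per key rather than raw order events.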

Key Lessons

Ensure the join key fully covers the bucket key; otherwise LookupJoin suffers from long initialization and high latency.

Proper bucket sizing (≈2 GB per bucket) aligns with Flink parallelism and avoids data skew.

Stateless restarts are achievable by offloading state to Paimon, eliminating Flink’s internal state size constraints.

Conclusion

By combining Paimon with Flink, the team built a real‑time lake‑warehouse architecture that handles massive state‑intensive scenarios such as refund‑adjusted order tagging and cumulative metrics. Paimon serves as an efficient intermediate store for dimension joins, reduces memory pressure, supports stateless job restarts, and can be consumed by downstream batch engines like ODPS or Hologres, paving the way for unified data pipelines.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Big Data · Flink · State Management · Streaming · Apache Paimon · Real-time Data Lake · Lookup Join