How to Solve Common Paimon Performance Issues in Flink: Small Files, OOM, and More
This article compiles frequent problems encountered when using Paimon with Flink—such as small-file generation, write-performance bottlenecks, OOM/GC issues, file-deletion conflicts, dimension-table join slowness, and snapshot expiration—drawing on official docs, cloud-provider guides, and community discussions, and offers practical configuration and optimization solutions for each.
Small‑file problems causing back‑pressure/blocking
Small files are a typical challenge for many big‑data storage and compute frameworks, including Paimon. In Flink‑Paimon, small files arise mainly from frequent checkpoint flushing and WriteBuffer spilling.
If the checkpoint interval is too short or the WriteBuffer size is too small, data is flushed to disk more often, creating excess small files. Improper bucket‑key settings can also increase small‑file count.
Checkpoint interval: a recommended range is 1–2 minutes, but many practitioners find 3–5 minutes more reasonable.
WriteBuffer size: keep the default; for large data volumes you may increase write-buffer-size or enable write-buffer-spillable to produce larger files.
Data volume: adjust the number of buckets so each bucket is around 1 GB (slightly larger is acceptable).
Key settings: choose appropriate bucket‑key and partition to avoid hot‑key skew.
Compaction parameters: generally use defaults; in production, enable asynchronous compaction via three parameters:
'num-sorted-run.stop-trigger' = '2147483647', -- very large value to avoid pausing writes during compaction
'sort-spill-threshold' = '10', -- cap in-memory sorted runs to prevent memory overflow
'changelog-producer.lookup-wait' = 'false' -- disable synchronous waiting for lookup compaction
Insufficient write performance leading to back-pressure
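Putting the recommendations above together, here is a sketch of a Flink SQL job configuration; the table name, columns, and concrete values are illustrative assumptions to adapt to your workload:

```sql
-- Checkpoint every 3 minutes (tune between roughly 1 and 5 minutes)
SET 'execution.checkpointing.interval' = '3min';

CREATE TABLE orders (
    order_id BIGINT,
    user_id  BIGINT,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'bucket' = '8',                               -- aim for roughly 1 GB of data per bucket
    'bucket-key' = 'order_id',                    -- choose a key without hot-spot skew
    'write-buffer-size' = '256mb',                -- larger buffer -> fewer, larger files
    'write-buffer-spillable' = 'true',            -- spill to disk instead of flushing small files
    'num-sorted-run.stop-trigger' = '2147483647', -- do not pause writes for compaction
    'sort-spill-threshold' = '10',                -- prevent memory overflow
    'changelog-producer.lookup-wait' = 'false'    -- asynchronous compaction
);
```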
Optimizing Flink‑Paimon writes involves many parameters. Beyond small‑file compaction, consider:
Parallelism: set sink parallelism equal to the number of buckets.
Enable local merging before records are shuffled to buckets, starting from a buffer of 64 MB and tuning upward.
Choose suitable file encoding and compression formats.
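The three points above can be sketched as table options; the table name and values are assumptions, and the option names follow Paimon's documented configuration keys:

```sql
-- Illustrative write-performance tuning on an existing Paimon table
ALTER TABLE orders SET (
    'sink.parallelism' = '8',           -- match the number of buckets
    'local-merge-buffer-size' = '64mb', -- pre-merge records before the bucket shuffle
    'file.format' = 'parquet',          -- choose a suitable file encoding
    'file.compression' = 'zstd'         -- and compression format
);
```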
Out‑of‑Memory or frequent GC
OOM errors typically appear as Caused by: java.lang.OutOfMemoryError: Java heap space or GC overhead limit exceeded. Increase the TaskManager heap memory.
Another cause is a single bucket holding too much data due to an ill‑chosen bucket key, leading to hot‑key skew and OOM. Increase bucket count and rescale existing data if needed.
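When a skewed bucket is the cause, Paimon allows rescaling: raise the bucket count, then rewrite the existing data with a batch `INSERT OVERWRITE` so records are redistributed. A minimal sketch (the table name is illustrative; run the overwrite as a batch job):

```sql
-- Step 1: increase the bucket count on the table
ALTER TABLE orders SET ('bucket' = '16');

-- Step 2: rewrite the data so it is redistributed across the new buckets
INSERT OVERWRITE orders SELECT * FROM orders;
```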
File deletion conflicts detected! Give up committing.
This classic issue occurs when multiple jobs write to the same Paimon table simultaneously, causing snapshot or file conflicts during concurrent compaction and commit.
Recommended fix: enable write-only=true for both offline and streaming jobs, and run a separate job that performs only compaction.
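A hedged sketch of this setup, assuming the table name `default.orders` and using Paimon's `sys.compact` procedure for the dedicated compaction job (a standalone compaction action jar is an alternative):

```sql
-- All writer jobs skip compaction entirely:
ALTER TABLE orders SET ('write-only' = 'true');

-- A single separate job performs compaction only:
CALL sys.compact(`table` => 'default.orders');
```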
Dimension‑table join performance problems
Primary-key tables can be used as dimension tables for lookups, but in high-throughput scenarios the lookup can become a bottleneck. Common optimizations include:
Asynchronous, delayed retries (use cautiously).
Dynamic partitioning.
Cache settings: lookup.cache='auto' (partial cache) or lookup.cache='full' (full cache). Partial cache is auto‑selected when the join table is bucketed and the join key matches the primary key; otherwise full cache is used, which may cause cold‑start overhead.
Bucket Shuffle (available on many cloud platforms) hashes by join key, allowing each bucket’s data to be cached separately, reducing memory usage and cache eviction.
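A lookup join with an explicit cache setting can be sketched as follows; the table and column names are illustrative, and `orders` is assumed to carry a processing-time attribute `proc_time`:

```sql
-- Lookup join against a Paimon primary-key dimension table
SELECT o.order_id, o.user_id, u.user_name
FROM orders AS o
JOIN users /*+ OPTIONS('lookup.cache' = 'auto') */
    FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.user_id;
```

With 'auto', partial caching is chosen when the join key matches the bucket key and primary key; otherwise the table is fully cached.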
FileNotFoundException
When reading a Paimon table, snapshots and changelogs expire after one hour by default. If downstream jobs lag or experience a pause longer than an hour, this error occurs. Increase snapshot.time-retained to extend the retention period.
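Extending the retention is a one-line table option; the 24-hour value below is an illustrative choice sized to how far downstream jobs may lag:

```sql
-- Keep snapshots for 24 hours so a lagging or paused consumer can still read them
ALTER TABLE orders SET ('snapshot.time-retained' = '24h');
```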
Write‑query performance trade‑off
Paimon (and similar frameworks) offers two modes: Merge‑On‑Read (MOR) for fast writes but slower queries, and Copy‑On‑Write (COW) for fast queries but slower writes.
Starting with version 0.8, Deletion Vectors are introduced, allowing MOR to mark deleted rows at write time, achieving fast updates without degrading query performance. Consider using Deletion Vectors when both write and query performance are critical.
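Deletion vectors are enabled per table via a single option (the table definition below is an illustrative sketch):

```sql
-- Requires Paimon 0.8 or later and a primary-key table
CREATE TABLE orders_dv (
    order_id BIGINT,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'deletion-vectors.enabled' = 'true'
);
```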
These are the main issues collected; real‑world production environments may present additional challenges that require further investigation.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
