Optimizing Primary‑Key and Append‑Scalable Tables in Paimon with Flink
This guide explains how to improve both write and read performance for Paimon primary‑key and Append‑Scalable tables in Flink: tuning sink and source parallelism, adjusting checkpoint intervals, making small‑file merges fully asynchronous, changing the file format, and applying data‑ordering strategies.
Paimon write‑job bottlenecks are often caused by small‑file merges. By default, Flink checkpoints wait for these merges to finish, which can lead to back‑pressure and reduced job efficiency.
Optimization tips include:
Adjust Paimon sink parallelism via the sink.parallelism SQL hint.
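As a sketch, the sink parallelism can be set per statement with Flink's dynamic table options hint. The table names and the value '4' here are illustrative placeholders:

```sql
-- Set sink parallelism for this INSERT only, without changing the table definition.
-- 'paimon_sink' and 'kafka_source' are hypothetical table names.
INSERT INTO paimon_sink /*+ OPTIONS('sink.parallelism' = '4') */
SELECT * FROM kafka_source;
```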
Modify Flink checkpoint settings: increase execution.checkpointing.interval and set execution.checkpointing.max-concurrent-checkpoints to 3, choosing values within the end‑to‑end latency your business can tolerate.
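In the Flink SQL client these settings can be applied per session; the interval below is an illustrative value, not a recommendation:

```sql
-- A longer interval reduces checkpoint pressure from small-file merges;
-- allowing concurrent checkpoints keeps later ones from queuing behind a slow merge.
SET 'execution.checkpointing.interval' = '3min';
SET 'execution.checkpointing.max-concurrent-checkpoints' = '3';
```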
Make small‑file merges fully asynchronous so checkpoints no longer wait for merge completion.
Change table parameters (e.g., 'num-sorted-run.stop-trigger' = '2147483647', 'sort-spill-threshold' = '10', 'changelog-producer.lookup-wait' = 'false') via ALTER TABLE or SQL hints.
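The three parameters above can be applied persistently with ALTER TABLE, for example (assuming a hypothetical table named my_table):

```sql
-- Effectively disable the stop trigger so writers never block on sorted runs,
-- spill sorting to disk past 10 runs, and stop waiting for lookup changelog work.
ALTER TABLE my_table SET (
  'num-sorted-run.stop-trigger' = '2147483647',
  'sort-spill-threshold' = '10',
  'changelog-producer.lookup-wait' = 'false'
);
```

The same keys can instead be passed ad hoc through a `/*+ OPTIONS(...) */` hint on a single statement.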
If OLAP queries are not needed, switch the file format to Avro and disable statistics collection with 'file.format' = 'avro', 'metadata.stats-mode' = 'none' to boost write efficiency.
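A minimal sketch of that change, again assuming a table named my_table:

```sql
-- Avro is row-oriented, which favors write throughput over columnar scan speed;
-- disabling stats collection removes per-file statistics overhead on write.
ALTER TABLE my_table SET (
  'file.format' = 'avro',
  'metadata.stats-mode' = 'none'
);
```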
For consumption jobs, adjust Paimon source parallelism using the scan.parallelism hint, and consider reading from the Read‑Optimized system table to avoid small‑file merge overhead.
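Both read-side techniques can be sketched as follows; the table name and parallelism value are placeholders:

```sql
-- Raise source parallelism for this query only.
SELECT * FROM my_table /*+ OPTIONS('scan.parallelism' = '8') */;

-- Read the Read-Optimized system table: it scans only fully compacted files,
-- skipping merge-on-read, at the cost of possibly staler data.
SELECT * FROM `my_table$ro`;
```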
Append‑Scalable tables have additional considerations:
Adjust sink parallelism as above; watch for data skew, and if it appears, set the sink's parallelism independently of the upstream operator's.
For read jobs, increase source parallelism and sort data using Z‑order, Hilbert, or explicit order strategies to improve batch or OLAP query performance.
Example command to compact and order data:
<FLINK_HOME>/bin/flink run \
-D execution.runtime-mode=batch \
/path/to/paimon-flink-action-0.8.2.jar \
compact \
--warehouse <warehouse-path> \
--database <database-name> \
--table <table-name> \
--order_strategy <orderType> \
--order_by <col1,col2,...> \
[--partition <partition-name>] \
[--catalog_conf <paimon-catalog-conf> ...] \
[--table_conf <paimon-table-dynamic-conf> ...]

Balancing write‑side and read‑side performance by tuning these parameters helps achieve efficient data ingestion and query execution in Paimon‑backed Flink pipelines.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
