
Apache Paimon 0.8 Release: Deletion Vectors, File Index, Performance Boosts, and Flink/Spark Integration Enhancements

The article introduces Apache Paimon 0.8, highlighting new Deletion Vectors, a universal file index, memory and I/O optimizations, record‑level TTL, and integration improvements with Flink and Spark, while also discussing broader lake‑house performance trends and future directions.

Big Data Technology & Architecture

Earlier this week we released Apache Paimon 0.8, and this article reviews its core new features and performance improvements.

Key changes in version 0.8 include:

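1. Added Deletion Vectors, enabling near-real-time updates with very fast queries
2. Changed the default bucket value to -1, making Paimon easier for newcomers to use
3. Added a general-purpose file index mechanism to improve OLAP query performance
4. Optimized memory usage and performance of the read and write paths, reducing the number of I/O accesses
5. Added separate management of changelog files to extend their lifecycle
6. Added a file-system-based privilege system to manage read and write permissions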

Read Performance Improvements

Read Optimization 1

Paimon 0.8 introduces a Deletion Vectors mode, enabled with 'deletion-vectors.enabled' = 'true', that greatly speeds up reads of primary‑key tables, combining near‑real‑time updates with fast queries.

The mechanism resembles selection vectors in vectorized execution: at write time, a vector records which rows in existing files have been deleted, so reads can filter those rows out directly instead of performing costly file merges.
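The idea can be illustrated with a minimal Python sketch. It is not Paimon's implementation: `DataFile`, `upsert`, and the key index are hypothetical names, and a plain `set` stands in for whatever compressed bitmap a real engine would use.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical data file; rows are addressed by position."""
    name: str
    rows: list
    # Positions of deleted rows. A plain set stands in for a compressed bitmap.
    deletion_vector: set = field(default_factory=set)

    def read(self):
        # Read path: skip deleted positions; no merge with other files is needed.
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.deletion_vector]


def upsert(index, target, key, value):
    """Write path: mark the old row as deleted, then append the new version."""
    if key in index:
        old_file, old_pos = index[key]
        old_file.deletion_vector.add(old_pos)
    target.rows.append((key, value))
    index[key] = (target, len(target.rows) - 1)


old = DataFile("data-1", [("a", 1), ("b", 2)])
new = DataFile("data-2", [])
index = {"a": (old, 0), "b": (old, 1)}  # key -> (file, row position)

upsert(index, new, "a", 10)
rows = old.read() + new.read()
```

Here the rows already written to `data-1` are never rewritten; the update only adds position 0 to that file's deletion vector, and each file can then be read independently.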

Read Optimization 2

A universal file index is now maintained separately, improving OLAP query performance, though it is still being refined.

This index can reduce the need for downstream pipelines such as Kafka→OLAP in scenarios where lake‑table query speed meets reporting requirements.
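As a rough illustration of how a per-file index can speed up point lookups, the sketch below keeps one Bloom filter per data file and skips files that cannot contain the queried value. The article does not say which index types Paimon uses, so the Bloom filter is an assumption, and `BloomFilter` and `files_to_scan` are hypothetical names.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def files_to_scan(file_indexes, value):
    """Return only the files whose index cannot rule out the value."""
    return [name for name, bf in file_indexes.items() if bf.might_contain(value)]


file_indexes = {"data-1": BloomFilter(), "data-2": BloomFilter()}
file_indexes["data-1"].add("user-42")
file_indexes["data-2"].add("user-7")

candidates = files_to_scan(file_indexes, "user-42")
```

A Bloom filter never produces false negatives, so `data-1` is always among the candidates; `data-2` is skipped unless the filter happens to report a false positive.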

Other Read/Write Optimizations

From the Paimon and Flink communities:

Write serialization performance improved by 10‑20%.

Multi‑partition writes to Append tables (more than 5 partitions) are significantly faster.

Increased default for num-sorted-run.stop-trigger to alleviate back‑pressure.

Dynamic bucket write startup performance enhanced.

Commit optimizations:

Memory usage of Commit nodes reduced dramatically.

Removed unnecessary checks, making write‑only commits much faster.

Partition expiration performance greatly improved.

Query optimizations:

Plan generation memory consumption lowered.

Reduced namenode accesses during planning and reading, benefiting object‑storage OLAP.

Codegen cache added, boosting short‑query performance.

Hive queries now use serialized Table objects, reducing namenode accesses.

First‑row merge‑engine query speed significantly increased.

Record‑Level TTL Configuration

Traditional lake tables often only support partition‑level TTL, forcing the use of partitioned tables; Paimon now allows record‑level TTL via the record-level.expire-time setting, similar to HBase.
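The rule behind record-level expiration can be sketched in a few lines of Python. This only shows the filtering logic; when Paimon actually drops expired records is not covered here, and `drop_expired` and the `event_time` field are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def drop_expired(records, expire_time, now, time_field="event_time"):
    """Keep records whose time field is newer than now - expire_time."""
    cutoff = now - expire_time
    return [r for r in records if r[time_field] > cutoff]


now = datetime(2024, 5, 10, tzinfo=timezone.utc)
records = [
    {"id": 1, "event_time": datetime(2024, 5, 9, tzinfo=timezone.utc)},
    {"id": 2, "event_time": datetime(2024, 5, 1, tzinfo=timezone.utc)},
]

# With a 7-day expire time, the cutoff is 2024-05-03.
live = drop_expired(records, timedelta(days=7), now)
```

Because expiry is decided per record, the table does not need to be partitioned by time for old data to age out.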

Integration Optimizations with Flink and Spark

Lookup Join Optimization

1. Flink Lookup Join now uses a hash lookup, avoiding the overhead of inserting data into RocksDB.
2. Flink Lookup Join adds a max_pt mode that joins against the data in the latest partition.
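Both ideas can be shown together in a small Python sketch: pick the latest partition of a dimension table, load it into an in-memory hash map, and probe that map for each incoming event. This is a conceptual model rather than Flink or Paimon code; `build_hash_lookup`, `lookup_join`, and the sample fields are hypothetical, and partition names are assumed to sort chronologically as ISO dates.

```python
def build_hash_lookup(rows, key):
    """Index dimension rows by join key in a plain dict (hash table)."""
    return {row[key]: row for row in rows}


def lookup_join(events, dim_partitions, key):
    """Inner-join events against the latest partition of a dimension table."""
    latest = max(dim_partitions)  # ISO date strings sort chronologically
    lookup = build_hash_lookup(dim_partitions[latest], key)
    return [{**event, **lookup[event[key]]}
            for event in events if event[key] in lookup]


dim_partitions = {
    "2024-05-01": [{"sku": "s1", "price": 10}],
    "2024-05-02": [{"sku": "s1", "price": 12}],
}
events = [{"order": 1, "sku": "s1"}, {"order": 2, "sku": "s9"}]

joined = lookup_join(events, dim_partitions, "sku")
```

Only the `2024-05-02` partition is loaded, so order 1 picks up the newer price and order 2 finds no match.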

The current dimension‑table join capabilities still lag behind some storage systems.

Spark Query Optimization

Spark now uses copy‑on‑write (COW) to support DELETE and UPDATE on Append tables, and Spark DELETE also supports primary‑key tables with any merge engine. Spark DELETE and UPDATE accept subquery conditions, and the Spark COMPACT procedure supports filtering with where.
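Copy-on-write deletion on an append-only table can be pictured as follows: files with no matching rows are reused untouched, while files that contain matching rows are rewritten without them. The Python sketch below is a simplified model, not the Spark integration itself; `cow_delete` and the file naming are hypothetical.

```python
def cow_delete(files, predicate):
    """Return a new file set in which rows matching predicate are removed.

    files maps a file name to an immutable tuple of rows.
    """
    new_files = {}
    for name, rows in files.items():
        if not any(predicate(row) for row in rows):
            new_files[name] = rows  # unaffected file is reused as-is
            continue
        kept = tuple(row for row in rows if not predicate(row))
        if kept:  # a file that becomes empty is simply dropped
            new_files[name + "-rewritten"] = kept
    return new_files


files = {"f1": (1, 2, 3), "f2": (4, 5)}
after_delete = cow_delete(files, lambda row: row == 2)
```

The cost of a delete scales with the size of the files that contain matching rows, which is why copy-on-write suits infrequent, batch-style modifications.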

Overall, the lake‑house ecosystem continues to evolve, and we will keep sharing production best practices and future trends.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
