Performance Optimization of Apache Paimon in Dolphin OLAP Engine
The article details how Apache Paimon, integrated as an external table format in Alibaba’s Dolphin OLAP engine, achieves millisecond‑level query latency and up to 10k QPS through ORC push‑down, manifest conversion, caching, concurrency, and encoding optimizations, outperforming StarRocks and Hologres.
The article introduces Apache Paimon, a lake‑format storage engine incubated from the Flink community, and its integration with Alibaba’s Dolphin multi‑model AI‑enhanced database for OLAP workloads.
It explains why Paimon was chosen as the storage format: support for streaming writes, primary-key updates, compute-storage separation, an open-source license, and an active community.
Dolphin uses Paimon as an external table format, enabling batch‑stream convergence and allowing most queries to achieve millisecond‑level latency with up to 10k QPS.
Performance Test Report
Cold‑query benchmarks show Dolphin‑Paimon outperforming StarRocks and Hologres by 2× to over 50× in most scenarios. High‑QPS tests on a 100‑node cluster (16 cores, 64 GB, Pangu HDD) demonstrate the ability to handle large‑scale point‑lookup workloads.
Optimization Paths
1. Enable ORC push‑down – Activating ORC filter push‑down and selected‑row reading yields performance gains of dozens of times. The relevant code change:
package org.apache.paimon.format.orc;

public class OrcReaderFactory implements FormatReaderFactory {
    // ...
    private static RecordReader createRecordReader(
            org.apache.hadoop.conf.Configuration conf,
            TypeDescription schema,
            List<OrcFilters.Predicate> conjunctPredicates,
            FileIO fileIO,
            org.apache.paimon.fs.Path path,
            long splitStart,
            long splitLength) {
        // ...
        if (!conjunctPredicates.isEmpty()) {
            // TODO: fix this. With this option enabled, deletion vectors will
            // not work, because the behavior of getRowNumber changes.
            options.useSelected(OrcConf.READER_USE_SELECTED.getBoolean(conf));
            options.allowSARGToFilter(OrcConf.ALLOW_SARG_TO_FILTER.getBoolean(conf));
        }
        // ...
    }
    // ...
}

Configuration parameters to enable the push‑down:
'orc.filter.use.selected' = 'true',
'orc.sarg.to.filter' = 'true'
# GitHub version parameters
'orc.reader.filter.use.selected'='true',
'orc.reader.sarg.to.filter' = 'true'

2. Convert manifest files to ORC format – The default Avro manifest must be loaded fully into memory. Switching to ORC reduces metadata size and I/O. Example manifest entry:
{
  "_VERSION": 2,
  "_KIND": 0,
  "_PARTITION": [...],
  "_BUCKET": 47,
  "_TOTAL_BUCKETS": 100,
  "_FILE": {
    "_FILE_NAME": "data-...-0.orc",
    "_FILE_SIZE": 3088,
    "_ROW_COUNT": 1,
    ...
  }
}

The read path filters manifest entries by bucket, partition, and predicate; without push‑down this can require millions of loop iterations. Applying bucket push‑down (PR #4497) reduces the metadata read volume by a factor of 10,000.
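The effect of bucket push‑down can be sketched in a few lines of Java. Note that `Entry`, `bucketOf`, and the hash used here are simplified stand‑ins for illustration, not Paimon's actual manifest classes or bucketing function.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of bucket push-down: for a point lookup on the bucket
// key, only manifest entries whose bucket matches hash(key) % numBuckets need
// to be scanned, instead of iterating over every entry in the manifest.
public class BucketPushDown {
    // Simplified stand-in for a Paimon manifest entry (bucket + file name).
    record Entry(int bucket, String fileName) {}

    static int bucketOf(String key, int numBuckets) {
        // Simplified bucket derivation; Math.floorMod keeps the result non-negative.
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    static List<Entry> prune(List<Entry> entries, String key, int numBuckets) {
        int target = bucketOf(key, numBuckets);
        List<Entry> hit = new ArrayList<>();
        for (Entry e : entries) {
            if (e.bucket() == target) { // bucket filter applied before any file I/O
                hit.add(e);
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry(0, "data-0.orc"),
                new Entry(47, "data-47.orc"),
                new Entry(99, "data-99.orc"));
        // Only entries in the lookup key's bucket survive pruning.
        System.out.println(prune(entries, "user-123", 100).size());
    }
}
```

With 100 buckets and a selective key, only ~1/100 of the entries survive; combined with partition filtering this is where the large metadata-read reduction comes from.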
3. Reduce I/O operations – Cache schema and snapshot files (few KB) to avoid frequent reads, and enable manifest caching with options such as:
Options options = new Options();
options.set(WAREHOUSE, uri.toString());
options.set("cache-enabled", "true");
options.set("cache.expiration-interval", "1 min");
options.set("cache.manifest.small-file-memory", "10m");
options.set("cache.manifest.small-file-threshold", "3mb");
CatalogContext context = CatalogContext.create(options, ...);

Be cautious of the JVM GC impact when caching.
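The schema/snapshot caching idea can be sketched with a size‑bounded LRU map, as below. `MetadataCache` and its loader callback are illustrative assumptions, not Paimon APIs; the entry bound is what keeps heap usage, and hence GC pressure, under control.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch: cache small, immutable metadata files (schema and
// snapshot JSON, typically a few KB each) in memory so repeated point lookups
// do not re-read them from DFS. The LRU bound limits heap usage, which matters
// because an unbounded metadata cache adds JVM GC pressure.
public class MetadataCache {
    private final int maxEntries;
    private final Map<String, byte[]> cache;

    public MetadataCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // Access-ordered LinkedHashMap gives simple LRU eviction.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > MetadataCache.this.maxEntries;
            }
        };
    }

    // The loader stands in for the actual DFS read (an assumption, not a Paimon API).
    public synchronized byte[] get(String path, Function<String, byte[]> loader) {
        byte[] bytes = cache.get(path);
        if (bytes == null) {
            bytes = loader.apply(path); // only hit DFS on a cache miss
            cache.put(path, bytes);
        }
        return bytes;
    }

    public synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        MetadataCache cache = new MetadataCache(1000);
        byte[] schema = cache.get("/warehouse/t/schema/schema-0",
                p -> new byte[]{/* bytes read from DFS in real code */});
        System.out.println(schema.length);
    }
}
```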
4. Increase concurrency – A query's execution plan can be split into orthogonal sub‑plans that touch disjoint files, so they can be read in parallel, improving throughput by more than 10× when many ORC files are involved.
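Reading orthogonal sub‑plans concurrently can be sketched with a plain thread pool; `readSplit` here is a stand‑in for the real per‑split ORC read, not Paimon's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: orthogonal sub-plans touch disjoint ORC files, so each
// split can be read on its own thread instead of sequentially.
public class ParallelSplits {
    // Stand-in for reading one split; real code would open an ORC reader here
    // and return its rows. We return the file-name length as a dummy result.
    static long readSplit(String file) {
        return file.length();
    }

    static long readAll(List<String> files, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String f : files) {
                futures.add(pool.submit(() -> readSplit(f))); // fan out per split
            }
            long total = 0;
            for (Future<Long> fu : futures) {
                total += fu.get(); // gather per-split results
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(List.of("data-0.orc", "data-1.orc"), 2));
    }
}
```

Because the sub-plans are orthogonal, no coordination is needed between threads; the speedup is bounded mainly by the pool size and the storage backend's throughput.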
5. Avoid redundant encoding/decoding – Re‑encoding of strings during setString, length calculation, and write operations was eliminated by introducing a static byte buffer and a custom SpeedGPDWritable implementation.
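The encode‑once idea behind that change can be sketched as follows. `OneShotEncoder` is a hypothetical simplification for illustration, not the actual SpeedGPDWritable, and it reuses a per‑instance buffer rather than the static one mentioned above.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: instead of calling String.getBytes() separately for
// setString, length calculation, and the actual write (three encodings of the
// same value), encode once and reuse the cached bytes for all three steps.
public class OneShotEncoder {
    private byte[] encoded = new byte[0];

    public void setString(String value) {
        encoded = value.getBytes(StandardCharsets.UTF_8); // encode exactly once
    }

    public int length() {
        return encoded.length; // no re-encoding just to compute the length
    }

    public void writeTo(DataOutputStream out) {
        try {
            out.writeInt(encoded.length);
            out.write(encoded); // write the cached bytes, no re-encoding
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OneShotEncoder encoder = new OneShotEncoder();
        encoder.setString("hello");
        System.out.println(encoder.length());
    }
}
```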
These low‑level optimizations together bring Dolphin‑Paimon to millisecond‑level latency with QPS >10k, and the system now serves thousands of tables at PB scale.
Business Impact and Future Work
Current production metrics: >1000 QPS, p99 latency ~100 ms, >1000 tables, PB‑scale data. Future plans include a second‑level query solution for massive log aggregation and a 100k QPS, <5 ms latency implementation. Integration with ClickHouse is also in progress.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.