
Introducing Apache Paimon: An Open‑Source Streaming Lakehouse Storage Engine

Apache Paimon is an open‑source streaming data lake storage system that combines LSM‑based real‑time updates, open file formats, and deep integration with Flink, Spark, and Trino to deliver high‑throughput ingestion, low‑latency queries, and unified batch‑stream processing for modern big‑data workloads.

DataFunTalk

Apr 7, 2023

On March 12, 2023, the Flink Table Store project entered the Apache Software Foundation Incubator and was renamed Apache Paimon (incubating).

Apache Paimon is an open‑source streaming data lake storage engine that provides high‑throughput, low‑latency data ingestion, streaming subscription, and real‑time query capabilities, and integrates with Flink, Spark, Trino and other compute engines.

It uses open file formats (ORC, Parquet, Avro) on distributed file systems and adopts an LSM‑based architecture combined with columnar storage to achieve large‑scale real‑time updates.

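The LSM design supports high‑performance writes, because incoming updates are appended and merged only by lightweight minor compactions; efficient merges, because the data being merged is already sorted; and fast queries, because files can be skipped based on their primary‑key ranges.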
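The write, merge, and file‑skipping behavior described here can be illustrated with a toy in‑memory model. This is only a sketch of the general LSM technique, not Paimon's implementation: the names `ToyLsm`, `SortedRun`, and `buffer_limit` are hypothetical, each "file" is a Python list, and compaction simply merges every run into one.

```python
import bisect
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SortedRun:
    """Hypothetical stand-in for an immutable data file sorted by primary key."""
    rows: List[Tuple[int, str]]  # (primary key, value), sorted by key
    seq: int                     # higher sequence number = newer data

    @property
    def min_key(self) -> int:
        return self.rows[0][0]

    @property
    def max_key(self) -> int:
        return self.rows[-1][0]


class ToyLsm:
    def __init__(self, buffer_limit: int = 2) -> None:
        self.buffer: Dict[int, str] = {}   # in-memory write buffer
        self.runs: List[SortedRun] = []    # flushed sorted runs
        self.buffer_limit = buffer_limit
        self.next_seq = 0

    def put(self, key: int, value: str) -> None:
        # Writes only touch the buffer; existing runs are never rewritten.
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        self.runs.append(SortedRun(sorted(self.buffer.items()), self.next_seq))
        self.next_seq += 1
        self.buffer = {}

    def compact(self) -> None:
        """Merge all runs into one sorted run; the newest value per key wins."""
        if len(self.runs) < 2:
            return
        # Tag rows with -seq so that, for equal keys, the newest row sorts first.
        streams = [[(k, -run.seq, v) for k, v in run.rows] for run in self.runs]
        merged: List[Tuple[int, str]] = []
        for k, _, v in heapq.merge(*streams):
            if not merged or merged[-1][0] != k:
                merged.append((k, v))
        self.runs = [SortedRun(merged, self.next_seq)]
        self.next_seq += 1

    def get(self, key: int) -> Tuple[Optional[str], int]:
        """Return (value, number of runs opened) to make file skipping visible."""
        if key in self.buffer:
            return self.buffer[key], 0
        opened = 0
        for run in sorted(self.runs, key=lambda r: r.seq, reverse=True):
            if key < run.min_key or key > run.max_key:
                continue  # skipped using only the run's key range
            opened += 1
            i = bisect.bisect_left(run.rows, (key,))
            if i < len(run.rows) and run.rows[i][0] == key:
                return run.rows[i][1], opened
        return None, opened


store = ToyLsm(buffer_limit=2)
for k, v in [(1, "a"), (2, "b"), (10, "x"), (11, "y"), (2, "b2"), (3, "c")]:
    store.put(k, v)

value, opened = store.get(11)   # two of the three runs are skipped by key range
store.compact()                 # one sorted run remains, key 2 keeps "b2"
```

In this model a write never rewrites existing data, which is where the write throughput comes from; the cost is deferred to compaction, which stays cheap because every input is already sorted.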

Recent versions embed Flink CDC, allowing real‑time synchronization of MySQL tables (including schema changes) to Paimon with minimal resource consumption.

Paimon’s partial‑update engine merges streams by primary key to produce wide tables, supporting both batch reads with projection push‑down and streaming reads of fully merged data.
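As a rough illustration of merging streams by primary key into a wide table, the sketch below folds rows from two hypothetical streams into one record per key, letting non‑null fields overwrite earlier values and ignoring nulls. The function names, column names, and sample rows are assumptions made for the example; Paimon performs this merge inside its storage layer, not in user code like this.

```python
from itertools import chain
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[object]]


def partial_update(rows: Iterable[Row], key: str, columns: List[str]) -> Dict[object, Row]:
    """Fold rows into one wide record per primary key; None never overwrites a value."""
    wide: Dict[object, Row] = {}
    for row in rows:
        target = wide.setdefault(row[key], {c: None for c in columns})
        for col in columns:
            val = row.get(col)
            if val is not None:
                target[col] = val
    return wide


def project(table: Dict[object, Row], columns: List[str]) -> List[Row]:
    """Batch read that returns only the requested columns."""
    return [{c: r[c] for c in columns} for r in table.values()]


# Two hypothetical source streams that each carry part of the wide row.
orders = [{"order_id": 1, "amount": 30.0}, {"order_id": 2, "amount": 12.5}]
shipping = [{"order_id": 1, "carrier": "SF", "amount": None}]

wide = partial_update(chain(orders, shipping), "order_id",
                      ["order_id", "amount", "carrier"])
carriers = project(wide, ["order_id", "carrier"])
```

Here order 1 ends up with both `amount` and `carrier` filled in, while order 2 keeps a null `carrier` until a shipping row for it arrives.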

As unified stream‑batch storage, Paimon supports stream‑write/stream‑read and batch‑write/batch‑read, enabling OLAP queries on both historical and fresh data and generating changelogs so that downstream processing stays accurate.
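To see why a changelog matters for downstream accuracy, consider a toy diff between two table states keyed by primary key. The sketch below emits change events using the row‑kind labels familiar from Flink (`+I`, `-U`, `+U`, `-D`) and applies them to a running sum. The functions and sample data are hypothetical, and diffing two full states is just one simple way to derive a changelog, chosen here for clarity.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, int]  # (row kind, primary key, amount)


def changelog(before: Dict[int, int], after: Dict[int, int]) -> List[Event]:
    """Diff two keyed states into insert, update-before/after, and delete events."""
    events: List[Event] = []
    for pk in sorted(before.keys() | after.keys()):
        if pk not in before:
            events.append(("+I", pk, after[pk]))
        elif pk not in after:
            events.append(("-D", pk, before[pk]))
        elif before[pk] != after[pk]:
            events.append(("-U", pk, before[pk]))  # retract the old value
            events.append(("+U", pk, after[pk]))   # emit the new value
    return events


def apply_to_sum(total: int, events: List[Event]) -> int:
    """Downstream aggregate: add new values and subtract retracted ones."""
    for kind, _, amount in events:
        total += amount if kind in ("+I", "+U") else -amount
    return total


before = {1: 10, 2: 20}
after = {1: 15, 3: 5}

events = changelog(before, after)
total = apply_to_sum(sum(before.values()), events)  # matches sum(after.values())
```

Without the `-U` and `-D` retraction events, the downstream sum would count the old value of key 1 and the deleted key 2 in addition to the new data, so the aggregate would drift from the table's true state.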

Three versions of Flink Table Store have been released; version 0.4, the first under the Paimon name, is planned for April 2023, with continued investment in real‑time capabilities, ecosystem integration, and data‑warehouse completeness.

The project thanks contributors from Alibaba, ByteDance, Confluent, Tongcheng Travel, Bilibili, and the Apache Flink community, and provides contact links to the website, GitHub repository, and community chat groups.
