
Apache Paimon Graduates to Top‑Level Project – Milestones, Core Capabilities, and Community Highlights

Apache Paimon, originally launched as Flink Table Store, has graduated to an Apache Top‑Level Project after a year of incubation, showcasing real‑time lakehouse capabilities, extensive ecosystem integration, and strong adoption by major enterprises, marking a significant milestone for streaming and batch data processing.

DataFunTalk

On April 16, 2024, the Apache Software Foundation announced that Apache Paimon has officially graduated to become an Apache Top‑Level Project (TLP), recognizing its breakthroughs in real‑time data lake and stream‑batch processing technologies.

Originally named Flink Table Store, the project started in January 2022 within the Apache Flink community. Its goal was to combine Flink’s streaming compute power with the advantages of the lakehouse architecture, enabling real‑time data flow within the lake and a unified streaming and batch development experience.

On March 12, 2023, the project entered the Apache Incubator after a successful vote and was renamed Apache Paimon, with a focus on streaming‑oriented, real‑time data lake storage. Over the following year, its mentors and the Incubator PMC guided its development.

On March 20, 2024, the Apache Board approved the graduation resolution, confirming Apache Paimon as a Top‑Level Project after a successful incubation period during which community contributions and the project’s visibility grew significantly.

During incubation, Paimon released four major versions and has been adopted by enterprises such as Alibaba, ByteDance, Tongcheng Travel, Ant Group, China Unicom, NetEase, Zhongyuan Bank, Autohome, Ping An Securities, and Ximalaya, helping them build real‑time data lakes, improve CDC ingestion, and enhance data timeliness.

Core Capabilities

Apache Paimon is a lake format that integrates with Flink and Spark for unified stream‑batch processing. By combining a lake format with a log‑structured merge‑tree (LSM) structure, it brings real‑time updates and full streaming processing to the data lake.
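The key idea behind LSM‑based updates is that changes are written as new key‑sorted runs and merged later, with the newest version of each primary key winning. The sketch below illustrates that idea in plain Python, assuming last‑write‑wins semantics (similar in spirit to keeping the latest record per key) and a per‑record sequence number. `Record` and `compact` are hypothetical names for illustration only, not Paimon’s API; real compaction operates on data files and supports other merge behaviors.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    key: str              # primary key
    seq: int              # hypothetical write sequence number; higher = newer
    value: Optional[str]  # None marks a delete (tombstone)


def compact(runs: List[List[Record]]) -> List[Record]:
    """Merge key-sorted runs, keeping only the newest version of each key.

    Each run must be sorted by (key, -seq). heapq.merge streams the runs in
    that order, so the first record seen for a key is its newest version.
    """
    merged = heapq.merge(*runs, key=lambda r: (r.key, -r.seq))
    out: List[Record] = []
    last_key = None
    for rec in merged:
        if rec.key == last_key:
            continue  # older version of a key that is already resolved
        last_key = rec.key
        if rec.value is not None:  # drop keys whose newest version is a delete
            out.append(rec)
    return out


if __name__ == "__main__":
    base = [Record("a", 1, "v1"), Record("b", 2, "v1"), Record("c", 3, "v1")]
    delta = [Record("a", 4, "v2"), Record("c", 5, None)]  # update a, delete c
    for rec in compact([base, delta]):
        print(rec)
```

Merging streams rather than loading everything into memory reflects why LSM designs suit continuous updates: new data is only appended, and the cost of resolving versions is deferred to merge time.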

Key features include:

● Enhanced Real‑time Ingestion: Provides CDC ingestion tools that stream changes, including schema changes, from databases such as MySQL into the lake efficiently and with low latency, even at large scale.

● Unified Stream‑Batch Processing: Uses Flink for streaming and Spark for batch while keeping data semantics consistent across both workloads, which reduces operational cost.

● Broad Ecosystem Integration: Integrates with major big‑data engines, including Flink, Spark, Hive, Trino, Presto, StarRocks, and Doris, so that different compute engines can work on the same stored data.

● Lakehouse Storage Innovations: Introduces Deletion Vectors and indexing to boost query performance, supporting streaming, batch, and OLAP scenarios with minute‑level latency.
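The general idea behind deletion vectors is that, instead of rewriting a data file or merging versions at read time, the system records which row positions in each file are deleted, and readers simply skip them. The sketch below is a conceptual illustration under that assumption: `DeletionVectorIndex` and `scan` are hypothetical names, and a Python `set` stands in for the compressed bitmaps a real implementation would typically use. It is not Paimon’s on‑disk format or API.

```python
from typing import Dict, List, Set


class DeletionVectorIndex:
    """Hypothetical index: data file name -> set of deleted row positions."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Set[int]] = {}

    def mark_deleted(self, file_name: str, position: int) -> None:
        # Record the deletion without touching the (immutable) data file.
        self._vectors.setdefault(file_name, set()).add(position)

    def is_deleted(self, file_name: str, position: int) -> bool:
        return position in self._vectors.get(file_name, set())


def scan(file_name: str, rows: List[str], dv: DeletionVectorIndex) -> List[str]:
    """Read a file's rows, skipping positions marked in its deletion vector."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(file_name, pos)]


if __name__ == "__main__":
    dv = DeletionVectorIndex()
    dv.mark_deleted("data-0", 1)
    dv.mark_deleted("data-0", 3)
    print(scan("data-0", ["r0", "r1", "r2", "r3"], dv))
```

Because the reader only checks row positions against a per‑file structure, it avoids comparing keys across files, which is why this approach can speed up queries on frequently updated tables.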

The graduation confirms that Apache Paimon meets Apache’s rigorous standards for community governance, code quality, documentation, and user adoption, paving the way for broader global adoption of real‑time data lake technology.

Graduation Messages

Community members and mentors expressed heartfelt congratulations, highlighting Paimon’s rapid growth, innovative features, and its role in simplifying lakehouse development for enterprises such as Alibaba, Ant Group, and others.

Several industry leaders noted that Paimon’s simplicity, unified stream‑batch model, and low‑latency updates have already made it a critical component of their data‑lake architectures, and that it will continue to drive innovation.

Additional Content

Alibaba Cloud offers a Flink‑plus‑Paimon cloud solution for building high‑efficiency, low‑latency streaming data warehouses, reducing data change propagation delay from hours to minutes and simplifying ETL pipelines with Flink SQL.

For more details, visit the Alibaba Cloud documentation link provided in the original announcement.

Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
