
Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

DataFunTalk

On December 9, 2023, Flink Forward Asia (FFA) concluded in Beijing with more than 70 talks and over 30 technical presentations from leading companies, underscoring the event’s strong industry pull as the conference returned to an in‑person format.

1. Two Major Flink Releases Focus on Deeper Scenario Coverage and Refinement

The Flink community reviewed the rapid evolution of stream processing over the past decade and presented the two releases shipped during the year, Flink 1.17 and 1.18, which kept the project on its twice-yearly release cadence. Improvements include extensive optimizations for Flink SQL (e.g., the new Plan Advice feature), enhanced checkpointing with incremental checkpoints, and significant performance gains for batch workloads: in batch mode on the TPC-DS 10 TB dataset, Flink 1.18 is 54% faster than version 1.16.

Deployments have become more cloud‑native: the community added API‑driven elastic scaling without restarts and introduced a Kubernetes‑based autoscaling mechanism that dynamically adjusts resources based on load and latency.
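To make the idea of load-based scaling concrete, here is a minimal sketch of how a target parallelism could be derived from observed throughput. It is not the autoscaler's actual algorithm: the function name, the `target_utilization` default, and the `max_parallelism` cap are all assumptions chosen for illustration.

```python
import math


def target_parallelism(incoming_rate, rate_per_subtask,
                       target_utilization=0.7, max_parallelism=128):
    """Estimate the parallelism needed to absorb `incoming_rate`.

    All parameters are hypothetical names for this sketch:
    - incoming_rate: records/second arriving at the operator
    - rate_per_subtask: records/second one subtask can process when fully busy
    - target_utilization: fraction of that capacity we want to use (headroom)
    - max_parallelism: upper bound on the result
    """
    if rate_per_subtask <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("rate_per_subtask must be > 0 and utilization in (0, 1]")
    # Each subtask is budgeted only a fraction of its peak capacity.
    usable_rate = rate_per_subtask * target_utilization
    required = math.ceil(incoming_rate / usable_rate)
    # Clamp to at least one subtask and at most the configured maximum.
    return max(1, min(max_parallelism, required))
```

With a per-subtask capacity of 1,000 records/second and a 50% utilization target, an incoming rate of 4,100 records/second would call for 9 subtasks under this model; a real autoscaler would additionally account for backlog, latency targets, and stabilization windows.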

Use‑case demonstrations, such as Cao Cao Travel’s real‑time data‑warehouse built on Flink, showed up to 60 % improvement in passenger subsidy efficiency and a ten‑fold increase in gross profit.

2. Apache Paimon: Driving the Next Wave of Streaming Lakehouse

The conference highlighted Flink Table Store's move out of the Flink project and into the Apache Software Foundation's incubator as Apache Paimon, a streaming-first lake storage format that supports high-throughput, low-latency ingestion, streaming subscriptions, and real-time queries. Paimon aims to bridge Flink with lakehouse architectures, offering native support for Lakehouse APIs, JDBC drivers, and seamless integration with BI tools.

Community members described the challenges of adapting existing lake formats (Iceberg, Hudi) for streaming workloads and explained how Flink + Paimon provides a native LSM‑based design that reduces latency to 1‑5 minutes and supports both Flink and Spark.
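The LSM-based design mentioned above can be illustrated with a toy compaction step: writes land in small sorted runs, and a later merge keeps only the newest version of each key. This is a simplified sketch, not Paimon's implementation; the `(key, sequence, value)` record layout and the use of `None` as a delete marker are assumptions made for the example.

```python
import heapq


def compact(runs):
    """Merge sorted runs into one run, keeping the newest record per key.

    Each run is a list of (key, sequence, value) tuples sorted by key, with at
    most one record per key. A higher sequence number means a newer write, and
    value None marks a delete. Dropping deletes is only safe because this
    sketch merges every run at once (a full compaction).
    """
    # Order by key, then newest first, so the first record seen per key wins.
    merged = heapq.merge(*runs, key=lambda rec: (rec[0], -rec[1]))
    result = []
    last_key = object()  # sentinel that never equals a real key
    for key, seq, value in merged:
        if key == last_key:
            continue  # an older version of a key we already handled
        last_key = key
        if value is not None:
            result.append((key, seq, value))
    return result


older_run = [("a", 1, "v1"), ("c", 2, "v1")]
newer_run = [("a", 3, "v2"), ("b", 4, "v1"), ("c", 5, None)]
compacted = compact([older_run, newer_run])
```

Because each write only appends a new sorted run, ingestion stays cheap, and the cost of reconciling versions is deferred to the merge, which is the trade-off that makes this structure attractive for frequent streaming updates.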

Industrial adopters such as Tongcheng Travel and AutoHome reported dramatic gains: Tongcheng Travel achieved a 30 % increase in ODS sync efficiency, three‑fold write speed improvements, and up to 7‑fold query speedups, while AutoHome built a real‑time intelligent pipeline using Flink CDC and Paimon.

3. Flink CDC 3.0: A Real-Time Data Integration Framework

Flink CDC, a set of Flink-based connectors for change data capture, was upgraded to version 3.0. New connectors for IBM DB2 and Vitess were added, along with advanced features such as dynamic table addition, automatic scaling, asynchronous chunk splitting, and at-least-once semantics. The framework now lets users define a pipeline in YAML and submit it through a CLI, which simplifies development, and plans to cover MySQL, StarRocks, Doris, Paimon, Kafka, MongoDB, and more.

The release was accompanied by Alibaba’s donation of Flink CDC to the Apache Software Foundation.
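As a rough illustration of why at-least-once delivery can be acceptable for change data capture, the sketch below applies a stream of change events to a table keyed by primary key. It is not Flink CDC code; the `(op, key, row)` event shape and the function name are assumptions for this example. Because every event overwrites or removes state by key, replaying a tail of the stream after a restart leaves the final table unchanged.

```python
def apply_changes(table, events):
    """Apply change events to `table` (a dict keyed by primary key).

    Each event is a hypothetical (op, key, row) tuple where op is one of
    "insert", "update", or "delete". Inserts and updates are both treated
    as upserts, which is what makes replays harmless.
    """
    for op, key, row in events:
        if op in ("insert", "update"):
            table[key] = row
        elif op == "delete":
            table.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


events = [
    ("insert", 1, {"name": "a"}),
    ("update", 1, {"name": "b"}),
    ("insert", 2, {"name": "c"}),
    ("delete", 2, None),
]

exactly_once = apply_changes({}, events)
# Simulate a restart that re-delivers the last two events.
with_replay = apply_changes({}, events + events[2:])
```

Both runs end with the same table. A sink that appends rows instead of upserting by key would not have this property and would need deduplication downstream.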

4. Global Open-Source Ecosystem and Community Growth

The Flink community highlighted its rapid international expansion. In China alone, it counted over 750 technical articles, 111 participating companies, and 351 developers, with the articles accumulating more than 2.35 million reads. Flink also received the 2023 SIGMOD Systems Award for its impact on real-time big-data processing.

Overall, the conference demonstrated Flink's continued leadership in unified stream and batch processing, the emergence of lakehouse solutions like Apache Paimon, and the maturation of real-time data integration through Flink CDC 3.0.
