
Pravega Flink Connector: Past, Present, and Future – Architecture, Checkpoint Integration, and Upcoming Features

This article reviews the Pravega project and its Flink connector: Pravega's design for large‑scale streaming, the connector's evolution and exactly‑once semantics, the challenges of integrating with Flink 1.11, the checkpoint mechanism, and future plans such as schema‑registry integration and support for new Flink features.

DataFunTalk

Pravega is an open‑source CNCF sandbox project, created in 2016, that targets large‑scale data‑stream scenarios. It offers high‑performance, elastic, tiered storage that addresses the limitations of traditional message queues.

The Pravega Flink connector was first released as an independent GitHub project in 2017. It initially provided basic source and sink functions and later added exactly‑once semantics through a two‑phase commit mechanism. It eventually gained support for batch reads and Table API integration, and current work focuses on Flink 1.11 features such as FLIP‑27 and FLIP‑95.
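The two‑phase commit idea behind exactly‑once writes can be illustrated with a minimal, self‑contained sketch. Everything here is hypothetical: `InMemoryTxnStream` is an in‑memory stand‑in, not the Pravega client, and the method names on `TwoPhaseCommitSinkSketch` only mirror the general shape of a checkpoint‑driven sink, not the connector's actual classes. Events are written into an open transaction, the transaction is pre‑committed when a checkpoint is taken, and it only becomes visible once that checkpoint is reported complete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory transactional stream; NOT the Pravega client API. */
class InMemoryTxnStream {
    private final List<String> committed = new ArrayList<>();
    private final Map<Long, List<String>> open = new HashMap<>();
    private long nextTxnId = 0;

    long beginTxn() {
        long id = nextTxnId++;
        open.put(id, new ArrayList<>());
        return id;
    }

    void write(long txnId, String event) { open.get(txnId).add(event); }

    // Committing an unknown (already committed or aborted) transaction is a no-op.
    void commit(long txnId) {
        List<String> events = open.remove(txnId);
        if (events != null) committed.addAll(events);
    }

    void abort(long txnId) { open.remove(txnId); }

    List<String> committedEvents() { return new ArrayList<>(committed); }
}

/** Sketch of a sink that ties transactions to checkpoints (names are assumptions). */
class TwoPhaseCommitSinkSketch {
    private final InMemoryTxnStream stream;
    private final Map<Long, Long> pendingByCheckpoint = new HashMap<>(); // checkpointId -> txnId
    private long currentTxn;

    TwoPhaseCommitSinkSketch(InMemoryTxnStream stream) {
        this.stream = stream;
        this.currentTxn = stream.beginTxn();
    }

    void invoke(String event) { stream.write(currentTxn, event); }

    // Phase 1 (pre-commit): bind the current transaction to the checkpoint, open a new one.
    void snapshotState(long checkpointId) {
        pendingByCheckpoint.put(checkpointId, currentTxn);
        currentTxn = stream.beginTxn();
    }

    // Phase 2 (commit): the checkpoint is durable, so its transaction becomes visible.
    void notifyCheckpointComplete(long checkpointId) {
        Long txn = pendingByCheckpoint.remove(checkpointId);
        if (txn != null) stream.commit(txn);
    }

    // Recovery: commit what the restored checkpoint covers, abort everything newer.
    void restore(long restoredCheckpointId) {
        stream.abort(currentTxn);
        for (Map.Entry<Long, Long> e : pendingByCheckpoint.entrySet()) {
            if (e.getKey() <= restoredCheckpointId) stream.commit(e.getValue());
            else stream.abort(e.getValue());
        }
        pendingByCheckpoint.clear();
        currentTxn = stream.beginTxn();
    }
}

class TwoPhaseCommitDemo {
    static List<String> run() {
        InMemoryTxnStream stream = new InMemoryTxnStream();
        TwoPhaseCommitSinkSketch sink = new TwoPhaseCommitSinkSketch(stream);
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState(1);
        sink.invoke("c");
        sink.notifyCheckpointComplete(1); // "a" and "b" become visible
        sink.snapshotState(2);            // "c" is pre-committed only
        sink.invoke("d");
        sink.restore(1);                  // failure before checkpoint 2 completed
        sink.invoke("c");                 // the source replays "c" after recovery
        sink.snapshotState(3);
        sink.notifyCheckpointComplete(3);
        return stream.committedEvents();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this run the event "c" is written twice (once before the simulated failure, once on replay) but appears only once in the committed output, and "d" never appears because the source has not replayed it yet. That is the property the two‑phase commit mechanism is meant to provide.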

Unlike Kafka, Pravega implements its own checkpoint mechanism. The Job master triggers checkpoints via the ExternallyInducedSource interface, Pravega uses a StateSynchronizer to broadcast the checkpoint event to all readers, and the completed checkpoint is stored in the Job master's state. This design avoids tight coupling with Flink internals.
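The flow described above can be sketched as a small simulation. This is a toy model under stated assumptions, not the connector's code: `SharedReaderGroupState` is a hypothetical stand‑in for the state that a StateSynchronizer would keep consistent across readers, `ReaderSketch` is an invented reader, and no Flink or Pravega classes are used. The master side initiates a named checkpoint, each reader notices it on its next read and reports its position, and the checkpoint is complete once every reader has reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical shared reader-group state; NOT the Pravega StateSynchronizer API. */
class SharedReaderGroupState {
    private final List<String> readers = new ArrayList<>();
    private final Map<String, Long> reportedPositions = new LinkedHashMap<>();
    private String pendingCheckpoint; // name of the checkpoint in progress, or null

    void addReader(String readerId) { readers.add(readerId); }

    // Master side: announce a checkpoint to every reader in the group.
    void initiateCheckpoint(String name) {
        pendingCheckpoint = name;
        reportedPositions.clear();
    }

    // Reader side: the checkpoint this reader still has to acknowledge, or null.
    String checkpointFor(String readerId) {
        boolean outstanding = pendingCheckpoint != null && !reportedPositions.containsKey(readerId);
        return outstanding ? pendingCheckpoint : null;
    }

    void reportPosition(String readerId, long position) {
        reportedPositions.put(readerId, position);
    }

    boolean isComplete() {
        return pendingCheckpoint != null && reportedPositions.size() == readers.size();
    }

    // The positions that the master would keep as the checkpoint's state.
    Map<String, Long> completedPositions() { return new LinkedHashMap<>(reportedPositions); }
}

/** Hypothetical reader that induces a task-side checkpoint when it sees the event. */
class ReaderSketch {
    final String id;
    final List<String> inducedCheckpoints = new ArrayList<>();
    private final SharedReaderGroupState state;
    private long position = 0;

    ReaderSketch(String id, SharedReaderGroupState state) {
        this.id = id;
        this.state = state;
        state.addReader(id);
    }

    void readNext() {
        String checkpoint = state.checkpointFor(id);
        if (checkpoint != null) {
            // The checkpoint event arrives in-band, between two data events.
            state.reportPosition(id, position);
            inducedCheckpoints.add(checkpoint); // stands in for triggering the task snapshot
        } else {
            position++; // consume one ordinary event
        }
    }
}

class ExternalCheckpointDemo {
    static Map<String, Long> run() {
        SharedReaderGroupState state = new SharedReaderGroupState();
        ReaderSketch r1 = new ReaderSketch("r1", state);
        ReaderSketch r2 = new ReaderSketch("r2", state);
        r1.readNext();
        r1.readNext();
        r2.readNext();
        state.initiateCheckpoint("chk-1"); // triggered from the master side
        r1.readNext();                     // r1 reports position 2
        r2.readNext();                     // r2 reports position 1
        return state.isComplete() ? state.completedPositions() : null;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the sketch is the direction of control: the readers, not the framework's barrier injection, decide when the task‑side snapshot happens, which is why the source has to be externally induced.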

Integrating with Flink 1.11 presented challenges. One example is issue FLINK‑18641, which required debugging the CheckpointCoordinator's mailbox model. The experience highlighted the importance of searching the mailing lists, providing detailed logs, and collaborating with the community.

Future work includes integrating the Pravega schema‑registry to simplify Table API usage, adopting upcoming Flink features (FLIP‑143, FLIP‑129), and supporting a Docker‑based test framework, while also promoting the Pravega maker competition to encourage community involvement.

Big Data, Flink, stream processing, Connector, Checkpoint, Table API, Pravega