
Advanced Features of the Pravega Flink Connector Table API: Schema Registry, Catalog Integration, and Debezium Support

This article summarizes the Pravega Schema Registry project, its integration with Flink's Catalog API, the addition of Debezium CDC support, and the related implementation challenges, providing detailed DDL examples, code snippets, and architectural diagrams for building real‑time data pipelines.


The article is based on a presentation by Dell Technologies software engineer Zhou Yumin at Flink Forward Asia 2021, covering advanced features of the Pravega Flink Connector Table API.

1. Pravega Schema Registry – Introduces the Schema Registry project, its motivation for centralized schema management in Pravega streams, support for multiple serialization formats (Avro, Protobuf, JSON, etc.), and its implementation on top of Pravega Key‑Value Tables.

2. Catalog API Integration – Describes how the Schema Registry enables a Flink Catalog for Pravega, allowing users to create tables via SQL DDL. Example DDL to create a catalog:

CREATE CATALOG pravega_catalog WITH(
  'type' = 'pravega',
  'default-database' = 'scope1',
  'controller-uri' = 'tcp://localhost:9090',
  'schema-registry-uri' = 'http://localhost:9092'
);
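Once the catalog is registered, Pravega scopes appear as databases and streams as tables, so existing streams can be queried without any per-table DDL. A minimal usage sketch (the stream name `my_stream` is a placeholder, not from the original talk):

USE CATALOG pravega_catalog;
-- 'scope1' is the default database configured above; any stream
-- in that scope with a registered schema is queryable as a table:
SELECT * FROM my_stream;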

Implementation challenges include reusing Flink's internal RowData conversion code, supporting both Avro and JSON formats, and handling the five‑byte header added by the Schema Registry.
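The header handling mentioned above can be sketched as follows. This is a hedged illustration, not the connector's actual code: the assumed header layout (one protocol-version byte plus a four-byte encoding id, five bytes total) and the method name `stripRegistryHeader` are ours; only the five-byte length comes from the talk.

```java
import java.util.Arrays;

public class HeaderStrip {
    // Number of bytes the Schema Registry serializer prepends to each
    // event (assumed layout: 1 protocol-version byte + 4-byte encoding id).
    static final int HEADER_LEN = 5;

    // Remove the registry header so Flink's stock Avro/JSON format
    // deserializers see only the raw serialized payload.
    static byte[] stripRegistryHeader(byte[] message) {
        return Arrays.copyOfRange(message, HEADER_LEN, message.length);
    }

    public static void main(String[] args) {
        byte[] framed = new byte[] {0, 0, 0, 0, 1, 'h', 'i'};
        byte[] payload = stripRegistryHeader(framed);
        System.out.println(new String(payload)); // prints "hi"
    }
}
```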

3. Debezium Support – Explains CDC concepts and Debezium deployment options, then details how Debezium events are written to Pravega streams, both in normal and transactional modes. It also shows how to configure Debezium server to sink data into Pravega.
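The Debezium-server-to-Pravega configuration mentioned above can be sketched roughly as follows; property names follow the Debezium Server Pravega sink, and the URI and scope values are placeholders:

# Hedged sketch of a Debezium Server application.properties fragment
# routing change events into a Pravega stream.
debezium.sink.type=pravega
debezium.sink.pravega.controller.uri=tcp://localhost:9090
debezium.sink.pravega.scope=scope1
# false = normal writes; true = transactional mode
debezium.sink.pravega.transaction=false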

DDL for a Debezium source table with metadata:

CREATE TABLE debezium_source (
  id INT NOT NULL,
  origin_ts TIMESTAMP(3) METADATA FROM 'from_format.ingestion-timestamp' VIRTUAL,
  origin_table STRING METADATA FROM 'from_format.source.table' VIRTUAL,
  event_pointer BYTES METADATA VIRTUAL
);

Additional code snippet for a custom deserialization method used in the connector:

default void deserialize(byte[] message, Collector&lt;RowData&gt; out) throws IOException {
    // ...
}

(The generic type parameter of Collector was lost in extraction; in the Table connector the element type is RowData, matching Flink's DeserializationSchema interface.)

4. Community Whitepaper – Announces a joint whitepaper between the Pravega and Flink communities, providing a complete open‑source solution for real‑time database synchronization using Pravega as the storage layer and Flink for downstream processing.

The original article includes several architecture diagrams illustrating the system layout and data flow.

Tags: Flink, Streaming, CDC, Debezium, Schema Registry, Catalog API, Pravega
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
