
What Is Debezium? Overview, Architecture, and Features

Debezium is an open‑source distributed platform built on Apache Kafka that captures row‑level database changes via change data capture (CDC). It provides source connectors, an optional embedded engine, and features such as low‑latency streaming, initial snapshots, table and column filtering, value masking, and integration with a variety of sink systems.

Architects Research Society

Debezium is a distributed platform that turns your existing databases into event streams, allowing applications to see every row‑level change in the database and react instantly. It is built on Apache Kafka and provides Kafka Connect‑compatible connectors to monitor specific DBMSs.

Debezium records the history of data changes in Kafka logs, from which applications can consume events. This enables applications to handle events reliably and completely, even after restarts, without missing any changes.

Debezium Architecture

Typically Debezium is deployed via Apache Kafka Connect. Kafka Connect is a framework and runtime for implementing and operating connectors.

Source connectors such as Debezium ingest data into Kafka.

Sink connectors propagate data from Kafka topics to other systems.

The diagram below shows the architecture of a CDC pipeline based on Debezium:

Besides the Kafka brokers themselves, Kafka Connect runs as a separate service. Connectors for MySQL and PostgreSQL are deployed to capture changes from those databases. The connectors use client libraries to connect to the source databases, reading the binlog in the case of MySQL and the logical replication stream in the case of PostgreSQL.
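As a concrete illustration, such a connector is typically registered by POSTing a JSON configuration to the Kafka Connect REST API. The sketch below is for the MySQL connector; hostnames, credentials, and the table list are placeholders, and some property names shown here (e.g., table.whitelist) were renamed to include/exclude-list variants in later Debezium releases:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "table.whitelist": "inventory.customers",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

With a configuration like this, changes to the inventory.customers table would appear in the topic dbserver1.inventory.customers, following the server.schema.table naming convention.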

By default, changes from a captured table are written to a corresponding Kafka topic. Topic names can be adjusted using Debezium’s topic routing SMTs, e.g., using different names or consolidating multiple tables into a single topic.
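For example, Debezium's ByLogicalTableRouter SMT can consolidate identically structured sharded tables into one logical topic; the regex and replacement below are illustrative only:

```json
{
  "transforms": "route",
  "transforms.route.type": "io.debezium.transforms.ByLogicalTableRouter",
  "transforms.route.topic.regex": "dbserver1\\.inventory\\.customers_shard_(\\d+)",
  "transforms.route.topic.replacement": "dbserver1.inventory.customers"
}
```

When merging tables this way, the SMT can also add a field to the record key (controlled by its key-uniqueness options) so that keys from different shards do not collide.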

Once change events are in Kafka, various sink connectors can stream them to other systems such as Elasticsearch, data warehouses, analytics platforms, or caches like Infinispan. Depending on the sink, you may apply Debezium’s ExtractNewRecordState SMT to propagate only the “after” structure.
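A minimal configuration for that SMT, applied on either the source or the sink connector, might look like the following (in older Debezium releases the same transformation was named UnwrapFromEnvelope):

```json
{
  "transforms": "unwrap",
  "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
  "transforms.unwrap.drop.tombstones": "false"
}
```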

Embedded Engine

An alternative is to run Debezium as an embedded engine, where it operates as a library inside a custom Java application instead of via Kafka Connect. This is useful for consuming change events internally without deploying a full Kafka and Connect cluster, or for streaming changes to other messaging systems such as Amazon Kinesis. Example code is available in the sample repository.
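A minimal sketch of an embedded deployment is shown below, assuming the io.debezium.engine API available since Debezium 1.x and requiring debezium-api, debezium-embedded, and the MySQL connector artifact on the classpath; hostnames, credentials, and file paths are placeholders:

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class EmbeddedEngineExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "embedded-engine");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "dbz");
        props.setProperty("database.server.id", "5400");
        props.setProperty("database.server.name", "dbserver1");
        // Without Kafka Connect, offsets and schema history are kept in local files.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("offset.flush.interval.ms", "1000");
        props.setProperty("database.history", "io.debezium.relational.history.FileDatabaseHistory");
        props.setProperty("database.history.file.filename", "/tmp/dbhistory.dat");

        // The handler passed to notifying() is invoked for every change event;
        // here each event's JSON payload is simply printed.
        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> System.out.println(record.value()))
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine);  // streams changes until engine.close() is called
    }
}
```

Because offsets are stored in a local file rather than in Kafka, the application resumes from where it left off after a restart, mirroring the delivery guarantees of the Connect-based deployment.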

Debezium Features

Debezium is a set of source connectors for Apache Kafka Connect that capture changes using change data capture (CDC) from various databases. Compared with polling or dual‑write approaches, log‑based CDC provided by Debezium:

Ensures all data changes are captured.

Generates change events with very low latency (ms range for MySQL/PostgreSQL) while avoiding CPU overhead of frequent polling.

Requires no changes to the data model (e.g., no “last_updated” column).

Can capture deletions.

Can capture old record state and metadata such as transaction IDs and the query that caused the change (depending on database capabilities).

Additional CDC capabilities include:

Snapshots: a connector can optionally take an initial snapshot of the current database state when it is started for the first time, which is needed when the required logs no longer cover the full history (e.g., because they have been purged).

Filters: the set of captured schemas, tables, and columns is configurable via include/exclude (formerly whitelist/blacklist) filters.

Masking: mask values in specific columns, e.g., sensitive data.

Monitoring: most connectors expose JMX metrics.

Various message transformations: e.g., routing, extracting new record state, routing events from transactional outbox tables.
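The filtering and masking capabilities above are plain connector properties. The fragment below is illustrative (table and column names are placeholders); in these older-style property names, 12 is the number of masking characters, and newer releases rename table.whitelist and column.blacklist to table.include.list and column.exclude.list:

```json
{
  "table.whitelist": "inventory.customers,inventory.orders",
  "column.blacklist": "inventory.customers.email",
  "column.mask.with.12.chars": "inventory.customers.ssn"
}
```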

For a full list of supported databases and detailed connector configuration options, refer to the connector documentation.

Editor's note: Debezium, sponsored by Red Hat, is a high‑quality, actively maintained open‑source project supporting many databases (MySQL, PostgreSQL, SQL Server, Cassandra, MongoDB, Db2). It follows the Kafka Connect standard and is well suited to incremental change data capture in event‑driven architectures.

Original article: https://pub.intelligentx.net/technical-architecture-cdc-capture-data-changes-debezium-introducuction

Discussion: Join the Knowledge Planet “Chief Architect Circle”, the small account “jiagoushi_pro”, or QQ group “11107777”.

Thank you for following, sharing, liking, and watching.

Tags: CDC, Change Data Capture, Debezium, Kafka Connect, Database Streaming