
Debezium: Open‑Source Change Data Capture Platform – Overview, Architecture, Use Cases, and Installation Guide

This article introduces Debezium, an open‑source low‑latency change data capture platform that streams database row changes via Kafka, explains its architecture and common scenarios such as cache invalidation and CQRS, and provides step‑by‑step Docker commands to install ZooKeeper, Kafka, MySQL and the Debezium connector.


Introduction

Debezium is an open‑source, low‑latency data‑flow platform that provides Change Data Capture (CDC) capabilities. By monitoring a database, applications can receive an event for every committed row change; because Debezium emits only committed changes, applications need not worry about in‑flight transactions or rollbacks. Debezium offers a unified model for all change events, abstracting away the complexities of individual DBMSs.

Debezium also persists change history in logs, allowing applications to stop and restart at any time while still receiving any events missed during downtime.

Monitoring databases and reacting to changes is complex; traditional triggers are limited to certain databases and usually only affect the same database. Various databases expose different APIs, lacking a standard approach, and implementing reliable, ordered change streams without impacting the source database is challenging.

Debezium provides modules that handle these tasks. Some modules are generic and work across multiple DBMSs, while others are specialized for specific systems, offering richer functionality and better use of native features.

Basic Information

Infrastructure

Debezium leverages Kafka and Kafka Connect for persistence, reliability, and fault tolerance. Each connector deployed in Kafka Connect monitors an upstream database, captures all changes, and writes them to one or more Kafka topics (typically one topic per table). Kafka provides replication and durable, ordered storage (ordering is guaranteed within a single partition of a topic). This design allows many clients to consume the same change events with minimal impact on the source database.
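As an illustration, the per‑table topic name used later in this guide (dbserver1.inventory.customers) follows the pattern serverName.databaseName.tableName, where the server name is the logical name given in the connector configuration. A minimal sketch:

```python
def change_topic(server_name: str, database: str, table: str) -> str:
    """Build the Kafka topic name Debezium uses for a table's change events:
    the connector's logical server name, database, and table joined by dots."""
    return f"{server_name}.{database}.{table}"

# The connector registered later in this guide uses the logical server
# name "dbserver1", so changes to inventory.customers land on this topic:
print(change_topic("dbserver1", "inventory", "customers"))
# dbserver1.inventory.customers
```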

For applications that do not require the full fault‑tolerance and scalability of Kafka, Debezium also offers an embedded connector engine that runs inside the application process, delivering change events directly without persisting them to Kafka.

Common Use Cases

Cache Invalidation

When source data changes, cache entries can be immediately invalidated. If the cache runs in a separate process (e.g., Redis, Memcached, Infinispan), simple invalidation logic can be placed in that process, simplifying the main application.
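A minimal sketch of that invalidation logic in Python, assuming change events arrive as parsed Debezium payloads (the event shape matches the sample at the end of this article); the "customer:&lt;id&gt;" key scheme is an illustrative assumption, not part of Debezium:

```python
def cache_keys_to_invalidate(event: dict) -> list[str]:
    """Given a parsed Debezium change event, return the cache keys to drop.
    Uses "after" for creates/updates/snapshot reads and "before" for deletes;
    the key scheme "customer:<id>" is an assumption for this sketch."""
    payload = event["payload"]
    row = payload["after"] if payload["op"] in ("c", "u", "r") else payload["before"]
    return [f"customer:{row['id']}"]

event = {"payload": {"op": "u",
                     "before": {"id": 1004, "first_name": "Anne"},
                     "after": {"id": 1004, "first_name": "Anne Marie"}}}
print(cache_keys_to_invalidate(event))  # ['customer:1004']
```

In a real deployment this function would run in the cache process, consuming from the table's Kafka topic and issuing deletes against Redis, Memcached, or Infinispan.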

Simplifying Monoliths

Applications often perform dual‑writes after committing database changes (e.g., updating search indexes, sending notifications). Using CDC, these post‑commit actions can be handled by independent services, improving fault tolerance and scalability.

Shared Database

When multiple applications share a database, CDC allows each to monitor changes directly without a message bus, ensuring all services stay in sync.

Data Integration

Data stored in multiple systems can be synchronized using Debezium combined with simple event‑processing logic, providing an ETL‑like solution.

Command‑Query Responsibility Segregation (CQRS)

In CQRS architectures, write and read models differ. Debezium captures write‑side changes and streams them to update read‑side views, making reliable, ordered processing feasible.
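The read‑side update can be sketched as an event applier, here against an in‑memory dictionary keyed by row id (a real system would write to a query‑optimized store); the event shape follows the Debezium payload shown later in this article:

```python
def apply_event(read_model: dict, event: dict) -> None:
    """Apply one Debezium change event to a read-side view keyed by row id."""
    payload = event["payload"]
    op = payload["op"]
    if op in ("c", "u", "r"):   # create / update / snapshot read
        row = payload["after"]
        read_model[row["id"]] = row
    elif op == "d":             # delete
        read_model.pop(payload["before"]["id"], None)

view: dict[int, dict] = {}
apply_event(view, {"payload": {"op": "c", "after": {"id": 1, "email": "a@x"}}})
apply_event(view, {"payload": {"op": "u", "after": {"id": 1, "email": "b@x"}}})
apply_event(view, {"payload": {"op": "d", "before": {"id": 1}}})
print(view)  # {}
```

Because Kafka preserves per‑partition ordering and Debezium records offsets, the read model can be rebuilt deterministically after a restart.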

Installation

Debezium requires three independent services: ZooKeeper, Kafka, and the Debezium connector service. The official recommendation is to use Docker; the examples below use MySQL as the source database.

Start ZooKeeper

$ docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 quay.io/debezium/zookeeper:1.9

For Podman:

$ sudo podman pod create --name=dbz --publish "9092,3306,8083"
$ sudo podman run -it --rm --name zookeeper --pod dbz quay.io/debezium/zookeeper:1.9

Start Kafka

$ docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper quay.io/debezium/kafka:1.9

For Podman:

$ sudo podman run -it --rm --name kafka --pod dbz quay.io/debezium/kafka:1.9

Start MySQL

The container runs a pre‑configured MySQL server with an inventory database:

$ docker run -it --rm --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw quay.io/debezium/example-mysql:1.9

For Podman:

$ sudo podman run -it --rm --name mysql --pod dbz -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw quay.io/debezium/example-mysql:1.9

Start Kafka Connector

The connector service exposes a REST API for managing Debezium MySQL connectors:

$ docker run -it --rm --name connect -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses --link kafka:kafka --link mysql:mysql quay.io/debezium/connect:1.9

For Podman:

$ sudo podman run -it --rm --name connect --pod dbz -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses quay.io/debezium/connect:1.9

Register MySQL Connector

Registering the Debezium MySQL connector starts monitoring of the MySQL binlog; a change event is emitted for each committed row change. The connector configuration:

{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}

Register via curl:

$ curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{ "name": "inventory-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "1", "database.hostname": "mysql", "database.port": "3306", "database.user": "debezium", "database.password": "dbz", "database.server.id": "184054", "database.server.name": "dbserver1", "database.include.list": "inventory", "database.history.kafka.bootstrap.servers": "kafka:9092", "database.history.kafka.topic": "schema-changes.inventory" } }'

Update Database and View Change Events

Use the watch-topic tool to observe the dbserver1.inventory.customers topic:

$ docker run -it --rm --name watcher --link zookeeper:zookeeper --link kafka:kafka quay.io/debezium/kafka:1.9 watch-topic -a -k dbserver1.inventory.customers

Execute a change in the MySQL client (the example tables live in the inventory database):

mysql> USE inventory;
mysql> UPDATE customers SET first_name='Anne Marie' WHERE id=1004;

Verify the update:

mysql> SELECT * FROM customers;

Observe the change event in the watcher terminal. The event payload contains before and after structures, allowing you to see exactly what was modified.

Sample event payload:

{
  "schema": { ... },
  "payload": {
    "before": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "[email protected]" },
    "after": { "id": 1004, "first_name": "Anne Marie", "last_name": "Kretchmar", "email": "[email protected]" },
    "source": { "version": "1.9.5.Final", "name": "dbserver1", "server_id": 223344, "ts_sec": 1486501486, "file": "mysql-bin.000003", "pos": 364, "db": "inventory", "table": "customers" },
    "op": "u",
    "ts_ms": 1486501486308
  }
}
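Since every event carries both structures, a consumer can compute exactly which columns changed; a minimal sketch against the payload above:

```python
def changed_fields(payload: dict) -> dict:
    """Compare the before and after structures of a Debezium update event
    and return only the columns whose values changed, as (old, new) pairs."""
    before = payload.get("before") or {}
    after = payload.get("after") or {}
    return {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}

payload = {
    "before": {"id": 1004, "first_name": "Anne", "last_name": "Kretchmar"},
    "after":  {"id": 1004, "first_name": "Anne Marie", "last_name": "Kretchmar"},
    "op": "u",
}
print(changed_fields(payload))  # {'first_name': ('Anne', 'Anne Marie')}
```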
Tags: Docker, Kafka, MySQL, data integration, CDC, Debezium
Written by IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.