How Kafka Implements Transactions and Achieves Exactly‑Once Guarantees

This article explains how Kafka implements transactions using a two‑phase commit protocol, compares its approach with RocketMQ, details the exactly‑once semantics for streaming workloads, and walks through the coordinator roles, message flow, and practical use‑case scenarios.

Kafka Transactions vs. RocketMQ

RocketMQ transactions guarantee that a local database operation and the sending of the corresponding message either both succeed or both fail; a transaction check‑back mechanism additionally lets the broker query the producer for the outcome of an in‑doubt local transaction, improving the success rate and data consistency.

Kafka transactions guarantee that all messages produced within a transaction are either all committed or all aborted. The messages may span multiple topics and partitions. A local transaction can be combined with a Kafka transaction to emulate RocketMQ’s behavior, but Kafka does not have a transaction check‑back mechanism.
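
To make the difference concrete, here is a minimal sketch of a Kafka transaction spanning two topics, using the Java producer API; the broker address, topic names, and transactional.id are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MultiTopicTransaction {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("transactional.id", "order-pipeline-1");  // placeholder; must be stable
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // Two sends to different topics commit or abort as one unit.
                producer.send(new ProducerRecord<>("orders", "o-42", "created"));
                producer.send(new ProducerRecord<>("audit-log", "o-42", "order created"));
                producer.commitTransaction();
            } catch (Exception e) {
                // Neither record becomes visible to read_committed consumers.
                // (A fenced producer should be closed rather than aborted.)
                producer.abortTransaction();
            }
        }
    }
}
```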

Exactly‑Once vs. At‑Least‑Once

Exactly‑Once means a message is transferred from producer to broker and then consumed by the consumer exactly once, without duplication or loss.

At‑Least‑Once means the message is never lost but may be delivered more than once. Most MQ systems, Kafka included, guarantee only at‑least‑once delivery by default; Kafka's exactly‑once semantics must be enabled explicitly.
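
Concretely, the upgrade from at‑least‑once to exactly‑once is a handful of client settings. The keys below are Kafka's actual configuration names; my-app-1 is a placeholder:

```properties
# Producer: idempotence de-duplicates retries broker-side (producer ID plus
# sequence numbers); a stable transactional.id additionally enables atomic
# writes across multiple partitions.
enable.idempotence=true
acks=all
transactional.id=my-app-1

# Consumer: deliver only records from committed transactions
# (the default isolation.level is read_uncommitted).
isolation.level=read_committed
```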

Kafka Exactly‑Once Semantics

The typical use case is a real‑time stream processing pipeline where Kafka is both the source and the sink. Data is consumed from an input topic, processed (e.g., by Flink or Spark Structured Streaming), and the results are written to an output topic. The pipeline must ensure that each input record contributes to the output exactly once, even if any node in the Kafka or processing cluster fails.

Example

All order events are stored in a topic Order. A Flink job reads the events, aggregates revenue per minute, and writes the aggregated results to a topic Income. To keep the revenue numbers correct, the system must guarantee that each order event is processed exactly once, regardless of failures in the Kafka brokers or Flink task managers.
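
In production this job would rely on Flink's checkpointing, which is layered on Kafka's transactional API; as an illustration of the underlying pattern, here is a minimal read‑process‑write loop written directly against the Kafka clients. The broker address and IDs are placeholders, and a pass‑through stands in for the real per‑minute aggregation:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class OrderIncomeJob {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");     // placeholder broker address
        cp.put("group.id", "income-aggregator");           // placeholder group ID
        cp.put("isolation.level", "read_committed");       // skip uncommitted upstream data
        cp.put("enable.auto.commit", "false");             // offsets commit inside the txn
        cp.put("key.deserializer",
               "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put("value.deserializer",
               "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("transactional.id", "income-aggregator-1"); // placeholder; must be stable
        pp.put("key.serializer",
               "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer",
               "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            consumer.subscribe(Collections.singletonList("Order"));
            producer.initTransactions();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> rec : records) {
                        // Stand-in for the real per-minute revenue aggregation.
                        producer.send(new ProducerRecord<>("Income", rec.key(), rec.value()));
                        offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                    new OffsetAndMetadata(rec.offset() + 1));
                    }
                    // Input offsets and output records commit atomically.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    producer.abortTransaction();
                    // A real job would also rewind the consumer to its last
                    // committed offsets here before retrying the batch.
                }
            }
        }
    }
}
```

The pivotal call is sendOffsetsToTransaction: committing the input offsets inside the output transaction means a crash before commit aborts both the results and the offset advance, so no Order event is lost or counted twice.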

Kafka Transaction Implementation

Transaction Coordinator

Transaction coordinators run inside the broker processes. Kafka creates a special internal topic (by default __transaction_state) that stores transaction log entries such as "transaction begin", "prepare commit", "commit", and "abort". The coordinator responsible for a given producer is the broker that leads the transaction‑log partition its transactional.id hashes to, so the topic's ordinary replication and leader election give the coordinator high availability. Because different brokers lead different partitions of the topic, multiple coordinators run in parallel, each managing its own subset of transactions.
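
Because the transaction log is an ordinary replicated topic, coordinator parallelism and durability are tuned with standard broker settings. For reference, the relevant server.properties keys (the values shown are Kafka's defaults):

```properties
# One coordinator per partition leader: more partitions, more parallel coordinators.
transaction.state.log.num.partitions=50
# Replication and minimum in-sync replicas protect the durability of commit decisions.
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
```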

Transaction Flow

1. Begin : The producer calls beginTransaction(), a local operation; the coordinator opens the transaction in its log when the producer registers the first topic‑partition (step 2). A one‑time initTransactions() call has already registered the producer's transactional.id with the coordinator.

2. Send messages : Before writing to a partition for the first time within the transaction, the producer registers the target topic and partition with the coordinator (an AddPartitionsToTxn request); the coordinator records this mapping in the transaction log.

3. Message storage : Uncommitted messages are written to the target partitions exactly like normal messages. Consumers configured with isolation.level=read_committed withhold records whose transaction has not yet committed.

4. Commit / Abort : After all messages are sent, the producer calls commitTransaction or abortTransaction (an EndTxn request on the wire), triggering a two‑phase commit; the sketch after this list maps all four steps onto the producer API.
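
Mapped onto the producer API, the four steps look like the sketch below; the request names in the comments are Kafka's wire‑protocol messages, while the topic and values are placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionFlow {
    static void runOnce(KafkaProducer<String, String> producer) {
        producer.initTransactions();  // FindCoordinator + InitProducerId, once per producer
        producer.beginTransaction();  // step 1: local state change, nothing on the wire yet
        // Steps 2-3: the first write to each partition triggers AddPartitionsToTxn,
        // followed by an ordinary Produce request that stores the record in place.
        producer.send(new ProducerRecord<>("some-topic", "key", "value"));
        producer.commitTransaction(); // step 4: EndTxn; the coordinator then runs the
                                      // two-phase commit described in the next section
    }
}
```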

Two‑Phase Commit Details

Prepare phase : The coordinator marks the transaction as "prepare commit" in the internal log. Because this decision is durably written to the replicated transaction log, the transaction is now guaranteed to complete: even if the coordinator crashes, its successor replays the log and finishes the commit.

Commit phase : The coordinator writes a special "transaction end" control marker to every partition that received messages in this transaction. When a consumer reads this marker, it releases the previously withheld records for business processing.

Finally, the coordinator writes a concluding log entry indicating that the transaction has finished.

Message Handling on the Consumer Side

Because uncommitted records reside in the normal data partitions, the consumer client withholds them from the application until the corresponding commit marker appears (records from aborted transactions are dropped). This design avoids a separate queue for pending messages and leverages the existing partitioning and replication mechanisms.
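
Only consumers that opt in get this filtering. A minimal subscriber for the Income topic from the earlier example (broker address and group ID are placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommittedOnlyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("group.id", "income-report");            // placeholder group ID
        // read_committed: poll() returns only records whose transaction committed;
        // records from aborted transactions are never handed to the application.
        props.put("isolation.level", "read_committed");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("Income"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    System.out.printf("%s -> %s%n", rec.key(), rec.value());
                }
            }
        }
    }
}
```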

Summary

Kafka implements transactions using a two‑phase commit protocol backed by an internal transaction‑state topic. Uncommitted records are stored in the target partitions and filtered out by the client until a commit marker is written. This mechanism enables Kafka’s Exactly‑Once semantics, which are most useful for real‑time computation pipelines where both input and output data reside in Kafka.

Reference: https://www.confluent.io/blog/transactions-apache-kafka/
