Kafka Idempotent and Transactional Messaging Overview
This article explains how Kafka implements idempotent producers and transactional messaging to achieve exactly‑once semantics, detailing the producer session identifiers, sequence numbers, broker checks, two‑phase commit workflow, consumer isolation levels, and the limitations of atomic reads.
Kafka introduces idempotent messages to solve duplicate and out‑of‑order issues caused by retries, ensuring that a producer’s writes to a partition within a single session are idempotent and enabling Exactly‑Once semantics.
The implementation is straightforward: each producer obtains a globally unique pid when it starts, and every batch of messages includes an incrementing sequence number. The broker maintains a (pid, seq) mapping in memory and checks the incoming sequence number to classify messages as normal, duplicate, or lost.
new_seq = old_seq + 1: normal message;</code><code>new_seq <= old_seq: duplicate message;</code><code>new_seq > old_seq + 1: message loss;If a producer receives a clear loss acknowledgment or times out without an ack, it retries the send.
Transactional messaging addresses the need for atomic writes of multiple messages in stream‑processing scenarios, involving the producer, transaction coordinator, broker, group coordinator, and consumer.
Producer : Assigned a fixed TransactionalId that persists across restarts, uses an epoch to identify each “rebirth,” follows idempotent behavior, and adds transaction id and epoch to record batches.
Transaction Coordinator : Implements a two‑phase commit using a special transaction topic for logs, balances load by hashing TransactionalId to a broker, and coordinates commit/abort across participants.
Broker : Writes control messages (commit/abort) interleaved with normal messages, advances the high‑water mark, and may also write transactional offset messages.
Group Coordinator : Records transactional consumer offsets in the offset log; these offsets become visible only after the transaction commits.
Consumer : Filters uncommitted and control messages, making them invisible to the user. Two approaches exist:
Consumer‑side caching (set isolation.level=read_uncommitted) where the consumer buffers messages until a commit or abort is received.
Broker‑side filtering (set isolation.level=read_committed) where only committed messages are delivered.
The transaction flow includes:
FindCoordinatorRequest – locate the transaction coordinator.
InitPidRequest – request a pid and epoch.
Start transaction – local producer state change.
Register partitions (AddPartitionsToTxnRequest) – inform the coordinator of involved partitions.
ProduceRequest – send messages with tid, pid, epoch, and seq.
AddOffsetCommitsToTxnRequest – send consumer offsets to the coordinator.
TxnOffsetCommitRequest – write offsets to the offset log (visible after commit).
EndTxnRequest – coordinator writes PREPARE_COMMIT or PREPARE_ABORT, sends WriteTxnMarkerRequest to all involved brokers, and finally writes COMMITTED or ABORTED markers.
After all commit/abort markers are persisted, the transaction is considered complete and its state can be cleaned up.
While Kafka’s transactional messaging guarantees atomic writes across multiple topics, it does not guarantee atomic reads; failures can lead to partial visibility of transaction data, and stream applications may produce nondeterministic results.
In summary, Kafka provides powerful exactly‑once delivery mechanisms through idempotent producers and transactional messaging, but developers must be aware of its read‑side limitations.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
