Kafka Transactions: Implementation Details and End‑to‑End Process
This article explains how Apache Kafka implements transactional messaging, covering the atomic write guarantees across partitions, the role of TransactionCoordinator, transaction state management, producer and consumer handling, code examples, and the end‑to‑end workflow for exactly‑once semantics.
Overview
Kafka transactions provide atomic writes across multiple partitions, enabling exactly‑once semantics (EOS) that go beyond the per‑partition guarantees of the idempotent producer. The implementation is based on a TransactionCoordinator, transaction logs, and a series of state transitions.
Exactly‑Once Guarantees
Idempotent Producer – exactly‑once, in‑order delivery per partition.
Transactions – atomic writes across partitions.
Exactly‑once stream processing – end‑to‑end EOS for consume‑process‑write pipelines.
TransactionCoordinator
The TransactionCoordinator (TC) acts like a 2PC coordinator. It receives requests such as FindCoordinator, InitProducerId, AddPartitionsToTxn, AddOffsetsToTxn, and EndTxn. TC stores transaction metadata in the internal topic __transaction_state, persists state changes, and sends Transaction Marker messages to partition leaders.
Transaction Log (__transaction_state)
All transaction metadata (transactionalId, producerId, epoch, status, involved partitions, timestamps) is serialized into __transaction_state. This compacted topic allows a new TC to recover the full state after a leader change.
State Machine
Transaction states include Empty, Ongoing, PrepareCommit, PrepareAbort, CompleteCommit, CompleteAbort, Dead, and PrepareEpochFence. Normal flow:
Empty → Ongoing → PrepareCommit → CompleteCommit → Empty(or abort path).
Producer API Workflow
Configure transactional.id and call initTransactions() to obtain a PID.
Start a transaction with beginTransaction().
Send records; the producer adds partitions to the transaction via AddPartitionsToTxn.
Optionally call sendOffsetsToTransaction() for consume‑process‑produce scenarios.
Commit or abort with commitTransaction() / abortTransaction(), which triggers EndTxn and the TC’s marker writes.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "test-transactional");
KafkaProducer producer = new KafkaProducer<>(props);
producer.initTransactions();
producer.beginTransaction();
producer.send(new ProducerRecord(topic, "0", msg));
producer.commitTransaction();Consumer Handling
Consumers use the isolation.level setting. With read_committed, the broker only returns records up to the Last Stable Offset (LSO), which is the highest offset where all prior transactional records are known to be committed or aborted. Abort markers are stored in per‑segment .txnindex files, allowing the broker to filter aborted data without client‑side buffering.
Fencing Mechanisms
Both TransactionCoordinator and producers employ fencing via epochs. A newer coordinator or producer with a higher epoch invalidates older instances, causing ProducerFencedException on stale producers.
Snapshots and Recovery
Each partition periodically writes a PID snapshot ( .snapshot) containing the latest producerId, epoch, and sequence numbers. After a broker restart, the snapshot plus the log enables fast recovery of idempotent state.
Failure Scenarios
Producer retries on beginTransaction or commitTransaction failures.
Coordinator failures are handled by replaying __transaction_state and completing pending PrepareCommit / PrepareAbort phases.
Transaction time‑outs trigger automatic aborts.
Conclusion
The combination of idempotent producers, a dedicated TransactionCoordinator, transaction logs, LSO, and fencing provides Kafka with robust exactly‑once guarantees across partitions, enabling reliable stream processing frameworks such as Kafka Streams and Flink.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
