
Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka’s transaction API, detailing how it enables exactly‑once processing for stream‑processing applications, the role of transaction coordinators and logs, Java API usage, performance considerations, and best‑practice guidance.

Architects Research Society

In a previous blog post we introduced the various messaging semantics of Apache Kafka®, including idempotent producers, transactions, and exactly‑once processing in Kafka Streams. This article continues that discussion by focusing on Kafka transactions, aiming to familiarize readers with the core concepts required to use the Kafka transaction API effectively.

Why Transactions?

Transactions in Kafka are designed for applications that follow a read‑process‑write pattern where both reads and writes come from asynchronous data streams such as Kafka topics, commonly known as stream‑processing applications.

Early generations of stream-processing applications could tolerate inaccurate results, but many modern use cases—especially in finance—demand strict exactly-once guarantees because duplicated or lost messages translate directly into errors that cannot be tolerated.

Formally, exactly-once processing means that an input message a is marked as consumed if and only if its output b = F(a) is successfully produced: a is never reprocessed after b has been written, and b is never written without a being consumed.

Using a regular Kafka producer/consumer with at‑least‑once semantics can break exactly‑once guarantees in three ways:

Internal retries may cause duplicate writes of b (handled by idempotent producers, not the focus here).

Re‑processing of input message a can lead to duplicate b writes if the application crashes after writing b but before marking a as consumed.

"Zombie" instances: when a failed instance is replaced, multiple instances may read the same input and write duplicate outputs.

Kafka’s transaction API addresses the second and third problems by making the read‑process‑write cycle atomic and isolating zombie instances.

Transactional Semantics

Transactions allow atomic writes to multiple topics and partitions. Either all messages in the transaction are successfully written, or none are. If a transaction aborts, none of its messages become visible to consumers.

An atomic read‑process‑write cycle means that a consumer offset X on partition tp0 and a produced message b on partition tp1 are committed together; otherwise, neither is committed.

Offsets are considered committed only when they are written to the internal __consumer_offsets topic. Thus, committing an offset is itself a write to a Kafka topic, and the transaction groups the offset commit with the output messages.

Kafka solves the "zombie" problem by assigning each transactional producer a unique transaction.id. The transaction coordinator tracks an epoch for each transaction.id; initTransactions() bumps the epoch, so any request arriving with an older epoch is rejected, fencing off the zombie instance.
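A minimal sketch of this fencing in practice is shown below. The broker address and the transaction.id "payments-processor-0" are illustrative assumptions; in a real deployment the id should be stable per logical instance (for example, derived from the assigned input partition):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "payments-processor-0"); // stable id per logical instance
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
try {
    // Bumps the epoch for this transaction.id; any older producer
    // instance still using the same id is fenced from now on.
    producer.initTransactions();
    // ... transactional work ...
} catch (ProducerFencedException e) {
    // A newer instance with the same transaction.id has taken over;
    // this instance is the zombie and must shut down.
    producer.close();
}
```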

Reading Transactional Messages

Consumers in read_committed mode only receive messages from committed transactions (along with non-transactional messages). They never see messages that belong to an open or aborted transaction.

Note that this guarantee does not provide atomic reads: a consumer cannot know whether a batch of messages was produced as part of a transaction, nor can it guarantee that all partitions of a topic are part of the same transaction.

In short, Kafka guarantees that consumers see either non‑transactional messages or messages from transactions that have been committed.
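Enabling this behavior is a one-line consumer setting. A minimal configuration sketch follows; the broker address and group id are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "txn-reader");
// Only deliver messages from committed transactions;
// the default is read_uncommitted.
props.put("isolation.level", "read_committed");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
```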

Transaction API in Java

The transaction feature is server‑side and protocol‑level, so any client library that supports it can use it. A typical Java read‑process‑write application using the Kafka transaction API looks like this:
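The sketch below reconstructs such an application. It assumes a broker at localhost:9092, topics named input-topic and output-topic, and a trivial stand-in for the processing step; it is illustrative, not a production-ready listing, and requires a running Kafka cluster:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;

// Producer: a transaction.id enables transactions and zombie fencing.
Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "localhost:9092");
producerProps.put("transactional.id", "my-app-0");
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
producer.initTransactions();

// Consumer: read only committed messages; offsets are committed
// through the transaction, not by the consumer itself.
Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("group.id", "my-app");
consumerProps.put("isolation.level", "read_committed");
consumerProps.put("enable.auto.commit", "false");
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(Collections.singletonList("input-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
    if (records.isEmpty()) continue;
    producer.beginTransaction();
    try {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            // Stand-in for the application's processing step F(a).
            String output = record.value().toUpperCase();
            producer.send(new ProducerRecord<>("output-topic", record.key(), output));
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
        }
        // Commit the consumed offsets as part of the same transaction.
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    } catch (ProducerFencedException e) {
        producer.close(); // a newer instance with our transaction.id took over
        break;
    } catch (KafkaException e) {
        producer.abortTransaction(); // neither outputs nor offsets become visible
    }
}
```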

The producer is configured with a transaction.id and calls initTransactions() before doing any transactional work. The consumer is configured to read only committed messages (read_committed mode), with auto-commit disabled. The core loop then begins a transaction, processes the input records, writes the output records, sends the consumed offsets to the transaction, and finally commits the transaction.

How Transactions Work

The transaction coordinator and a special internal topic (the transaction log) were introduced in Kafka 0.11.0. Every broker can act as a coordinator; the transaction log is partitioned, and a simple hash of the transaction.id selects a partition, whose leader broker serves as the coordinator for that transaction.
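Conceptually, the mapping from transaction.id to coordinator looks like the sketch below. The class name is ours, and the partition count of 50 matches the default size of the transaction log topic; the broker's actual implementation differs in detail:

```java
public class TxnCoordinatorPartition {
    // Hash the transaction.id onto a transaction-log partition; the leader
    // of that partition is the coordinator for this transaction.
    public static int partitionFor(String transactionalId, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(transactionalId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("payments-processor-0", 50));
    }
}
```

Because the hash is deterministic, the same transaction.id always lands on the same coordinator, which is what lets the coordinator track that id's epoch across producer restarts.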

The coordinator keeps transaction state in memory and persists it to the transaction log, which is replicated using Kafka’s robust replication protocol.

Only the latest state of a transaction is stored in the log; the actual payload remains in the regular topic partitions.

Data Flow

Four high‑level data‑flow stages are identified:

Producer ↔ Transaction Coordinator : initTransactions registers the producer and obtains its epoch; output partitions are registered with the coordinator on the first send to each, and commit/abort triggers a two‑phase commit.

Coordinator ↔ Transaction Log : the coordinator updates transaction state in memory and writes it to the log, which is replicated.

Producer → Target Partitions : after registration, the producer sends data to the actual partitions, with additional validation for transactional safety.

Coordinator ↔ Topic Partitions : during commit, the coordinator writes a transaction marker to each involved partition, which consumers in read_committed mode use to filter out aborted or open‑transaction messages.

Practical Transaction Handling

Choosing a stable transaction.id is crucial for fencing zombie instances: the same transaction.id must always map to the same input partitions across restarts, so that a restarted instance fences its predecessor rather than racing it.

Performance considerations:

Transactional overhead is modest and independent of the number of messages in the transaction; the key to high throughput is to batch many messages per transaction.

Longer transaction intervals increase end‑to‑end latency because consumers wait for the transaction to commit before delivering messages.

Transactional consumers are lightweight: in read_committed mode the consumer simply filters out messages from aborted transactions and withholds messages from still-open transactions until their outcome is known, so read_committed mode does not reduce consumer throughput.

Further Reading

For deeper details see the original Kafka KIP, the design document, and the KafkaProducer Javadocs.

Conclusion

We have covered the design goals of Kafka’s transaction API, its semantics, and how it works in practice. While the API guarantees exactly‑once processing for the read‑process‑write cycle, additional side effects in the processing stage may still require extra handling. Kafka Streams builds on this API to provide end‑to‑end exactly‑once guarantees for a wide range of stream‑processing applications.

Future posts will explore exactly‑once semantics in Kafka Streams and practical implementation tips.

Tags: Big Data, stream processing, Transactions, Apache Kafka, exactly-once, Java API
Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.
