Understanding Apache Pulsar Transactions: Core Concepts and Workflow
This article explains the transaction support added in Apache Pulsar 2.8.0: its core concepts (Transaction Coordinator, Transaction Buffer, Transaction Log, Transaction ID, and Pending Acknowledge State), the workflow that provides exactly-once semantics for stream processing, and how the design compares with Kafka's.
Introduction
Apache Pulsar version 2.8.0 adds native transaction capabilities. Unlike RocketMQ’s two‑phase commit, Pulsar’s design resembles Kafka’s approach and targets exactly‑once semantics for stream‑processing use cases such as Pulsar Functions, aligning with Pulsar’s event‑streaming goals.
Core Concepts
1. Transaction Coordinator (TC)
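The TC is a module running inside a Pulsar broker that coordinates all transaction-related requests (begin, commit, abort, and so on) and manages each transaction's lifecycle. Each TC corresponds to one partition of the partitioned system topic transaction_coordinator_assign, and a client finds the broker that owns a given TC through Pulsar's normal topic lookup on that partition. Because the TCs are spread across partitions, and therefore across brokers, transaction processing can scale out.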
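The following minimal Python sketch shows how TC ids line up with partitions of the assign topic. It assumes the topic's full name is `persistent://pulsar/system/transaction_coordinator_assign` and that partitions use Pulsar's usual `-partition-N` suffix; `TcSelector`, `tc_partition_topic`, and the round-robin choice of coordinator are hypothetical helpers for illustration, not the Pulsar client API.

```python
from itertools import cycle

# Assumption: full name of the system topic. Treat the exact string as
# illustrative; only the "-partition-N" naming pattern matters here.
TC_ASSIGN_TOPIC = "persistent://pulsar/system/transaction_coordinator_assign"


def tc_partition_topic(tc_id: int) -> str:
    """Partition of the assign topic that corresponds to TC `tc_id`."""
    return f"{TC_ASSIGN_TOPIC}-partition-{tc_id}"


class TcSelector:
    """Hypothetical client-side helper: picks a TC for each new transaction."""

    def __init__(self, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("need at least one transaction coordinator partition")
        # Round-robin over TC ids is an assumption made for this sketch.
        self._tc_ids = cycle(range(num_partitions))

    def next_tc(self) -> tuple:
        tc_id = next(self._tc_ids)
        # A real client would now run a topic lookup on this partition to find
        # the owning broker and send the new-transaction request there.
        return tc_id, tc_partition_topic(tc_id)
```

With four partitions, successive calls to `next_tc()` yield TC ids 0, 1, 2, 3, 0, 1, and so on, so new transactions spread across all coordinators and therefore across the brokers that own them.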
2. Transaction Buffer
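Initially, Pulsar planned to store transactional messages in a separate Transaction Buffer (TB) and make them visible to consumers only after commit, which would have meant writing each message twice: once to the buffer and again to the real topic on commit. In the 2.8.0 implementation, messages are written directly to the real topic, and the broker uses the Transaction Buffer's record of transaction state to filter them, so consumers never receive messages from uncommitted or aborted transactions. This removes the double-write overhead without allowing premature consumption.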
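A toy model helps show how a single write can still keep uncommitted data hidden. In this Python sketch, transactional and ordinary messages are appended to the same topic log, and a read step decides what may be dispatched. The hold-back rule (stop at the first entry of a still-open transaction, skip entries of aborted transactions) and all class and method names are assumptions chosen for illustration, not Pulsar's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    payload: str
    txn_id: Optional[int] = None  # None marks a non-transactional message


@dataclass
class TopicWithTxnBuffer:
    """Toy topic: every message is appended once, directly to the topic log."""

    log: list[Entry] = field(default_factory=list)
    open_txns: set[int] = field(default_factory=set)     # transactions still running
    aborted_txns: set[int] = field(default_factory=set)  # transactions rolled back

    def append(self, payload: str, txn_id: Optional[int] = None) -> None:
        # Assumes callers never append under a transaction that has already ended.
        if txn_id is not None:
            self.open_txns.add(txn_id)
        self.log.append(Entry(payload, txn_id))

    def end_txn(self, txn_id: int, committed: bool) -> None:
        self.open_txns.discard(txn_id)
        if not committed:
            self.aborted_txns.add(txn_id)

    def max_read_position(self) -> int:
        """Index of the first entry that belongs to a still-open transaction."""
        for index, entry in enumerate(self.log):
            if entry.txn_id in self.open_txns:
                return index
        return len(self.log)

    def readable(self) -> list[str]:
        """Payloads that may be dispatched: nothing past an open txn, nothing aborted."""
        return [
            entry.payload
            for entry in self.log[: self.max_read_position()]
            if entry.txn_id not in self.aborted_txns
        ]
```

In this model, appending `a`, then `b` under open transaction 1, then `c` makes only `a` readable; committing transaction 1 releases `b` and `c` together, and an aborted transaction's messages never appear. Holding back later non-transactional messages preserves ordering at the cost of some latency while a transaction stays open.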
3. Transaction Log
The Transaction Log is a managed ledger that persists transaction metadata (state transitions), not message payloads. It supports two operations: append (record a new state) and replay (rebuild state by re-reading the log from a given position, for example after the TC restarts). The recorded states are OPEN, COMMITTING, ABORTING, COMMITTED, ABORTED, and ERROR.
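The sketch below models the log's two operations in Python. The record format and the transition table are assumptions for illustration (the states come from the text, the legal transitions between them do not), and a plain list stands in for the managed ledger.

```python
from enum import Enum


class TxnStatus(Enum):
    OPEN = "OPEN"
    COMMITTING = "COMMITTING"
    ABORTING = "ABORTING"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"
    ERROR = "ERROR"


# Assumed transition table for this sketch, not copied from Pulsar's source.
ALLOWED = {
    TxnStatus.OPEN: {TxnStatus.COMMITTING, TxnStatus.ABORTING, TxnStatus.ERROR},
    TxnStatus.COMMITTING: {TxnStatus.COMMITTED, TxnStatus.ERROR},
    TxnStatus.ABORTING: {TxnStatus.ABORTED, TxnStatus.ERROR},
    TxnStatus.COMMITTED: set(),
    TxnStatus.ABORTED: set(),
    TxnStatus.ERROR: set(),
}


class TransactionLog:
    """Append-only (txn_id, status) records standing in for a managed ledger."""

    def __init__(self):
        self.entries: list[tuple[int, TxnStatus]] = []

    def append(self, txn_id: int, status: TxnStatus) -> int:
        current = self.replay().get(txn_id)
        if current is None and status is not TxnStatus.OPEN:
            raise ValueError(f"txn {txn_id} must start in OPEN")
        if current is not None and status not in ALLOWED[current]:
            raise ValueError(f"illegal transition {current.name} -> {status.name}")
        self.entries.append((txn_id, status))
        return len(self.entries) - 1  # position of the new record

    def replay(self, from_position: int = 0) -> dict[int, TxnStatus]:
        """Rebuild the latest status of each transaction by re-reading the log."""
        state: dict[int, TxnStatus] = {}
        for txn_id, status in self.entries[from_position:]:
            state[txn_id] = status
        return state
```

Replaying from position 0 rebuilds every transaction's latest state, which is what a restarted TC needs; replaying from a later position reflects only the records written from that point on.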
4. Transaction ID (TxnID)
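A 128-bit identifier composed of two 64-bit parts: mostSigBits (the ID of the TC, that is, its partition number) and leastSigBits (a sequence number that the TC assigns in its Transaction Log). The TC generates each TxnID; because every TC has a distinct ID and does not reuse sequence numbers, the pair is globally unique.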
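The uniqueness argument can be checked with a small Python model. `TxnID` mirrors the two 64-bit halves (named `most_sig_bits` and `least_sig_bits` here), and `TxnIdGenerator` is a hypothetical per-TC counter; packing both halves into one integer only demonstrates that the pair fits in 128 bits.

```python
from typing import NamedTuple

MASK64 = (1 << 64) - 1


class TxnID(NamedTuple):
    most_sig_bits: int   # ID of the TC (partition) that created the transaction
    least_sig_bits: int  # sequence number allocated by that TC

    def to_int(self) -> int:
        """Pack both 64-bit halves into a single 128-bit integer."""
        return ((self.most_sig_bits & MASK64) << 64) | (self.least_sig_bits & MASK64)

    @classmethod
    def from_int(cls, value: int) -> "TxnID":
        return cls((value >> 64) & MASK64, value & MASK64)


class TxnIdGenerator:
    """Hypothetical allocator: one monotonically increasing counter per TC."""

    def __init__(self, tc_id: int):
        self.tc_id = tc_id
        self._next_seq = 0

    def new_txn_id(self) -> TxnID:
        txn_id = TxnID(self.tc_id, self._next_seq)
        self._next_seq += 1
        return txn_id
```

Two coordinators can hand out the same sequence number, for example `TxnID(0, 0)` and `TxnID(1, 0)`, yet the IDs still differ because their high halves differ.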
5. Pending Acknowledge State
When a consumer acknowledges messages inside a transaction, those acknowledgments are not applied immediately; Pulsar tracks them as pending until the transaction ends. If the transaction is aborted, for example because it times out, all of its pending acks are discarded and the messages can be delivered again. While the transaction is active, other transactions cannot acknowledge the same messages. The pending-ack state is persisted via the Cursor Log, enabling recovery after broker failures.
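Here is a simplified Python model of the pending-ack rules for one subscription: acks are held per transaction, a second transaction cannot claim a message that is already pending, commit makes the acks final, and abort (the path a timeout would take) releases the messages. `PendingAckState` and its methods are hypothetical names, and persistence to the Cursor Log is not modeled.

```python
class PendingAckState:
    """Toy per-subscription pending-ack tracker (illustrative, not Pulsar's code)."""

    def __init__(self):
        self.pending: dict[int, int] = {}  # message id -> txn id holding the ack
        self.acked: set[int] = set()       # acknowledgments that are final

    def ack(self, txn_id: int, msg_id: int) -> None:
        if msg_id in self.acked:
            raise ValueError(f"message {msg_id} is already acknowledged")
        holder = self.pending.get(msg_id)
        if holder is not None and holder != txn_id:
            # Another open transaction already holds this message.
            raise ValueError(f"message {msg_id} is held by txn {holder}")
        self.pending[msg_id] = txn_id

    def _release(self, txn_id: int) -> list[int]:
        mine = [m for m, t in self.pending.items() if t == txn_id]
        for m in mine:
            del self.pending[m]
        return mine

    def commit(self, txn_id: int) -> None:
        """Make every pending ack of this transaction final."""
        self.acked.update(self._release(txn_id))

    def abort(self, txn_id: int) -> list[int]:
        """Drop this transaction's pending acks; returns the released message ids."""
        return sorted(self._release(txn_id))
```

After `abort`, a released message is free again, so another transaction can acknowledge it.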
Transaction Workflow
1. Locate TC
The client looks up a partition of the transaction_coordinator_assign topic to find the broker that owns the corresponding TC, then asks that TC to start a new transaction. The TC generates a unique TxnID.
2. Open Transaction
The TC records an OPEN entry in the Transaction Log and returns the TxnID to the client.
3. Send Transactional Messages
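Before producing to a topic partition for the first time in a transaction, the client asks the TC to add that partition to the transaction, so the TC knows which partitions to notify at commit or abort time. The client then sends messages (optionally batched) that carry the TxnID. The broker appends them directly to the topic partition, and that partition's Transaction Buffer tracks them so they stay invisible to consumers until the transaction commits.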
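For the TC to know which partitions to commit or abort at the end of a transaction, the client registers each partition with the TC before producing to it for the first time. The Python sketch below models that bookkeeping; `Coordinator`, `Broker`, `ClientTxn`, and `add_partition_to_txn` are hypothetical stand-ins, and caching registered partitions on the client is a design assumption that avoids one TC round trip per message.

```python
class Coordinator:
    """Hypothetical TC: remembers which partitions each transaction wrote to."""

    def __init__(self):
        self.partitions_by_txn: dict[int, set[str]] = {}
        self.registration_calls = 0

    def add_partition_to_txn(self, txn_id: int, partition: str) -> None:
        self.registration_calls += 1
        self.partitions_by_txn.setdefault(txn_id, set()).add(partition)


class Broker:
    """Hypothetical broker: appends (txn_id, payload) records to partition logs."""

    def __init__(self):
        self.logs: dict[str, list[tuple[int, str]]] = {}

    def append(self, partition: str, txn_id: int, payload: str) -> None:
        self.logs.setdefault(partition, []).append((txn_id, payload))


class ClientTxn:
    """Client side of one transaction; registers each partition with the TC once."""

    def __init__(self, txn_id: int, tc: Coordinator, broker: Broker):
        self.txn_id = txn_id
        self.tc = tc
        self.broker = broker
        self._registered: set[str] = set()

    def send(self, partition: str, payload: str) -> None:
        if partition not in self._registered:
            # Register before the first write so the TC can include this
            # partition when it commits or aborts the transaction.
            self.tc.add_partition_to_txn(self.txn_id, partition)
            self._registered.add(partition)
        self.broker.append(partition, self.txn_id, payload)
```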
4. Acknowledge Messages
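The client registers the subscription with the TC and then sends acknowledgment requests containing the TxnID. The broker checks each request against the subscription's pending-ack state: it rejects the ack if another open transaction already holds a pending ack on the same message, and otherwise records it as pending until the transaction ends.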
5. Complete Transaction
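When the client has finished its operations, it sends a commit (or abort) request to the TC. The TC first writes COMMITTING or ABORTING to the Transaction Log, then tells every topic partition and subscription involved in the transaction to commit or abort. Finally it persists COMMITTED or ABORTED, concluding the transaction.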
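The end of a transaction can be sketched as a small Python function. Participants are modeled as callbacks standing in for the topic partitions and subscriptions involved; the function, its parameters, and the list-based log are hypothetical, and retries after a failed notification are left out.

```python
from typing import Callable, Iterable


def end_transaction(
    txn_id: int,
    commit: bool,
    participants: Iterable[Callable[[int, bool], None]],
    log: list[tuple[int, str]],
) -> str:
    """Two-step end of a transaction as seen by the TC (simplified sketch).

    participants: callbacks standing in for the involved topic partitions
    (Transaction Buffers) and subscriptions (pending-ack states).
    """
    log.append((txn_id, "COMMITTING" if commit else "ABORTING"))
    # Writing the intermediate state first means that, if the TC crashes here,
    # replaying the log shows which way the transaction must be finished.
    for notify in participants:
        notify(txn_id, commit)
    final = "COMMITTED" if commit else "ABORTED"
    log.append((txn_id, final))
    return final
```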
Comparison with Kafka Transactions
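Both Pulsar and Kafka use a coordinator and a log to track transaction state, and both write transactional messages into the topic (or partition) log before the outcome is known. The main differences are where uncommitted data is filtered and how consumption progress joins a transaction. Kafka consumers running in read_committed mode drop aborted messages on the client, using aborted-transaction metadata returned by the broker, whereas Pulsar's Transaction Buffer stops uncommitted and aborted messages from being dispatched at all. Kafka also records a transaction's consumption progress as committed offsets, which are cumulative per partition, so progress cannot move past a message that is still being processed; Pulsar lets a transaction acknowledge individual messages, avoiding that head-of-line blocking.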
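The acknowledgment difference can be illustrated with two toy bookkeeping functions in Python. `committable_offset` models cumulative, offset-style progress and `still_pending` models individual acknowledgments; both are hypothetical helpers, not code from either system.

```python
def committable_offset(acked: set[int], start: int = 0) -> int:
    """Cumulative model: progress ends at the first gap in the acked prefix.

    Returns the next offset to consume; every offset before it is done.
    """
    offset = start
    while offset in acked:
        offset += 1
    return offset


def still_pending(acked: set[int], delivered: range) -> list[int]:
    """Individual-ack model: only the unacknowledged messages stay outstanding."""
    return [m for m in delivered if m not in acked]
```

With messages 0 to 5 delivered and only message 2 still in progress, the cumulative model can commit offset 2 at most (covering only messages 0 and 1), while the individual-ack model leaves just message 2 outstanding.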
Conclusion
Pulsar's transaction framework provides end-to-end exactly-once guarantees for stream processing by combining a coordinator, a transaction buffer, a transaction log, and pending-ack tracking. Compared with the originally planned buffer-based design, it writes each message only once, reducing write amplification, and it supports fine-grained, per-message acknowledgments, while following a workflow comparable to Kafka's.
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
