Kafka Producer Idempotency: PID, Sequence Numbers, and Broker Deduplication
Kafka ensures that when a producer resends a message because of a network glitch or broker failure, only one record is persisted in the target partition. It does this with a unique Producer ID (PID), a monotonically increasing sequence number per partition, and broker-side tracking of the latest committed sequence for each PID‑partition pair.
Introduction
Hello, I am mikechen. Kafka is an essential middleware for large‑scale architectures, and this article explains Kafka producer idempotency.
What Is Producer Idempotency?
Idempotency means that an operation yields the same result no matter how many times it is executed. For critical business systems such as payment, order, or inventory, duplicate writes are unacceptable; idempotency guarantees that a message is written only once.
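The difference can be shown with a small sketch that has nothing Kafka-specific in it. The function names and the payment ID are made up for illustration: one operation appends on every call, the other is keyed so that repeated calls leave the state unchanged.

```python
# Non-idempotent: every retry appends another copy.
ledger = []

def append_payment(payment_id):
    ledger.append(payment_id)

# Idempotent: retries with the same key leave the state unchanged.
applied = {}

def apply_payment_once(payment_id, amount):
    applied.setdefault(payment_id, amount)

for _ in range(3):  # simulate one send plus two retries
    append_payment("pay-1")
    apply_payment_once("pay-1", 100)

print(len(ledger))   # 3
print(len(applied))  # 1
```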
Kafka Producer Idempotency
Kafka producer idempotency ensures that, even if a message is retried due to network jitter or temporary broker failures, the broker persists only a single copy of that message in the target partition. In other words, within one producer session, sending the same message multiple times has the same effect as sending it once.
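As a rough illustration of how the feature is switched on, the settings below use key names from Kafka's producer configuration, written as a plain Python dict. The broker address is a placeholder and `check_idempotence_constraints` is a hypothetical helper, not part of any Kafka client; it only encodes the documented constraints that idempotence needs `acks=all` and at most 5 in-flight requests per connection.

```python
# Producer settings as a plain dict; no Kafka client is created here.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address (assumption)
    "enable.idempotence": True,             # turn on PID + sequence-number deduplication
    "acks": "all",                          # required when idempotence is enabled
    "max.in.flight.requests.per.connection": 5,  # must not exceed 5 with idempotence
}

def check_idempotence_constraints(config):
    """Hypothetical helper: return a list of problems with an idempotent config."""
    problems = []
    if not config.get("enable.idempotence"):
        problems.append("enable.idempotence is not set")
    if str(config.get("acks")) != "all":
        problems.append("acks must be 'all'")
    if config.get("max.in.flight.requests.per.connection", 5) > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems

print(check_idempotence_constraints(producer_config))  # []
```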
Key Components
Producer ID (PID): Each producer instance obtains a unique identifier from the broker when it starts. The PID stays constant for the lifetime of that producer session. A restarted producer receives a new PID (unless it uses transactions with a fixed transactional ID), so deduplication is guaranteed only within a single session.
Sequence Number: For each partition, the producer maintains a monotonically increasing counter. Together with the PID and partition, the sequence number uniquely identifies a message. The counter for each partition starts at 0 and increments by 1 for each subsequent message, preserving the order of messages from the same producer.
Broker‑Side Storage: For every <PID, Partition> pair, the broker keeps the latest successfully committed sequence number in memory. Because sequence numbers are also written to the partition log, the broker can rebuild this state after a restart.
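The producer-side half of these components can be sketched as follows. `SketchProducer` and `Record` are hypothetical names for illustration and are not part of any Kafka client; in a real producer the PID comes from the broker, whereas here it is passed in by hand.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    pid: int
    partition: int
    sequence: int
    payload: str

class SketchProducer:
    """Hypothetical producer-side bookkeeping; not a Kafka client."""

    def __init__(self, pid):
        self.pid = pid
        # partition -> next sequence number, starting at 0 for each partition
        self.next_sequence = defaultdict(int)

    def stamp(self, partition, payload):
        seq = self.next_sequence[partition]
        self.next_sequence[partition] += 1
        return Record(self.pid, partition, seq, payload)

producer = SketchProducer(pid=7)
a = producer.stamp(0, "order-1")
b = producer.stamp(0, "order-2")
c = producer.stamp(1, "order-3")
print(a.sequence, b.sequence, c.sequence)  # 0 1 0
```

A retry resends the same `Record` unchanged, so the broker sees the identical (PID, partition, sequence) triple again.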
Implementation Principle
The idempotency mechanism relies on the coordinated work of the three components above. The overall flow is:
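The broker assigns a unique PID to each producer instance that enables idempotency.
The producer attaches a monotonically increasing, per-partition sequence number to each message to preserve order.
The broker maintains, for each <PID, Partition>, the most recent committed sequence number.
When a message arrives, the broker compares its PID and sequence number with the stored value; if the sequence number is not greater than the stored one, the message is a duplicate and the broker discards it without writing it again.
The broker also checks sequence continuity: a message whose sequence number is not exactly one greater than the stored value is rejected (the producer sees an OutOfOrderSequenceException), which prevents out‑of‑order writes and exposes possible message loss.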
Through this mechanism, even if a producer retries sending the same message because of network issues or other failures, the broker can recognize the duplicate and ensure that each message is written only once to the target partition.
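The broker-side check described above can be sketched as a small simulation. `SketchBroker` is a hypothetical class and a deliberate simplification: it tracks one sequence number per message, whereas a real broker works on record batches and keeps metadata for several recent batches.

```python
class SketchBroker:
    """Hypothetical broker-side check; not Kafka's actual implementation."""

    def __init__(self):
        self.last_sequence = {}  # (pid, partition) -> last appended sequence
        self.log = {}            # partition -> list of appended payloads

    def append(self, pid, partition, sequence, payload):
        key = (pid, partition)
        last = self.last_sequence.get(key, -1)
        if sequence <= last:
            return "duplicate"      # already appended: do not write again
        if sequence != last + 1:
            return "out_of_order"   # gap: an earlier message is missing
        self.log.setdefault(partition, []).append(payload)
        self.last_sequence[key] = sequence
        return "appended"

broker = SketchBroker()
results = [
    broker.append(7, 0, 0, "order-1"),
    broker.append(7, 0, 0, "order-1"),  # retry of the same message
    broker.append(7, 0, 2, "order-3"),  # sequence 1 has not arrived yet
    broker.append(7, 0, 1, "order-2"),
]
print(results)        # ['appended', 'duplicate', 'out_of_order', 'appended']
print(broker.log[0])  # ['order-1', 'order-2']
```

The retried message is recognized and skipped, and the message that arrives ahead of its predecessor is rejected instead of being written out of order.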
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!