Why Kafka Messages Duplicate and How to Prevent It

The article explains the main causes of duplicate Kafka messages—including producer retries, consumer offset handling, partition leader changes, and lack of idempotence—and provides practical configuration and design solutions to achieve exactly‑once delivery.


Kafka is a core component in large‑scale architectures, and duplicate messages can arise from several underlying mechanisms.

Producer retry mechanism

When a producer fails to send a message due to network glitches or timeouts, it retries the send. If the broker has already accepted the message but the acknowledgment is delayed, the retry results in duplicate records.

Solution: Enable producer idempotence (enable.idempotence=true) so that the broker uses a producer ID and per-partition sequence numbers to discard duplicated retries, giving exactly-once semantics per partition within a producer session. Also tune retries and delivery.timeout.ms appropriately.
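As a minimal sketch, a Java producer with idempotence enabled might be configured like this (the bootstrap address, topic name, and key are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Idempotence: the broker deduplicates retried sends using the
        // producer ID and per-partition sequence numbers.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // Idempotence requires acks=all and retries > 0; setting them
        // explicitly documents the intent.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
        // Bound the total retry budget with one setting instead of tuning
        // each individual timeout.
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "payload")); // hypothetical topic/key
            producer.flush();
        }
    }
}
```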

Consumer offset handling

If a consumer crashes after processing a message but before committing the offset (or after committing but before processing completes), the same message may be re‑consumed, leading to duplication or loss depending on the commit strategy.

Solution: Choose a commit strategy deliberately: "process-then-commit" gives at-least-once delivery (possible duplicates), while "commit-then-process" gives at-most-once (possible loss), so pick the one that matches business semantics. Disable automatic commits (enable.auto.commit=false) and manually commit offsets only after successful processing. Use transactional consumers or external persistence (e.g., write results and offsets in the same database transaction) to achieve atomicity.
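A minimal process-then-commit sketch with manual offset commits, assuming a hypothetical "orders" topic and consumer group:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProcessThenCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");         // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Disable auto-commit so offsets advance only after processing succeeds.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // a crash here means the batch is re-delivered, not lost
                }
                // Commit only after the whole batch succeeded: at-least-once delivery.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processed %s@%d: %s%n", record.topic(), record.offset(), record.value());
    }
}
```

Because the commit happens after processing, a crash between the two can replay messages, which is exactly why the deduplication measures below matter.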

Partition and replica mechanisms

Leader elections, ISR (in‑sync replica) desynchronization, or network partitions can cause writes that were thought to be committed to be replayed, producing duplicates or data loss.

Solution: Configure an appropriate replication factor and min.insync.replicas to ensure write durability. Use acks=all so that all in‑sync replicas acknowledge the write. Monitor ISR health and optimize broker and network settings to minimize frequent leader changes. Consider transactions for cross‑partition or cross‑topic consistency.
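One way to apply these settings programmatically is through the AdminClient; a sketch, with the topic name, partition count, and address as assumptions:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3: tolerates a single broker failure.
            NewTopic topic = new NewTopic("orders", 3, (short) 3) // hypothetical topic
                    // With min.insync.replicas=2 and producer acks=all, a write is
                    // acknowledged only once at least two replicas have persisted it.
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

With this topic configuration, an acks=all producer fails fast (NotEnoughReplicasException) instead of accepting under-replicated writes when fewer than two replicas are in sync.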

Insufficient idempotence and deduplication design

Without end‑to‑end idempotence or a globally unique message identifier, retries or replay scenarios cannot be distinguished, forcing downstream services to implement complex deduplication logic.

Solution: Attach a globally unique ID to each message (e.g., UUID, business key plus timestamp, or sequence number). Implement idempotent checks on the consumer side using external storage such as Redis, a database, or a Bloom filter. For high‑throughput workloads, design a deduplication window or event‑sourcing‑based compensation mechanism, and aim to make the business operation itself idempotent (idempotent API, idempotent write semantics).
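A minimal in-process sketch of such a deduplication window, keyed on a unique message ID. The class and window size are illustrative; a real deployment would back the seen-ID set with Redis or a database so it survives restarts:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Bounded dedup window: remembers the last N message IDs and drops repeats.
 * The in-memory store here only illustrates the idea; durable external
 * storage is needed for real at-least-once pipelines.
 */
public class DeduplicatingHandler {
    private final Set<String> seenIds;

    public DeduplicatingHandler(int windowSize) {
        // LinkedHashMap with removeEldestEntry gives a simple fixed-size window.
        this.seenIds = Collections.newSetFromMap(
            new LinkedHashMap<String, Boolean>(windowSize, 0.75f, false) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > windowSize;
                }
            });
    }

    /** Returns true if the message was processed, false if it was a duplicate. */
    public boolean handle(String messageId, String payload) {
        // Set.add returns false when the ID is already in the window.
        if (!seenIds.add(messageId)) {
            return false; // duplicate delivery: skip the side effect
        }
        process(payload);
        return true;
    }

    private void process(String payload) {
        System.out.println("processed: " + payload);
    }
}
```

A consumer loop would call handle(record.key(), record.value()), using the globally unique ID carried in the message as the key.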

These practices together help achieve stronger end‑to‑end duplicate control and move toward exactly‑once processing in Kafka‑based systems.

Tags: replication, deduplication, consumer offset, message duplication, producer idempotence
Written by

Mike Chen's Internet Architecture

Over ten years of BAT architecture experience, shared generously!
