How to Eliminate Duplicate and Missed Messages in Kafka Consumers
This article explains Kafka's push and pull consumption models and the impact of enable.auto.commit and auto.commit.interval.ms on offset handling, then presents practical configurations and code patterns, plus MySQL- and Redis-based deduplication techniques, to prevent both duplicate and missed message processing.
Understanding Kafka Consumption Modes
A message queue can be seen as middleware built on two RPC calls (send and consume) plus a persistence step. Kafka delivers messages from the broker to the consumer with a pull model: the consumer actively polls the broker for new records, in contrast to a push model in which the broker pushes messages to consumers as they arrive.
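The pull model can be illustrated with a minimal poll loop. This is only a sketch: the broker address, group id, and topic name below are illustrative assumptions, not values from the article.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullLoopSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a local broker
        props.put("group.id", "demo-group");              // illustrative consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // illustrative topic
            while (true) {
                // Pull model: each poll() actively fetches the next batch from the broker.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```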
Root Causes of Duplicate and Missed Consumption
Two key consumer parameters control offset commits: enable.auto.commit (default true) determines whether the consumer automatically commits offsets, and auto.commit.interval.ms (default 5000 ms) sets the commit frequency when auto‑commit is enabled.
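For reference, the two settings can be spelled out explicitly on the consumer's Properties. The sketch below uses the ConsumerConfig constants, which resolve to exactly those property names; the values shown are the defaults.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class OffsetCommitDefaults {
    // Makes the two commit-related settings explicit; the values shown are the defaults.
    static Properties withAutoCommitDefaults(Properties props) {
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");      // enable.auto.commit, default true
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000"); // auto.commit.interval.ms, default 5000 ms
        return props;
    }
}
```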
With enable.auto.commit=true, offsets are committed automatically in the background, once every auto.commit.interval.ms. If a consumer crashes after processing the business logic but before the next automatic commit, it restarts from the last committed offset and reprocesses the same messages, causing duplicate consumption. Conversely, if the offset is committed before the business logic has run and the consumer then crashes, the message is considered consumed and will never be reprocessed, resulting in missed consumption.
Even when enable.auto.commit=false (manual commit), the same problems appear if the code commits offsets at the wrong point in the processing flow.
General Mitigation Strategy
The recommended approach is to disable automatic commits, process the business logic first, and only then manually commit the offset. Committing only after successful processing prevents missed consumption; duplicates can still occur if the consumer crashes between processing and the commit, which is why the processing itself must also be made idempotent, as discussed in the next section.
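A minimal sketch of this ordering, again assuming an illustrative broker address, group id, and topic:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ProcessThenCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a local broker
        props.put("group.id", "demo-group");              // illustrative consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");         // step 1: turn off automatic commits

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    handleBusinessLogic(record);          // step 2: run the business logic first
                }
                if (!records.isEmpty()) {
                    consumer.commitSync();                // step 3: commit only after processing succeeded
                }
                // Anti-pattern: committing before the processing loop would mark the batch as
                // consumed even if processing then fails, reintroducing missed consumption.
            }
        }
    }

    static void handleBusinessLogic(ConsumerRecord<String, String> record) {
        // Placeholder for the real work, e.g. persisting an order.
        System.out.printf("processing offset=%d value=%s%n", record.offset(), record.value());
    }
}
```

commitSync() blocks until the broker acknowledges the commit, which keeps the ordering guarantee simple; commitAsync() trades that safety for lower latency.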
Preventing Duplicate Consumption – Idempotency
Duplicate consumption is essentially an idempotency issue. Two common solutions are:
MySQL Unique Index : Insert a record keyed by a globally unique order_id. If the order_id already exists, the insert fails, preventing duplicate processing. This method is simple and low-cost, but it adds random I/O in InnoDB because unique-index checks cannot use the Change Buffer (see the MySQL sketch after this list).
Redis Key with Expiration : Store a unique identifier as a Redis key with a short TTL. Attempts to set the same key again are rejected, providing fast deduplication. It offloads work from the database but requires a fallback plan if Redis becomes unavailable (see the Redis sketch after this list).
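A minimal MySQL-based sketch, assuming a hypothetical processed_orders table with a unique key on order_id; the connection details and table name are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;

public class MySqlDedup {
    // Assumes a table such as:
    //   CREATE TABLE processed_orders (
    //     order_id VARCHAR(64) NOT NULL,
    //     UNIQUE KEY uk_order_id (order_id)
    //   );
    static boolean tryMarkProcessed(String orderId) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/shop", "app", "secret"); // illustrative credentials
             PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO processed_orders (order_id) VALUES (?)")) {
            ps.setString(1, orderId);
            ps.executeUpdate();
            return true;  // first time this order_id is seen: safe to process
        } catch (SQLIntegrityConstraintViolationException dup) {
            return false; // duplicate delivery: skip the business logic
        }
    }
}
```

The same idea sketched with Redis using the Jedis client; the key prefix and one-hour TTL are illustrative choices. SET ... NX EX sets the key only if it does not already exist, so a null reply signals a duplicate:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisDedup {
    static boolean tryMarkProcessed(Jedis jedis, String orderId) {
        String reply = jedis.set("dedup:order:" + orderId, "1",
                SetParams.setParams().nx().ex(3600)); // expire after 1 hour
        return "OK".equals(reply);                    // null means the key already existed
    }
}
```

In both sketches the consumer calls tryMarkProcessed(orderId) before running the business logic and skips the message when it returns false.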
Choosing the Right Approach
MySQL unique indexes are reliable but can increase disk I/O; Redis offers high performance but introduces an additional dependency whose failure modes must be handled. The optimal choice depends on the specific workload, latency requirements, and tolerance for infrastructure failures.
By configuring Kafka offsets correctly and applying an appropriate deduplication mechanism, developers can achieve reliable, exactly‑once‑style processing for their Kafka consumers.
Senior Tony
Former senior tech manager at Meituan, ex‑tech director at New Oriental, with experience at JD.com and Qunar; specializes in Java interview coaching and regularly shares hardcore technical content. Runs a video channel of the same name.
