Root Causes and Solutions for Kafka Duplicate Consumption
This article analyzes the common causes of Kafka duplicate consumption, such as uncommitted offsets due to forced thread termination, auto‑commit settings, session timeouts, rebalancing, and slow processing, and provides practical solutions including disabling auto‑commit, adjusting consumer configurations, and using new consumer groups.
Duplicate consumption in Kafka happens when a consumer processes messages but the offset is not committed, causing the same records to be read again after a restart or rebalance.
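The mechanism can be reproduced without a broker. The sketch below is an in-memory model, not Kafka client code: the class DuplicateConsumptionDemo and its method are hypothetical names, the list stands in for one partition, and the integer committedOffset stands in for what the broker remembers for the group.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of one partition and one group's committed offset (illustrative only). */
class DuplicateConsumptionDemo {

    /**
     * Simulates two runs of a consumer over the same log. The first run processes
     * recordsBeforeCrash records and stops without committing; the second run is
     * the restarted consumer. Returns every record that was processed, in order.
     */
    public static List<String> processedAcrossRestart(List<String> log, int recordsBeforeCrash) {
        List<String> processed = new ArrayList<>();
        int committedOffset = 0; // nothing has been committed yet

        // First run: process a few records, then "crash" before any commit.
        int position = committedOffset;
        for (int i = 0; i < recordsBeforeCrash && position < log.size(); i++) {
            processed.add(log.get(position++));
        }

        // Second run: the restarted consumer resumes from the committed offset,
        // which never moved, so the same records are read again.
        position = committedOffset;
        while (position < log.size()) {
            processed.add(log.get(position++));
        }
        return processed;
    }

    public static void main(String[] args) {
        // prints [a, b, a, b, c]
        System.out.println(processedAcrossRestart(List.of("a", "b", "c"), 2));
    }
}
```

Records a and b appear twice because the position the consumer had reached was lost with the process, while the committed offset was still 0.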
The main reasons identified are:
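Forcefully killing the consumer thread, so the offsets of records that were already processed are never committed.
Using automatic offset commit and calling consumer.unsubscribe() before consumer.close(). After unsubscribe() the consumer no longer owns any partitions, so the final commit that close() would otherwise attempt may have nothing left to commit.
Session timeout expiration (session.timeout.ms; the default is 30 s in early clients, 10 s from 0.10.1, and 45 s from 3.0) during long processing, leading to a rebalance before offsets are saved.
Rebalancing: the new owner of a partition resumes from the last committed offset, so records that were processed but not yet committed are delivered again.
Slow processing that keeps the consumer away from poll() for too long. Clients before 0.10.1 send heartbeats only from poll(), so this trips the session timeout; later clients send heartbeats from a background thread and enforce max.poll.interval.ms instead.
High concurrency or load that delays processing so much that offsets cannot be committed before the session timeout expires.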
The following code demonstrates the problematic pattern that can leave offsets uncommitted:
    try {
        consumer.unsubscribe();
    } catch (Exception e) {
    }

    try {
        consumer.close();
    } catch (Exception e) {
    }
To avoid duplicate consumption, disable automatic offset commits and commit offsets manually once records have been processed.
Spring configuration example:
    spring.kafka.consumer.enable-auto-commit=false
    spring.kafka.consumer.auto-offset-reset=latest
API-level configuration example:
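    import java.util.Properties;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "false");
When enable.auto.commit is true, the consumer commits offsets from inside poll() once auto.commit.interval.ms (5 s by default) has elapsed, and what it commits are the offsets of records returned by earlier polls. As long as each batch is fully processed before the next poll(), no records are lost, but everything processed since the last commit is redelivered if the consumer dies or is removed from the group first.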
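With auto-commit disabled, the application decides when to commit. The ordering that matters is process first, commit second, and close in a finally block rather than killing the thread. The sketch below shows only that ordering and runs without a broker: RecordSource, InMemorySource, and ManualCommitLoop are hypothetical names introduced for illustration. With the real Java client, poll(Duration) and commitSync() on KafkaConsumer would play the roles of poll() and commit() here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class ManualCommitLoop {

    /** Hypothetical stand-in for the consumer operations this sketch relies on. */
    interface RecordSource {
        List<String> poll();  // next batch; empty when nothing is available
        void commit();        // mark every record returned so far as consumed
        void close();         // release the source
    }

    /** In-memory source that logs each call so the ordering can be inspected. */
    static class InMemorySource implements RecordSource {
        final List<String> calls = new ArrayList<>();
        private final Iterator<List<String>> batches;

        InMemorySource(List<List<String>> batches) { this.batches = batches.iterator(); }

        public List<String> poll() {
            calls.add("poll");
            return batches.hasNext() ? batches.next() : List.of();
        }
        public void commit() { calls.add("commit"); }
        public void close() { calls.add("close"); }
    }

    private final AtomicBoolean running = new AtomicBoolean(true);

    /** Requests a clean stop after the current batch instead of killing the thread. */
    public void shutdown() { running.set(false); }

    public void run(RecordSource source, Consumer<String> handler) {
        try {
            while (running.get()) {
                List<String> batch = source.poll();
                for (String record : batch) {
                    handler.accept(record);
                }
                if (!batch.isEmpty()) {
                    source.commit(); // commit only after the whole batch was processed
                }
            }
        } finally {
            source.close(); // always runs, even if the handler throws
        }
    }

    /** Runs the loop over two batches and returns the recorded call order. */
    public static List<String> demo() {
        InMemorySource source = new InMemorySource(List.of(List.of("a", "b"), List.of("c")));
        ManualCommitLoop loop = new ManualCommitLoop();
        loop.run(source, record -> {
            source.calls.add("process:" + record);
            if (record.equals("c")) {
                loop.shutdown(); // stop once the last record has been seen
            }
        });
        return source.calls;
    }

    public static void main(String[] args) {
        // prints [poll, process:a, process:b, commit, poll, process:c, commit, close]
        System.out.println(demo());
    }
}
```

Every commit in the recorded order follows the processing of its batch, and close comes last. If the handler throws partway through a batch, that batch is not committed and will be read again, which is the at-least-once behaviour manual commits are meant to give.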
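The consumer property max.poll.interval.ms defines the maximum time allowed between successive poll() calls (5 minutes by default). When it is exceeded, the consumer is removed from the group and the group rebalances; a commit attempted afterwards fails, so the batch is delivered again to the partition's new owner.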
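Whether a batch fits inside that interval is simple arithmetic: the number of records returned by one poll() (capped by max.poll.records, 500 by default) times the processing time per record. The following sketch works through it. PollBudget and its methods are hypothetical helpers, and the 800 ms per record and the 50% headroom are assumed figures chosen for illustration.

```java
import java.util.Properties;

class PollBudget {

    /** Worst-case time to work through one full batch, in milliseconds. */
    public static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return maxPollRecords * perRecordMillis;
    }

    /**
     * Largest max.poll.records value that fits in the poll interval while
     * leaving the given fraction of the interval (0.0 to 1.0) as headroom.
     */
    public static int maxRecordsThatFit(long maxPollIntervalMs, long perRecordMillis, double headroom) {
        long budget = (long) (maxPollIntervalMs * (1.0 - headroom));
        return (int) Math.max(1, budget / perRecordMillis);
    }

    /** Consumer properties sized for the measured per-record processing time. */
    public static Properties tunedProps(long maxPollIntervalMs, long perRecordMillis) {
        Properties props = new Properties();
        props.put("enable.auto.commit", "false");
        props.put("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        props.put("max.poll.records",
                Integer.toString(maxRecordsThatFit(maxPollIntervalMs, perRecordMillis, 0.5)));
        return props;
    }

    public static void main(String[] args) {
        // 500 records at 800 ms each need 400000 ms, more than the 300000 ms default.
        System.out.println(worstCaseBatchMillis(500, 800));
        // Half of 300000 ms divided by 800 ms per record allows 187 records per poll.
        System.out.println(tunedProps(300_000, 800).getProperty("max.poll.records"));
    }
}
```

Under these assumptions the defaults would overrun the interval (400 s of work against a 300 s limit), while 187 records per poll need about 150 s. Raising max.poll.interval.ms is the other lever, at the cost of slower detection of a consumer that is genuinely stuck.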
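To recover a service that is stuck re-consuming the same data, switch it to a new consumer group (e.g., order_consumer_group) and set auto-offset-reset=latest. The new group has no committed offsets, so it starts from the newest records and the service can restart without modifying Kafka or ZooKeeper. Records still waiting in the backlog are skipped.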
Note: To consume a topic from the beginning, create a fresh group ID and set auto-offset-reset=earliest.
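At the API level, both variants differ only in the auto.offset.reset value, which the consumer consults only when the group has no committed offset for a partition. A minimal sketch, in which FreshGroupConfig and forNewGroup are hypothetical names and the broker address is the placeholder used earlier in the article:

```java
import java.util.Properties;

class FreshGroupConfig {

    /**
     * Builds consumer properties for a group ID that has never committed offsets.
     * replayFromBeginning = true reads the topic from the start; false skips
     * the existing backlog and reads only new records.
     */
    public static Properties forNewGroup(String groupId, boolean replayFromBeginning) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "false");
        // Applies only while the group has no committed offset for a partition.
        props.put("auto.offset.reset", replayFromBeginning ? "earliest" : "latest");
        return props;
    }

    public static void main(String[] args) {
        // prints latest
        System.out.println(forNewGroup("order_consumer_group", false).getProperty("auto.offset.reset"));
    }
}
```

Once the new group commits its first offsets, auto.offset.reset no longer has any effect for those partitions, so reusing the same group ID later will not replay or skip data again.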
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
