
Root Causes and Solutions for Kafka Duplicate Consumption

This article analyzes the common causes of Kafka duplicate consumption, such as uncommitted offsets due to forced thread termination, auto‑commit settings, session timeouts, rebalancing, and slow processing, and provides practical solutions including disabling auto‑commit, adjusting consumer configurations, and using new consumer groups.

Big Data Technology & Architecture

Jul 15, 2020


Duplicate consumption in Kafka happens when a consumer processes messages but the offset is not committed, causing the same records to be read again after a restart or rebalance.
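The mechanism can be illustrated with a toy model. The sketch below is not the Kafka client API: `OffsetReplayDemo` and `processFrom` are hypothetical names, the partition is a plain list, and the committed offset is a local variable. It only shows that a consumer which processes records without committing reads the same records again after a restart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of one partition and one consumer group (not the Kafka client API).
class OffsetReplayDemo {

    // Processes every record from startOffset to the end of the log and
    // returns the offset of the next record to read. The caller decides
    // whether that value ever gets committed.
    static long processFrom(List<String> log, long startOffset, List<String> processed) {
        long position = startOffset;
        while (position < log.size()) {
            processed.add(log.get((int) position));
            position++;
        }
        return position;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2");
        List<String> processed = new ArrayList<>();
        long committed = 0L;

        // Run 1: records are processed, but the consumer is killed before committing.
        processFrom(log, committed, processed);

        // Run 2: the restarted consumer resumes from the unchanged committed offset.
        committed = processFrom(log, committed, processed);

        System.out.println(processed); // [m0, m1, m2, m0, m1, m2]
        System.out.println(committed); // 3
    }
}
```

Every record appears twice because the first run never advanced the committed offset; once the second run commits, a third run would start at offset 3 and see nothing new.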

The main reasons identified are:

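Forcefully killing the consumer thread before it commits the offsets of records it has already processed.

Using automatic offset commit and calling consumer.unsubscribe() before consumer.close(): unsubscribing drops the partition assignment first, so the final commit that close() would otherwise attempt may not cover the last processed records.

Session timeout expiration (session.timeout.ms, 30 s by default in early clients; the default differs in later releases, so check your client version) during long processing, leading to a rebalance before offsets are committed.

Rebalancing: after partitions are reassigned, the new owner resumes from the last committed offset, so every record processed after that commit is delivered again.

Slow consumption: in clients that send heartbeats only from poll(), processing a batch for longer than the session timeout makes heartbeats fail. Newer clients send heartbeats from a background thread, so the limit that matters there is max.poll.interval.ms.

High concurrency that delays offset commits beyond the session timeout.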

The following code demonstrates the problematic pattern that can leave offsets uncommitted:

try {
    consumer.unsubscribe();
} catch (Exception e) {
}

try {
    consumer.close();
} catch (Exception e) {
}

To reduce duplicate consumption, disable automatic offset commits and commit offsets explicitly once the records have been processed.

Spring configuration example:

spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.auto-offset-reset=latest

API‑level configuration example:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false");
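With auto-commit disabled, the application has to decide which offset to commit. Kafka expects the committed value to be the offset of the next record to read, that is, the last processed offset plus one. The sketch below shows only that bookkeeping with standard-library types; `OffsetTracker` and its methods are hypothetical names, and partitions are reduced to plain integers. In a real consumer the equivalent map would be keyed by TopicPartition and passed to commitSync() after the batch has been processed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks, per partition, the offset that should be committed.
class OffsetTracker {

    private final Map<Integer, Long> nextOffsets = new HashMap<>();

    // Call only after the record at this offset has been fully processed.
    void markProcessed(int partition, long offset) {
        // The value to commit is the NEXT offset to read, hence offset + 1.
        // Long::max keeps the highest value if records are marked out of order.
        nextOffsets.merge(partition, offset + 1, Long::max);
    }

    // Snapshot of partition -> offset to commit.
    Map<Integer, Long> offsetsToCommit() {
        return new HashMap<>(nextOffsets);
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        tracker.markProcessed(0, 41L);
        tracker.markProcessed(1, 7L);
        System.out.println(tracker.offsetsToCommit().get(0)); // 42
        System.out.println(tracker.offsetsToCommit().get(1)); // 8
    }
}
```

Committing the last processed offset itself, instead of the offset after it, would cause that one record to be read again after every restart.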

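When enable.auto.commit is true, the consumer commits offsets from inside poll() once auto.commit.interval.ms (5 s by default) has elapsed, and the offsets it commits are those of the records returned by the previous poll(). Records are therefore not lost as long as each batch is fully processed before the next poll(), but anything processed since the last commit is delivered again if the consumer crashes or a rebalance happens first.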

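The consumer property max.poll.interval.ms (5 minutes by default) defines the maximum time allowed between successive poll() calls. If processing a batch takes longer, the consumer is considered failed and leaves the group, the group rebalances, and the offsets of the batch in progress may never be committed, so those records are delivered again.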
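A quick way to check whether a consumer is at risk is to compare the worst-case time for one batch with max.poll.interval.ms. The sketch below does that arithmetic; `PollBudget` and its methods are hypothetical helpers, the per-record time is an assumed measurement, and the 50% headroom is an arbitrary safety margin rather than a Kafka recommendation. The values 500 and 300000 are the client defaults for max.poll.records and max.poll.interval.ms.

```java
// Hypothetical helper for sizing max.poll.records against max.poll.interval.ms.
class PollBudget {

    // Worst-case time in milliseconds to process one full poll() batch.
    static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return Math.multiplyExact((long) maxPollRecords, perRecordMillis);
    }

    // True if a full batch can be processed before max.poll.interval.ms expires.
    static boolean fitsInterval(int maxPollRecords, long perRecordMillis, long maxPollIntervalMs) {
        return worstCaseBatchMillis(maxPollRecords, perRecordMillis) < maxPollIntervalMs;
    }

    // Largest max.poll.records that uses at most half of the interval (assumed headroom).
    static int suggestedMaxPollRecords(long perRecordMillis, long maxPollIntervalMs) {
        if (perRecordMillis <= 0) {
            throw new IllegalArgumentException("perRecordMillis must be positive");
        }
        long records = (maxPollIntervalMs / 2) / perRecordMillis;
        return (int) Math.max(1L, Math.min(records, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // Defaults: max.poll.records=500, max.poll.interval.ms=300000; 1 s per record is assumed.
        System.out.println(fitsInterval(500, 1000L, 300_000L));       // false
        System.out.println(suggestedMaxPollRecords(1000L, 300_000L)); // 150
    }
}
```

If the check fails, the usual options are to lower max.poll.records, raise max.poll.interval.ms, or make per-record processing faster.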

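To get past data that has already been consumed more than once, switch the service to a new consumer group (e.g., order_consumer_group) and set auto-offset-reset=latest. The new group has no committed offsets, so it starts from the end of each partition, and the service can restart without modifying Kafka or ZooKeeper. Note that records the old group had not yet processed are skipped as well.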

Note: To consume a topic from the beginning, create a fresh group ID and set auto-offset-reset=earliest.
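For the plain Java client, both variants come down to two properties. The sketch below builds them with java.util.Properties only; `NewGroupConfig` and `forNewGroup` are hypothetical names, and localhost:9092 is carried over from the earlier example. The client property is spelled auto.offset.reset, while auto-offset-reset is the Spring Boot form.

```java
import java.util.Properties;

// Hypothetical helper that prepares consumer properties for a fresh group ID.
class NewGroupConfig {

    // Builds consumer properties for a group ID that has no committed offsets yet.
    // fromBeginning=true replays the topic; false starts at the end of each partition.
    static Properties forNewGroup(String groupId, boolean fromBeginning) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "false");
        // auto.offset.reset only applies when the group has no valid committed offset.
        props.put("auto.offset.reset", fromBeginning ? "earliest" : "latest");
        return props;
    }

    public static void main(String[] args) {
        Properties props = forNewGroup("order_consumer_group", false);
        System.out.println(props.getProperty("auto.offset.reset")); // latest
    }
}
```

Reusing an existing group ID would make this setting ineffective, because the consumer would resume from that group's committed offsets instead.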


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
