Analyzing and Resolving Kafka Consumer Rebalance Errors Caused by max.poll.interval.ms

This article examines a Kafka consumer rebalance error caused by exceeding max.poll.interval.ms. It explains how the poll interval and offset commits interact, then gives practical fixes: adjusting max.poll.interval.ms, limiting the number of records per poll, and committing offsets per message to prevent frequent rebalances.

Big Data Technology & Architecture

Today the consumers of our production Kafka cluster hit frequent rebalance errors, with the consumer group rebalancing every 2–3 minutes. The log shows a CommitFailedException: the group had already rebalanced and assigned the partitions to another member, because the time between successive poll() calls exceeded the configured max.poll.interval.ms.

The max.poll.interval.ms setting defines the maximum allowed delay between two poll() invocations. If the consumer does not call poll() again within this interval, it is considered failed: it leaves the group, and the group coordinator triggers a rebalance to reassign its partitions to the remaining members.
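
The timing rule can be pictured with a small stand-alone model. This is only an illustrative sketch, not Kafka's implementation: the class PollDeadline and its methods are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic.

```java
/** Toy model of the max.poll.interval.ms rule; not Kafka's actual code. */
public class PollDeadline {
    private final long maxPollIntervalMs;
    private long lastPollMs;

    public PollDeadline(long maxPollIntervalMs, long startMs) {
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastPollMs = startMs;
    }

    /** Record that poll() was called at the given time. */
    public void onPoll(long nowMs) {
        lastPollMs = nowMs;
    }

    /** True if more than maxPollIntervalMs has passed since the last poll(). */
    public boolean isExpired(long nowMs) {
        return nowMs - lastPollMs > maxPollIntervalMs;
    }

    public static void main(String[] args) {
        PollDeadline deadline = new PollDeadline(300_000L, 0L); // 300 s default
        deadline.onPoll(10_000L);
        System.out.println(deadline.isExpired(200_000L)); // 190 s since last poll
        System.out.println(deadline.isExpired(320_000L)); // 310 s since last poll
    }
}
```

In this model the first check stays within the interval and the second does not, which corresponds to the point at which the consumer would lose its partitions.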

In our case, a single poll sometimes returned more than 250 records. Most messages were processed within 500 ms, but a few took more than a minute each. When several slow messages landed in the same batch, the time before the next poll() exceeded the default 300 seconds (300000 ms), the consumer was removed from the group, and the following commit failed.
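
The arithmetic behind this can be checked with a rough estimate. The sketch below is an illustration only: the class name BatchTimeEstimate is hypothetical, and the split of a 250-record batch into 245 fast records (500 ms each) and 5 slow records (60 s each) is an assumed example, since the exact number of slow messages is not given.

```java
/** Back-of-the-envelope estimate of how long one polled batch takes to process. */
public class BatchTimeEstimate {
    /** Total processing time in ms for a batch made of fast and slow records. */
    public static long batchMillis(int fastRecords, long fastMs, int slowRecords, long slowMs) {
        return fastRecords * fastMs + slowRecords * slowMs;
    }

    public static void main(String[] args) {
        long maxPollIntervalMs = 300_000L; // default: 300 s
        // Assumed split of a 250-record batch: 245 fast, 5 slow.
        long total = batchMillis(245, 500L, 5, 60_000L);
        System.out.println("batch time ms: " + total);                          // 422500
        System.out.println("exceeds interval: " + (total > maxPollIntervalMs)); // true
    }
}
```

Under these assumed numbers, five slow messages alone use up the whole 300 s budget, and the fast messages push the batch well past it.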

Key log excerpts:

08-09 11:01:11 131 pool-7-thread-3 ERROR [] - commit failed
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms...

Consumer processing logic (simplified):

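import java.util.Collections;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

while (isRunning) {
    // poll(long) is the older signature; newer clients use poll(Duration).
    ConsumerRecords<KEY, VALUE> records = consumer.poll(100);
    if (records != null && records.count() > 0) {
        for (ConsumerRecord<KEY, VALUE> record : records) {
            dealMessage(bizConsumer, record.value());
            try {
                // Commit only this record's offset. A bare commitSync() would
                // commit the position of the whole polled batch.
                consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
            } catch (CommitFailedException e) {
                // The group has already rebalanced: abandon the batch and poll again.
                logger.error("commit failed, will break this for loop", e);
                break;
            }
        }
    }
}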

The poll() method returns a batch of messages. If processing the batch takes longer than max.poll.interval.ms, the consumer is removed from the group and a rebalance is triggered. Any records that were processed but whose offsets were not yet committed are then consumed again, because the partition's new owner resumes from the last committed offset.
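
The duplicate consumption described above can be made concrete with a toy model. This is a sketch under stated assumptions, not Kafka code: RedeliveryModel and redelivered are hypothetical names, and the model only lists the offsets that were processed after the last successful commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: which offsets are delivered again after a rebalance; not Kafka code. */
public class RedeliveryModel {
    /**
     * @param committedOffset next offset to read according to the last successful commit
     * @param processedCount  records processed after that commit, before the rebalance
     * @return offsets that the new partition owner will process a second time
     */
    public static List<Long> redelivered(long committedOffset, int processedCount) {
        List<Long> duplicates = new ArrayList<>();
        for (long offset = committedOffset; offset < committedOffset + processedCount; offset++) {
            duplicates.add(offset);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Committed position is 100; 3 more records were processed but never committed.
        System.out.println(redelivered(100L, 3)); // [100, 101, 102]
    }
}
```

The fewer records processed between successful commits, the shorter this list becomes, which is the motivation for committing more often.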

Solutions:

Increase max.poll.interval.ms to accommodate longer processing times (default is 300 s).

Limit the number of records per poll by setting max.poll.records (e.g., 50) to reduce batch size.

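Commit offsets after each message, and break out of the loop as soon as a commit fails, so that the consumer returns to poll() and rejoins the group promptly instead of continuing to process partitions it no longer owns.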

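Example configuration adjustments (max.poll.interval.ms is expressed in milliseconds, so 300 s is written as 300000; raise it further if a batch can take longer):

max.poll.interval.ms=300000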
max.poll.records=50
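
In a Java client, the same two settings are usually supplied programmatically. The following is a minimal sketch using only java.util.Properties; the class name ConsumerSettings is hypothetical, and the interval value shown is simply the 300 s default written in milliseconds, to be raised as needed.

```java
import java.util.Properties;

/** Builds the two consumer overrides discussed in this article. */
public class ConsumerSettings {
    public static Properties build() {
        Properties props = new Properties();
        // Value is in milliseconds: 300000 ms is the 300 s default.
        // Raise it if a batch can legitimately take longer to process.
        props.setProperty("max.poll.interval.ms", "300000");
        // Cap each poll() at 50 records so a batch finishes sooner.
        props.setProperty("max.poll.records", "50");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println(props.getProperty("max.poll.interval.ms"));
        System.out.println(props.getProperty("max.poll.records"));
    }
}
```

These entries would be merged with the rest of the consumer configuration (bootstrap servers, group id, deserializers) before the consumer is constructed.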

By applying these changes, the consumer can avoid frequent rebalances and ensure reliable offset commits.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.