Improving Kafka Consumer Design: From Poll‑Then‑Process Loop to a Better Model
This article analyzes the shortcomings of the traditional poll‑then‑process loop in Kafka consumers, explains configuration pitfalls such as max.poll.interval.ms, and proposes a more robust architecture using work queues, a poller, an executor, and an offset manager to achieve at‑most‑once, at‑least‑once, and exactly‑once processing guarantees.
Introduction
Almost all Kafka Consumer tutorials show the same basic code: creating a consumer, subscribing to topics, and entering an infinite poll‑then‑process loop.
KafkaConsumer<String, Payment> consumer = new KafkaConsumer<>(props);
// Subscribe to Kafka topics
consumer.subscribe(topics);
while (true) {
// Poll Kafka for new messages
ConsumerRecords<String, String> records = consumer.poll(100);
// Processing logic
for (ConsumerRecord<String, String> record : records) {
doSomething(record);
}
}This simple model works but suffers from several issues that will be detailed next.
Problems
Developers may think poll is merely a request for data. If processing is slow, Kafka acts as a buffer, but the interval between polls grows, hitting the default max.poll.interval.ms (5 minutes). If poll isn’t called within this timeout, the consumer is considered dead and a rebalance occurs, adding latency.
max.poll.interval.ms Maximum delay between calls to poll() when using consumer groups. If the timeout expires, the consumer is deemed failed and a rebalance is triggered.
Increasing max.poll.interval.ms isn’t ideal because rebalance time can be up to twice this value. Setting a lower max.poll.records reduces the number of records per poll, shortening the interval, but the fundamental issue remains.
Message Processing Is Asynchronous
Kafka guarantees order only within a partition; messages from different partitions can be processed in parallel. Running as many consumers as partitions maximizes parallelism but adds overhead and more rebalance opportunities.
Using a thread pool to process each record works only when ordering and delivery guarantees are not required.
A Better Model
Overview
The poll‑then‑process loop mixes polling, processing, and offset committing. By separating these concerns we obtain a model that supports parallel processing and back‑pressure.
Work Queues
Work queues act as the communication channel between the poller and the executor:
Each assigned TopicPartition maps one‑to‑one to a work queue. After each poll, the poller pushes new records to the corresponding queue, preserving order. Queues have a configurable size; when full they apply back‑pressure to the poller.
Work queues are asynchronous, decoupling polling from processing, unlike the tightly coupled poll‑then‑process loop.
Poller
The poller encapsulates everything related to polling Kafka:
Monitors rebalance events via ConsumerRebalanceListener and coordinates other components.
Creates a new work queue for each newly assigned TopicPartition and tears down queues for revoked partitions.
Uses a short, configurable interval (e.g., 50 ms) to poll, far lower than max.poll.interval.ms, avoiding rebalance storms.
If an executor cannot keep up, its queue fills, the poller pauses that partition, and resumes it when the queue drains.
pause(Collection<TopicPartition> partitions) Pauses fetching from the given partitions until resume is called.
Executor
The executor is a thread‑pool‑like component that runs workers to process messages:
Number of workers is configurable to match CPU or I/O constraints.
Each queue is handled by a single worker, preserving order within a partition.
A worker may serve multiple queues, enabling parallel processing across partitions.
Offset Manager
Every Kafka record has an offset that serves as a checkpoint. The offset manager is responsible for storing and committing offsets, either to Kafka or an external store, and can operate synchronously or asynchronously.
Manual offset commits (disable enable.auto.commit) give precise control over delivery guarantees.
Commit strategies can be batch‑based, timer‑based, etc.
Automatic commits provide at‑least‑once delivery but no processing guarantees; therefore manual management is preferred.
Processing Guarantees
At‑Most‑Once
Commit the offset before processing the message. If processing fails, the message is lost, but duplicates cannot occur.
At‑Least‑Once
Commit the offset only after successful processing. If a failure occurs before the commit, the message may be re‑processed, ensuring no loss.
Exactly‑Once (External Offset Management)
Perform offset storage and message handling within a single transaction, requiring tight coordination between executor and offset manager.
public void seek(TopicPartition partition, long offset) Overrides the offset that the consumer will use on the next poll.
Summary
We examined the drawbacks of the poll‑then‑process loop and introduced a more suitable model consisting of work queues, a poller, an executor, and an offset manager. While the model is more complex, it provides clearer semantics for parallelism, back‑pressure, and processing guarantees, making it valuable for evaluating existing Kafka client libraries such as Alpakka Kafka, Spring for Kafka, or zio‑kafka.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
