Understanding Kafka Consumer: Offset Management, Rebalance, Partition Assignment, and Thread Safety
This article provides a comprehensive technical walkthrough of KafkaConsumer, covering Java configuration code, delivery semantics (at‑most‑once, at‑least‑once, exactly‑once), offset commit strategies, rebalance mechanisms, partition assignment algorithms, thread‑safety concerns, and internal poll implementation, followed by unrelated promotional content.
The author, a senior architect, presents an in‑depth analysis of KafkaConsumer behavior and best practices.
Java code example demonstrates how to configure a consumer, subscribe to a topic, and continuously poll records.
public static void main(String[] args) {
Properties props = new Properties();
String topic = "test";
String group = "test0";
props.put("bootstrap.servers", "XXX:9092,XXX:9092");
props.put("group.id", group);
props.put("auto.offset.reset", "earliest");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer
consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
while (true) {
ConsumerRecords
records = consumer.poll(100);
for (ConsumerRecord
record : records) {
System.out.printf("offset = %d, key = %s, value = %s \n", record.offset(), record.key(), record.value());
try { Thread.sleep(100); } catch (InterruptedException e) { e.printStackTrace(); }
}
}
}The article defines three delivery guarantees:
At most once – messages may be lost but never duplicated.
At least once – messages are never lost but may be duplicated.
Exactly once – each message is processed a single time.
For exactly‑once semantics, the producer can attach a globally unique ID to each message and the consumer can filter duplicates; the consumer’s offset‑commit timing (before or after processing) determines whether at‑most‑once or at‑least‑once behavior occurs.
Consumer rebalance is triggered by changes in group membership, topic subscription count, or partition count. The article describes the original Zookeeper‑based watch approach, the GroupCoordinator workflow, and a third design where partition assignment is handled by the consumer leader after a JoinGroup request.
Two common partition assignors are explained:
RoundRobinAssignor – iterates over all topic‑partitions and distributes them in a round‑robin fashion.
RangeAssignor – allocates contiguous ranges of partitions to consumers, illustrated with a 7‑partition, 5‑consumer example.
KafkaConsumer analysis highlights that the consumer client is not thread‑safe; the recommended pattern is to use separate thread pools for polling and processing, or a producer‑consumer model where each thread owns its own KafkaConsumer instance.
private final AtomicLong currentThread = new AtomicLong(NO_CURRENT_THREAD);
private final AtomicInteger refcount = new AtomicInteger(0);
private void acquire() {
ensureNotClosed();
long threadId = Thread.currentThread().getId();
if (threadId != currentThread.get() && !currentThread.compareAndSet(NO_CURRENT_THREAD, threadId))
throw new ConcurrentModificationException("KafkaConsumer is not safe for multi‑threaded access");
refcount.incrementAndGet();
}
public ConsumerRecords
poll(long timeout) {
acquire();
try {
do {
Map
>> records = pollOnce(remaining);
if (!records.isEmpty()) {
if (fetcher.sendFetches() > 0 || client.pendingRequestCount() > 0)
client.pollNoWakeup();
return interceptors == null ? new ConsumerRecords<>(records) : interceptors.onConsume(new ConsumerRecords<>(records));
}
} while (remaining > 0);
return ConsumerRecords.empty();
} finally {
release();
}
}The article also contains a series of promotional sections advertising ChatGPT services, a paid community, and various external links, which are unrelated to the technical discussion.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.