Understanding Kafka Consumer Groups, Partition Assignment, and Offset Management
This article explains how Kafka consumer groups accelerate message consumption by distributing partitions across multiple consumers, details the three key characteristics of consumer groups, and provides in‑depth guidance on partition assignment strategies and offset management with practical Java code examples.
When a Kafka topic contains millions of messages, a single consumer process is insufficient for timely consumption; using Kafka's consumer group feature allows multiple consumers on different machines to work in parallel, dramatically increasing throughput.
Each consumer group has a unique group ID and may consist of one or more consumers. Within a group, each message is delivered to only one consumer, so the members share the work instead of each receiving every message.
Kafka consumer groups have three main characteristics:
Each group contains one or more consumers.
Each group is identified by a unique group ID.
When consuming a topic, each partition can be assigned to only one consumer within the group.
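Example command to create a topic with eight partitions (the --zookeeper flag applies to older releases; Kafka 3.0 and later take --bootstrap-server with a broker address instead):
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 8 --topic HelloKafka
Sample Java consumer that joins a group and processes records: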
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// KafkaConfig (SERVER, GROUP, TOPIC) is a constants class that is not shown in this article.
public class GroupConsumer {
    private final KafkaConsumer<String, String> consumer;
    private final int id;

    public GroupConsumer(int id) {
        this.id = id;
        Properties props = new Properties();
        props.put("client.id", "client-" + id);
        props.put("bootstrap.servers", KafkaConfig.SERVER);
        props.put("group.id", KafkaConfig.GROUP);
        // Offsets are committed manually in consume().
        props.put("enable.auto.commit", "false");
        // Start from the earliest record when the group has no committed offset.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("partition.assignment.strategy", "org.apache.kafka.clients.consumer.RangeAssignor");
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singleton(KafkaConfig.TOPIC));
    }

    public void consume() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("id = %d , partition = %d , offset = %d, key = %s, value = %s%n",
                        id, record.partition(), record.offset(), record.key(), record.value());
            }
            // Commit only after the whole batch has been processed.
            consumer.commitSync();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // One consumer per thread: KafkaConsumer is not safe for use by multiple threads.
        for (int i = 0; i < 8; i++) {
            final int id = i;
            new Thread(() -> new GroupConsumer(id).consume()).start();
        }
        TimeUnit.SECONDS.sleep(Long.MAX_VALUE);
    }
}
Kafka ensures that each message is delivered to only one consumer in a group through two mechanisms: partition assignment and offset management.
1. Partition Assignment
Each topic is split into multiple partitions. A partition can be assigned to only one consumer within a group, preserving order per partition. If there are more consumers than partitions, some consumers remain idle.
2. Assignment Strategies
Kafka provides three built‑in assignors:
2.1 RangeAssignor
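For each topic, partitions are sorted by number and consumers by member ID in lexical order; each consumer then receives a contiguous range. When the partition count does not divide evenly, the first consumers each receive one extra partition. Across many topics this can lead to load imbalance, because the same first consumers receive the extras every time. With eight partitions and three consumers: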
C1: Partition0, Partition1, Partition2
C2: Partition3, Partition4, Partition5
C3: Partition6, Partition7
2.2 RoundRobinAssignor
Partitions of all subscribed topics are laid out in one list and dealt to the consumers in turn, achieving a more balanced load. With Topic1 (four partitions) and Topic2 (three partitions):
C1: Topic1-P0, Topic1-P3, Topic2-P2
C2: Topic1-P1, Topic2-P0
C3: Topic1-P2, Topic2-P1
2.3 StickyAssignor
Attempts to keep existing partition‑consumer assignments stable during rebalances, moving only the partitions of failed consumers.
C1: Topic1-P0, Topic2-P0
C2: Topic1-P1, Topic2-P1
C3: Topic1-P2, Topic2-P2
When C1 crashes, StickyAssignor redistributes its partitions to C2 and C3 while preserving their current assignments.
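The three strategies can be compared with a small stand-alone model. The sketch below is a simplification that uses only the Java standard library; it is not Kafka's assignor code, and the class and method names (AssignmentSketch, range, roundRobin, stickyAfterFailure) are hypothetical. The sticky part models only the step described above, where survivors keep their partitions and the orphaned ones go to the least-loaded survivor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the assignment arithmetic; not Kafka's assignor classes.
class AssignmentSketch {

    // Range: for one topic, each consumer gets a contiguous block; the first
    // (partitions % consumers) consumers receive one extra partition.
    static Map<String, List<String>> range(List<String> consumers, String topic, int partitions) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int base = partitions / consumers.size();
        int extra = partitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            for (int p = next; p < next + count; p++) {
                result.get(consumers.get(i)).add(topic + "-P" + p);
            }
            next += count;
        }
        return result;
    }

    // Round robin: deal one list of topic-partitions to the consumers in turn.
    static Map<String, List<String>> roundRobin(List<String> consumers, List<String> topicPartitions) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        for (int i = 0; i < topicPartitions.size(); i++) {
            result.get(consumers.get(i % consumers.size())).add(topicPartitions.get(i));
        }
        return result;
    }

    // Sticky (rebalance step only): survivors keep what they own; each orphaned
    // partition goes to the survivor that currently owns the fewest partitions.
    static Map<String, List<String>> stickyAfterFailure(Map<String, List<String>> current, String failed) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : current.entrySet()) {
            if (!e.getKey().equals(failed)) {
                result.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        if (result.isEmpty()) {
            return result; // no survivors left to take over
        }
        for (String orphan : current.get(failed)) {
            String target = null;
            for (String c : result.keySet()) {
                if (target == null || result.get(c).size() < result.get(target).size()) {
                    target = c;
                }
            }
            result.get(target).add(orphan);
        }
        return result;
    }

    private static Map<String, List<String>> emptyAssignment(List<String> consumers) {
        Map<String, List<String>> m = new LinkedHashMap<>();
        for (String c : consumers) {
            m.put(c, new ArrayList<>());
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> group = List.of("C1", "C2", "C3");
        System.out.println("range:       " + range(group, "Topic1", 8));
        Map<String, List<String>> rr = roundRobin(group, List.of(
                "Topic1-P0", "Topic1-P1", "Topic1-P2", "Topic1-P3",
                "Topic2-P0", "Topic2-P1", "Topic2-P2"));
        System.out.println("round robin: " + rr);
        Map<String, List<String>> before = roundRobin(group, List.of(
                "Topic1-P0", "Topic1-P1", "Topic1-P2",
                "Topic2-P0", "Topic2-P1", "Topic2-P2"));
        System.out.println("sticky:      " + stickyAfterFailure(before, "C1"));
    }
}
```

Calling range with four consumers and three partitions leaves the fourth consumer with an empty list, which is the idle-consumer case from section 1. In a real consumer the strategy is selected through the partition.assignment.strategy property, as in the GroupConsumer example.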
3. Offset Management
3.1 Manual Commit
Automatic commits are disabled with props.put("enable.auto.commit", "false"). After processing a batch, the consumer must explicitly commit offsets, either synchronously (consumer.commitSync()) or asynchronously (consumer.commitAsync()).
3.2 Synchronous Commit
consumer.commitSync();
The call blocks until the broker acknowledges the commit. Retriable errors are retried until the timeout expires; an unrecoverable failure is thrown to the caller as an exception.
3.3 Asynchronous Commit
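consumer.commitAsync();
The call does not block, and the client does not retry a failed asynchronous commit on its own. Failures can be handled via a callback, e.g.:
consumer.commitAsync(new RetryOffsetCommitCallback(consumer));
RetryOffsetCommitCallback is not shown in this article; it is presumably an application-defined implementation of Kafka's OffsetCommitCallback interface.
3.4 Automatic Commit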
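Enabled with props.put("enable.auto.commit", "true") and props.put("auto.commit.interval.ms", "1000"). With these settings the consumer commits offsets about once per second as part of its poll() calls. If it crashes between two commits, the records processed since the last commit are delivered again after the rebalance, which causes duplicate processing.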
3.5 Offset Storage
Older Kafka versions stored offsets in ZooKeeper (e.g., /consumers/ConsumerGroup/offsets/TestTopic/0); modern versions store them in the internal __consumer_offsets topic, which is partitioned (default 50 partitions) to spread load.
The partition that holds a group's offsets is chosen by Math.abs(hash(groupID)) % numPartitions, so all commits from one group land in the same partition. Kafka periodically compacts the __consumer_offsets topic, keeping only the latest offset per groupID+topic+partition key.
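Both ideas can be tried out without a broker. The following is a minimal sketch using only the Java standard library: partitionFor applies the formula above with String.hashCode() assumed as the hash function, and compact models compaction as "the last value per key wins". The class name, the method names, and the "group/topic/partition" key format are hypothetical, and the Integer.MIN_VALUE guard is an added assumption, since Math.abs returns a negative number for that one value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone model of offset storage; it does not talk to a Kafka broker.
class OffsetsTopicSketch {
    static final int DEFAULT_OFFSETS_PARTITIONS = 50;

    // Math.abs(hash(groupID)) % numPartitions, guarded against Integer.MIN_VALUE.
    static int partitionFor(String groupId, int numPartitions) {
        int h = groupId.hashCode();
        int nonNegative = (h == Integer.MIN_VALUE) ? 0 : Math.abs(h);
        return nonNegative % numPartitions;
    }

    // Replays commit records in log order and keeps only the newest offset per key.
    static Map<String, Long> compact(List<Map.Entry<String, Long>> commitLog) {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Long> commit : commitLog) {
            latest.put(commit.getKey(), commit.getValue());
        }
        return latest;
    }

    public static void main(String[] args) {
        System.out.println("partition for group 'orders': "
                + partitionFor("orders", DEFAULT_OFFSETS_PARTITIONS));

        List<Map.Entry<String, Long>> log = List.of(
                Map.entry("orders/HelloKafka/0", 5L),
                Map.entry("orders/HelloKafka/1", 7L),
                Map.entry("orders/HelloKafka/0", 9L)); // supersedes the first record
        System.out.println("after compaction: " + compact(log));
    }
}
```

Because the partition depends only on the group ID, a single very busy group concentrates its commit traffic on one partition of the offsets topic; the spread across 50 partitions helps when there are many groups.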
4. Manual vs. Automatic Commit
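Automatic commits are simple, but offsets are committed on a timer regardless of whether processing has finished. Messages can be lost if fetched records are handed off (for example to another thread) and the consumer crashes after their offsets were committed but before they were processed; messages can be processed twice if it crashes between two commits. For critical data where loss is unacceptable, manual commits (synchronous or asynchronous with retries) issued after processing are recommended.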
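The difference comes down to whether the offset is committed before or after the records are processed. The sketch below is a toy simulation in plain Java with hypothetical names; it does not use the Kafka client. One batch is fetched, the consumer crashes partway through, and a restarted consumer resumes from the committed offset.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: offsets 0..batchSize-1 form one fetched batch.
class CommitOrderSketch {

    // Returns every offset that gets processed, in order, across a crash and a restart.
    static List<Integer> run(int batchSize, int processedBeforeCrash, boolean commitBeforeProcessing) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;

        // First attempt: a commit that fires before processing covers the whole batch.
        if (commitBeforeProcessing) {
            committed = batchSize;
        }
        for (int offset = 0; offset < processedBeforeCrash; offset++) {
            processed.add(offset);
        }
        // Crash here. With commit-after-processing the commit was never reached,
        // so the committed offset is still 0.

        // Restart: resume from the last committed offset.
        for (int offset = committed; offset < batchSize; offset++) {
            processed.add(offset);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("commit before processing: " + run(5, 3, true));
        System.out.println("commit after processing:  " + run(5, 3, false));
    }
}
```

With a batch of five and a crash after three records, committing first never processes offsets 3 and 4 (loss), while committing afterwards processes offsets 0 to 2 twice (duplicates) but skips nothing. Consumers that commit after processing therefore benefit from idempotent handling of records.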
Overall, understanding consumer group mechanics, partition assignors, and offset handling is essential for building reliable, high‑throughput Kafka consumers.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
