Big Data 13 min read

Understanding Kafka Consumer Groups, Partition Assignment, and Offset Management

This article explains how Kafka consumer groups accelerate message consumption by distributing partitions across multiple consumers, details the three key characteristics of consumer groups, and provides in‑depth guidance on partition assignment strategies and offset management with practical Java code examples.

Big Data Technology & Architecture

Sep 18, 2020

Understanding Kafka Consumer Groups, Partition Assignment, and Offset Management

When a Kafka topic contains millions of messages, a single consumer process is insufficient for timely consumption; using Kafka's consumer group feature allows multiple consumers on different machines to work in parallel, dramatically increasing throughput.

Each consumer group has a unique group ID and may consist of one or more consumers. Within a group, a message is considered consumed once any consumer processes it, ensuring no duplicate consumption across the group.

Kafka consumer groups have three main characteristics:

Each group contains one or more consumers.

Each group is identified by a unique group ID.

When consuming a topic, each partition can be assigned to only one consumer within the group.

Example command to create a topic with eight partitions:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 8 --topic HelloKafka

Sample Java consumer that joins a group and processes records:

public class GroupConsumer {

    private KafkaConsumer<String, String> consumer;
    private final int id;

    public GroupConsumer(int id) {
        this.id = id;
        Properties props = new Properties();
        props.put("client.id", "client-" + id);
        props.put("bootstrap.servers", KafkaConfig.SERVER);
        props.put("group.id", KafkaConfig.GROUP);
        props.put("zookeeper.session.timeout.ms", "4000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("enable.auto.commit", "false");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("partition.assignment.strategy", "org.apache.kafka.clients.consumer.RangeAssignor");
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singleton(KafkaConfig.TOPIC));
    }

    public void consume() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("id = %d , partition = %d , offset = %d, key = %s, value = %s%n", id, record.partition(), record.offset(), record.key(), record.value());
            }
            consumer.commitSync();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 8; i++) {
            final int id = i;
            new Thread() {
                @Override
                public void run() {
                    new GroupConsumer(id).consume();
                }
            }.start();
        }
        TimeUnit.SECONDS.sleep(Long.MAX_VALUE);
    }
}

Kafka guarantees that each message is consumed by only one consumer in a group through two mechanisms: partition assignment and offset management.

1. Partition Assignment

Each topic is split into multiple partitions. A partition can be assigned to only one consumer within a group, preserving order per partition. If there are more consumers than partitions, some consumers remain idle.

2. Assignment Strategies

Kafka provides three built‑in assignors:

2.1 RangeAssignor

Partitions are sorted and divided evenly among consumers based on lexical order, which can lead to load imbalance when the first consumers receive many partitions.

C1: Partition0, Partition1, Partition2
C2: Partition3, Partition4, Partition5
C3: Partition6, Partition7

2.2 RoundRobinAssignor

Partitions are distributed in a round‑robin fashion across consumers, achieving a more balanced load.

C1: Topic1-P0 Topic1-P3, Topic2-2
C2: Topic1-P1, Topic2-0,
C3: Topic1-P2, Topic2-1

2.3 StickyAssignor

Attempts to keep existing partition‑consumer assignments stable during rebalances, moving only the partitions of failed consumers.

C1: Topic1-P0 Topic2-P0,
C2: Topic1-P1, Topic2-P1,
C3: Topic1-P2, Topic2-P2

When C1 crashes, StickyAssignor redistributes its partitions to C2 and C3 while preserving their current assignments.

3. Offset Management

3.1 Manual Commit

Automatic commits are disabled with props.put("enable.auto.commit", "false"). After processing a batch, the consumer must explicitly commit offsets, either synchronously ( consumer.commitSync()) or asynchronously ( consumer.commitAsync()).

3.2 Synchronous Commit

consumer.commitSync();

The call blocks until the broker acknowledges the commit; on failure it retries until timeout.

3.3 Asynchronous Commit

consumer.commitAsync();

Non‑blocking; failures can be handled via a callback, e.g.:

consumer.commitAsync(new RetryOffsetCommitCallback(consumer));

3.4 Automatic Commit

Enabled with props.put("enable.auto.commit", "true") and props.put("auto.commit.interval.ms", "1000"). Offsets are persisted every second, which may cause duplicate processing if a consumer crashes before finishing the batch.

3.5 Offset Storage

Older Kafka versions stored offsets in Zookeeper (e.g., /consumers/ConsumerGroup/offsets/TestTopic/0); modern versions store them in the internal _consumer_offsets topic, which is partitioned (default 50 partitions) to spread load.

The partition for a given offset is chosen by Math.abs(hash(groupID)) % numPartitions. Kafka periodically compresses the _consumer_offsets topic, keeping only the latest offset per groupID+topic+partition key.

4. Manual vs. Automatic Commit

Automatic commits are simple but risk message loss if a consumer crashes after fetching messages but before processing them. For critical data where loss is unacceptable, manual commits (synchronous or asynchronous with retries) are recommended.

Overall, understanding consumer group mechanics, partition assignors, and offset handling is essential for building reliable, high‑throughput Kafka consumers.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data kafka Consumer Group Partition Assignment Offset Management

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.