Big Data · 16 min read

Kafka Architecture Overview: Topics, Partitions, Producers, Consumers, Replication, Leader Election, Offsets, Rebalance, Delivery Semantics, and Transactions

This article provides a comprehensive overview of Kafka's architecture, covering topics, partitions, producer and consumer workflows, replication and leader election, offset management, consumer group coordination, rebalance processes, delivery semantics (at‑most‑once, at‑least‑once, exactly‑once), transactional messaging, and underlying file and configuration details.


Kafka is a distributed message queue offering high performance, persistence, replication, and horizontal scalability. Producers write messages to topics, which are divided into partitions for parallelism; consumers read from topics via consumer groups, ensuring each partition is processed by only one consumer within a group.
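Keyed messages always land on the same partition, which is what preserves per-key ordering. Kafka's real default partitioner hashes the serialized key with murmur2; the sketch below uses `String.hashCode()` instead, purely to illustrate the routing principle (names are hypothetical):

```java
// Simplified sketch of keyed partition routing. Kafka's DefaultPartitioner
// uses a murmur2 hash of the serialized key, but the idea is the same:
// the same key always maps to the same partition.
public class PartitionRouting {
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-42", 6);
        int p2 = partitionFor("order-42", 6);
        System.out.println(p1 == p2); // same key -> same partition, every time
    }
}
```

Messages with a null key are instead spread across partitions (round-robin in older clients, sticky batching in newer ones), trading ordering for balance.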

Partitions are replicated across brokers; one replica is elected as the leader, handling all read/write requests while followers sync from it. The Controller, elected via ZooKeeper, manages partition assignment and leader election, updating ZooKeeper and notifying affected brokers.

Partition assignment follows a deterministic algorithm: brokers and partitions are sorted, then each partition i is assigned to broker ((i + startIndex) mod n) as leader, where startIndex is a randomized starting offset into the broker list; replicas are placed on subsequent brokers.
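The round-robin placement above can be sketched in a few lines of Java. This is a simplified model assuming startIndex = 0 (Kafka randomizes it to spread leaders across the cluster):

```java
// Sketch of round-robin replica placement, assuming startIndex = 0.
import java.util.*;

public class ReplicaAssignment {
    // Returns, for each partition, the ordered broker ids:
    // the first entry is the leader, the rest are follower replicas.
    static List<List<Integer>> assign(List<Integer> brokers, int partitions, int replicationFactor) {
        List<Integer> sorted = new ArrayList<>(brokers);
        Collections.sort(sorted);
        int n = sorted.size();
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add(sorted.get((i + r) % n)); // leader at r == 0
            }
            result.add(replicas);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList(0, 1, 2), 3, 2));
        // [[0, 1], [1, 2], [2, 0]] -- leaders rotate across brokers
    }
}
```

Rotating leaders this way balances write load, since all reads and writes go through leaders.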

Offset storage originally used ZooKeeper but moved to an internal __consumer_offsets topic (with cleanup.policy=compact) to improve performance. Offsets are keyed by (groupId, topic, partition), and the partition of __consumer_offsets responsible for a group is calculated as Math.abs(groupId.hashCode() % offsetsTopicPartitionCount).
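The lookup formula above is directly runnable. With the default of 50 partitions for __consumer_offsets, every offset commit for a given group lands on the same partition (the group name below is hypothetical):

```java
// The offset-partition lookup from the text, as runnable Java.
public class OffsetsPartition {
    static int partitionFor(String groupId, int offsetsTopicPartitionCount) {
        return Math.abs(groupId.hashCode() % offsetsTopicPartitionCount);
    }

    public static void main(String[] args) {
        // Default offsets.topic.num.partitions is 50; all commits for
        // this group go to the same __consumer_offsets partition.
        System.out.println(partitionFor("payment-service", 50));
    }
}
```

Because the topic is compacted, only the latest committed offset per (groupId, topic, partition) key is retained.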

Consumer group coordination involves a Coordinator (the broker leading the offset partition) handling join, heartbeat, and rebalance requests. Rebalance distributes partitions among consumers, selecting a leader among them to compute the assignment.
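The assignment the group leader computes can be sketched as a range-style distribution: partitions are split into contiguous ranges over the sorted member list, with the first members absorbing any remainder. This is a simplified model of Kafka's per-topic RangeAssignor (member ids are illustrative):

```java
// Minimal sketch of a range-style partition assignment, as the group
// leader might compute it during a rebalance (illustrative only).
import java.util.*;

public class RangeAssign {
    // Distribute numPartitions partitions of one topic across sorted members.
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        List<String> sorted = new ArrayList<>(members);
        Collections.sort(sorted);
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        int perMember = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size(); // first `extra` members get one more
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < count; j++) parts.add(next++);
            out.put(sorted.get(i), parts);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList("c1", "c2"), 5));
        // {c1=[0, 1, 2], c2=[3, 4]}
    }
}
```

The leader sends this mapping back to the Coordinator, which distributes each member's slice in the SyncGroup response.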

Kafka supports three delivery semantics: at‑most‑once (possible loss, no duplicates), at‑least‑once (no loss, possible duplicates), and exactly‑once (no loss, no duplicates, available from version 0.11 when downstream is also Kafka).
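On the consumer side, which of the first two semantics you get is mostly a matter of commit ordering: commit before processing and a crash loses the in-flight message (at-most-once); process before committing and a crash causes a replay (at-least-once). A toy in-memory simulation of the latter, with all names hypothetical:

```java
// Simulates process-then-commit (at-least-once) with a crash between
// processing a message and committing its offset: the message is
// reprocessed after restart -- a duplicate, but no loss.
import java.util.*;

public class DeliverySemantics {
    static List<String> atLeastOnce(List<String> log, int crashOffset) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        for (int offset = committed; offset < log.size(); offset++) {
            processed.add(log.get(offset));   // process first...
            if (offset == crashOffset) break; // crash before the commit
            committed = offset + 1;           // ...then commit
        }
        // Restart: resume from the last committed offset.
        for (int offset = committed; offset < log.size(); offset++) {
            processed.add(log.get(offset));
            committed = offset + 1;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(atLeastOnce(Arrays.asList("m0", "m1", "m2"), 1));
        // [m0, m1, m1, m2] -- m1 is duplicated, nothing is lost
    }
}
```

Reversing the two steps (commit, then process) would drop m1 entirely on the same crash, which is the at-most-once trade-off.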

Exactly‑once is achieved through idempotent producers (assigning a unique producer ID and sequence numbers) and transactional messaging. Transactions use a transaction ID (tid) and a Transaction Coordinator to log transaction states (Begin, Prepare‑Commit/Abort, Commit/Abort). After a successful commit, marker messages make the transaction's data visible to consumers.
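The coordinator's transaction log walks through a small state machine: Begin, then a prepare phase, then the final outcome. A toy model of the legal transitions (purely illustrative; the real coordinator persists these states to an internal topic):

```java
// Toy state machine for the transaction states named in the text:
// Begin -> Prepare-Commit/Prepare-Abort -> Commit/Abort.
import java.util.*;

public class TxnStates {
    enum State { BEGIN, PREPARE_COMMIT, PREPARE_ABORT, COMMIT, ABORT }

    // Allowed transitions, per the two-phase flow described above.
    static final Map<State, Set<State>> NEXT = Map.of(
        State.BEGIN, EnumSet.of(State.PREPARE_COMMIT, State.PREPARE_ABORT),
        State.PREPARE_COMMIT, EnumSet.of(State.COMMIT),
        State.PREPARE_ABORT, EnumSet.of(State.ABORT),
        State.COMMIT, EnumSet.noneOf(State.class),
        State.ABORT, EnumSet.noneOf(State.class)
    );

    static boolean canTransition(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.BEGIN, State.PREPARE_COMMIT)); // true
        System.out.println(canTransition(State.PREPARE_COMMIT, State.ABORT)); // false
    }
}
```

Once Prepare-Commit is logged, the outcome is decided: the coordinator writes commit markers into the data partitions and only then records the final Commit state, so consumers in read_committed mode never see a half-finished transaction.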

Kafka stores data as log segments on the filesystem, each segment accompanied by offset and time index files. Indexes are sparse, storing base offsets and file positions to enable efficient binary search and sequential scans.
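The sparse lookup works as a floor binary search: find the greatest indexed offset at or before the target, jump to its file position, and scan forward through the log from there. A minimal sketch of that search (the index layout is simplified to a sorted offset array):

```java
// Sketch of the sparse-index lookup: the index holds entries for only
// some messages; a floor binary search finds the nearest indexed entry
// at or before the target offset, and the log is scanned from there.
public class SparseIndex {
    // indexOffsets must be sorted ascending; returns the position of the
    // greatest indexed offset <= target, or -1 if target precedes them all.
    static int floorEntry(long[] indexOffsets, long target) {
        int lo = 0, hi = indexOffsets.length - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexOffsets[mid] <= target) { ans = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return ans;
    }

    public static void main(String[] args) {
        long[] sparse = {0, 100, 200, 300};          // one entry per ~100 messages
        System.out.println(floorEntry(sparse, 250)); // 2 -> scan forward from offset 200
    }
}
```

Keeping the index sparse means it stays small enough to memory-map, at the cost of a short sequential scan after each seek.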

Configuration includes broker settings (e.g., replication factor, log retention) and topic settings (e.g., partitions, cleanup policy). Proper tuning of these parameters is essential for performance, durability, and resource utilization.
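A few of the settings mentioned above, grouped by scope. The keys are real Kafka configuration names; the values are illustrative, not recommendations:

```java
// Broker-level defaults vs per-topic overrides (values illustrative).
import java.util.Properties;

public class KafkaConfigSketch {
    static Properties brokerDefaults() {
        Properties p = new Properties();
        p.setProperty("default.replication.factor", "3");
        p.setProperty("log.retention.hours", "168");      // 7 days
        p.setProperty("offsets.topic.num.partitions", "50");
        return p;
    }

    static Properties topicOverrides() {
        Properties p = new Properties();
        p.setProperty("cleanup.policy", "compact");       // compaction, as __consumer_offsets uses
        p.setProperty("retention.ms", "86400000");        // 1 day, overriding the broker default
        return p;
    }

    public static void main(String[] args) {
        System.out.println(brokerDefaults().getProperty("log.retention.hours"));
        System.out.println(topicOverrides().getProperty("cleanup.policy"));
    }
}
```

Per-topic settings override the broker-wide defaults, which is how a compacted internal topic like __consumer_offsets can coexist with time-retained data topics on the same cluster.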

Kafka · Replication · Distributed Messaging · partitioning · exactly-once · Transactional Messaging · consumer-groups
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
