Kafka Architecture Overview: Topics, Partitions, Producers, Consumers, Replication, Leader Election, Offsets, Rebalance, Delivery Semantics, and Transactions
This article provides a comprehensive overview of Kafka's architecture, covering topics, partitions, producer and consumer workflows, replication and leader election, offset management, consumer group coordination, rebalance processes, delivery semantics (at‑most‑once, at‑least‑once, exactly‑once), transactional messaging, and underlying file and configuration details.
Kafka is a distributed message queue offering high performance, persistence, replication, and horizontal scalability. Producers write messages to topics, which are divided into partitions for parallelism; consumers read from topics via consumer groups, ensuring each partition is processed by only one consumer within a group.
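To make the routing concrete: when a record has a key, the producer's default partitioner hashes the key to pick a partition, so all records with the same key land on the same partition and keep their order. The real client uses murmur2 hashing; the sketch below substitutes `String.hashCode` purely to show the idea.

```java
// Simplified sketch of key-based partition selection.
// The real Kafka default partitioner uses murmur2 hashing;
// String.hashCode is a stand-in here to show the routing idea only.
public class KeyPartitioner {
    public static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is non-negative,
        // avoiding negative indices for negative hash codes.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p = partitionFor("order-42", 6);
        System.out.println("key 'order-42' -> partition " + p);
        // The same key always maps to the same partition,
        // which is what preserves per-key ordering.
    }
}
```

Records without a key are instead spread across partitions (round-robin or sticky batching, depending on client version).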
Partitions are replicated across brokers; one replica is elected as the leader, handling all read/write requests while followers sync from it. The Controller, elected via ZooKeeper, manages partition assignment and leader election, updating ZooKeeper and notifying affected brokers.
Partition assignment follows a deterministic algorithm: brokers and partitions are sorted, then each partition i is assigned to broker (i mod n) as leader, with replicas placed on subsequent brokers.
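The placement rule above can be sketched directly: leader for partition i on broker (i mod n), followers on the next brokers in the sorted list. (The real controller also randomizes the starting broker index to spread leaders; that offset is left out here for clarity.)

```java
import java.util.*;

// Sketch of the round-robin replica placement described above.
// The first broker in each partition's replica list is the leader.
public class ReplicaAssignment {
    public static Map<Integer, List<Integer>> assign(int numPartitions,
                                                     List<Integer> brokers,
                                                     int replicationFactor) {
        Map<Integer, List<Integer>> assignment = new LinkedHashMap<>();
        int n = brokers.size();
        for (int p = 0; p < numPartitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Leader at (p mod n), followers on the subsequent brokers.
                replicas.add(brokers.get((p + r) % n));
            }
            assignment.put(p, replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, Arrays.asList(0, 1, 2), 2));
        // {0=[0, 1], 1=[1, 2], 2=[2, 0], 3=[0, 1]}
    }
}
```

Note how leadership wraps around the broker list, so leaders (and therefore read/write load) are spread roughly evenly.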
Offset storage originally used ZooKeeper but moved to an internal __consumer_offsets topic (with the compact cleanup policy) to improve performance. Offsets are keyed by groupId, topic, and partition, and the responsible partition is calculated as Math.abs(groupId.hashCode() % offsetsTopicPartitionCount).
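The formula above is small enough to run directly. The partition count comes from the broker setting offsets.topic.num.partitions (default 50); the broker's own implementation masks the sign bit rather than calling Math.abs, which sidesteps the Integer.MIN_VALUE edge case.

```java
// Which __consumer_offsets partition holds a given group's offsets,
// following the formula in the text. Partition count defaults to 50
// (broker config offsets.topic.num.partitions).
public class OffsetsPartition {
    public static int offsetsPartitionFor(String groupId, int offsetsTopicPartitionCount) {
        return Math.abs(groupId.hashCode() % offsetsTopicPartitionCount);
    }

    public static void main(String[] args) {
        int p = offsetsPartitionFor("payments-service", 50);
        // All offset commits for this group go to __consumer_offsets-<p>,
        // and the leader of that partition acts as the group's Coordinator.
        System.out.println("group 'payments-service' -> __consumer_offsets-" + p);
    }
}
```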
Consumer group coordination involves a Coordinator (the broker leading the offset partition) handling join, heartbeat, and rebalance requests. Rebalance distributes partitions among consumers, selecting a leader among them to compute the assignment.
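The assignment step the elected consumer-group leader performs can be sketched as a range-style split of one topic's partitions over the sorted member list. (The real RangeAssignor handles multiple topics and sorts by member id; this is a single-topic simplification.)

```java
import java.util.*;

// Sketch of range-style partition assignment computed by the group
// leader during a rebalance: consumers are sorted for determinism,
// then each takes a contiguous range of partitions.
public class RangeAssign {
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted); // deterministic order for the whole group
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int perConsumer = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // The first `extra` consumers absorb the remainder.
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < count; j++) parts.add(next++);
            result.put(sorted.get(i), parts);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 7));
        // {c1=[0, 1, 2], c2=[3, 4], c3=[5, 6]}
    }
}
```

Because the split is deterministic, the leader can compute it once and the Coordinator can distribute the result to every member.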
Kafka supports three delivery semantics: at‑most‑once (possible loss, no duplicates), at‑least‑once (no loss, possible duplicates), and exactly‑once (no loss, no duplicates, available from version 0.11 when downstream is also Kafka).
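Each semantic corresponds to a different client configuration. The property names below are real Kafka client configs; the values shown are one common way to obtain each guarantee, not the only one (at-most-once is typically acks=0 plus committing offsets before processing on the consumer side).

```java
import java.util.Properties;

// Illustrative producer settings behind the delivery semantics above.
public class DeliverySemantics {
    public static Properties atLeastOnceProducer() {
        Properties p = new Properties();
        p.put("acks", "all");                          // wait for all in-sync replicas
        p.put("retries", Integer.MAX_VALUE);           // retry transient failures (may duplicate)
        return p;
    }

    public static Properties exactlyOnceProducer() {
        Properties p = atLeastOnceProducer();
        p.put("enable.idempotence", "true");           // broker de-duplicates by PID + sequence
        p.put("transactional.id", "my-app-tx-1");      // illustrative id; enables transactions
        return p;
    }

    public static void main(String[] args) {
        System.out.println(exactlyOnceProducer());
    }
}
```

On the consumer side, exactly-once additionally requires isolation.level=read_committed so that aborted transactional data is never returned.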
Exactly‑once is achieved through idempotent producers (assigning a unique producer ID and sequence numbers) and transactional messaging. Transactions use a transaction ID (tid) and a Transaction Coordinator to log transaction states (Begin, Prepare‑Commit/Abort, Commit/Abort). After a successful commit, marker messages make the transaction's data visible to consumers.
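The idempotence half of this can be sketched as broker-side de-duplication: each producer gets a unique producer ID (PID) and numbers its batches, and the broker appends a batch only if its sequence is the next one expected for that PID, so a retried duplicate is silently dropped. A minimal in-memory sketch of that check:

```java
import java.util.*;

// Sketch of the (producer id, sequence number) de-duplication that
// idempotent producers rely on. In-memory only; a real broker keeps
// this state per partition and persists it with the log.
public class SequenceDedup {
    private final Map<Long, Integer> nextSeqByPid = new HashMap<>();

    /** Returns true if the batch is new and appended, false if it is a duplicate. */
    public boolean tryAppend(long producerId, int sequence) {
        int expected = nextSeqByPid.getOrDefault(producerId, 0);
        if (sequence < expected) return false; // already written: retry duplicate, dropped
        // (A gap, sequence > expected, would be rejected as an error on a real broker.)
        nextSeqByPid.put(producerId, sequence + 1);
        return true;
    }

    public static void main(String[] args) {
        SequenceDedup log = new SequenceDedup();
        System.out.println(log.tryAppend(7L, 0)); // true  - first write appended
        System.out.println(log.tryAppend(7L, 0)); // false - retried duplicate dropped
        System.out.println(log.tryAppend(7L, 1)); // true  - next batch appended
    }
}
```

Idempotence alone covers retries within one partition; the transaction machinery described above is what extends the guarantee atomically across partitions and offset commits.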
Kafka stores data as log segments on the filesystem, each segment accompanied by offset and time index files. Indexes are sparse, storing base offsets and file positions to enable efficient binary search and sequential scans.
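A sparse index lookup reduces to a floor search: find the greatest indexed offset that is less than or equal to the target, then scan the log forward from that file position. A sketch using a TreeMap in place of Kafka's memory-mapped index file:

```java
import java.util.*;

// Sketch of a sparse offset-index lookup: the index stores only some
// (offset -> file position) pairs, so a read binary-searches for the
// greatest indexed offset <= target, then scans the log from there.
public class SparseIndex {
    private final TreeMap<Long, Long> index = new TreeMap<>(); // offset -> file position

    public void addEntry(long offset, long filePosition) {
        index.put(offset, filePosition);
    }

    /** File position to start the sequential scan from, for a target offset. */
    public long lookup(long targetOffset) {
        Map.Entry<Long, Long> floor = index.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue(); // before first entry: scan from segment start
    }

    public static void main(String[] args) {
        SparseIndex idx = new SparseIndex();
        idx.addEntry(0, 0);
        idx.addEntry(100, 4096);  // roughly one entry per few KB of appended log
        idx.addEntry(200, 8192);
        System.out.println(idx.lookup(150)); // 4096 - start scanning at offset 100's position
    }
}
```

Keeping the index sparse is the trade-off: lookups pay a short sequential scan, but the index stays small enough to search quickly and cheap to maintain on every append.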
Configuration includes broker settings (e.g., replication factor, log retention) and topic settings (e.g., partitions, cleanup policy). Proper tuning of these parameters is essential for performance, durability, and resource utilization.