Understanding Kafka Cluster Architecture: Core Components and How They Work
This article explains the essential components of a Kafka cluster—including producers, consumers, brokers, Zookeeper, and the KRaft controller—detailing how topics, partitions, and replication enable high‑throughput, scalable, and fault‑tolerant distributed messaging.
Kafka is a widely used messaging middleware in large‑scale architectures. A Kafka cluster consists of multiple broker nodes that together form a distributed messaging system.
Its distributed design provides high throughput, scalability, and high availability, and it is used primarily for real‑time data streams and log collection.
Messages are organized by topics; each topic contains one or more partitions, which are the physical storage units. Partitions have replicas distributed across different brokers to ensure reliability and fault tolerance.
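To make the relationship between partitions, replicas, and brokers concrete, here is a minimal Python sketch that spreads each partition's replicas over distinct brokers. The function name `assign_replicas` and the simple round‑robin placement are assumptions for illustration only; Kafka's real placement logic is more involved (for example, it can take racks into account).

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, replication_factor: int,
                    broker_ids: List[int]) -> Dict[int, List[int]]:
    """Place each partition's replicas on distinct brokers, round-robin.

    The first broker in each list is treated as the preferred leader.
    Simplified illustration, not Kafka's actual placement algorithm.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment: Dict[int, List[int]] = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition so leaders spread out.
        assignment[partition] = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
    return assignment


layout = assign_replicas(num_partitions=3, replication_factor=2, broker_ids=[1, 2, 3])
for partition, replicas in layout.items():
    print(f"partition {partition}: leader on broker {replicas[0]}, replicas on {replicas}")
```

Because every replica of a partition sits on a different broker, losing one broker never removes all copies of a partition as long as the replication factor is at least 2.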
Kafka Cluster Architecture
The architecture includes producers, consumers, brokers, Zookeeper (in the traditional mode), and the controller (in KRaft mode).
1. Producer
Producers send messages to a topic, and a partitioning strategy such as round‑robin or key hashing selects the target partition. They discover cluster metadata by connecting to any broker, which returns the leader broker for each partition; producers then send their writes directly to that leader.
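The two partitioning strategies mentioned above can be sketched in a few lines of Python. The class `SketchPartitioner` is hypothetical, and `zlib.crc32` stands in for the hash function purely for illustration; the official Java client hashes keys with murmur2, so the partition numbers produced here will not match a real cluster.

```python
import itertools
import zlib
from typing import Optional


class SketchPartitioner:
    """Illustrative partitioner: key hashing for keyed messages,
    round-robin for messages without a key."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: rotate through partitions to spread load evenly.
            return next(self._counter) % self.num_partitions
        # Keyed: the same key always maps to the same partition,
        # which keeps all messages for that key in order.
        return zlib.crc32(key) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
print(partitioner.partition(b"user-42") == partitioner.partition(b"user-42"))  # True
print([partitioner.partition(None) for _ in range(4)])  # [0, 1, 2, 0]
```

Key hashing trades even load for per‑key ordering: a very hot key sends all of its traffic to a single partition.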
2. Consumer
Consumers subscribe to one or more topics as members of a consumer group. Within a group, the consumer instances divide the partitions of the subscribed topics among themselves so that each partition is consumed by exactly one instance in the group, which preserves message order within each partition.
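The "one partition, one consumer per group" rule can be illustrated with a small assignment function. This is a sketch of a round‑robin style assignment under assumed names (`assign_partitions`, consumer ids such as `c1`); it is not the code of any of Kafka's built‑in assignors.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer in the group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        # Deal partitions out like cards so the load is balanced.
        assignment[members[index % len(members)]].append(partition)
    return assignment


print(assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
print(assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

The second call shows a practical consequence: a group with more consumers than partitions leaves the extra consumers idle, so the partition count caps the parallelism of a group.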
3. Broker
Each broker stores messages, serves producer and consumer requests, replicates partition data, takes part in leader election, and performs log compaction and cleanup. A topic's partitions are distributed across brokers, and their replicas provide high availability.
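Log compaction is the least obvious of these broker duties, so a short sketch may help. Assuming a log of `(offset, key, value)` records, compaction keeps only the most recent record for each key. The `compact` function is hypothetical and simplifies the real behavior: it drops deletion markers (records with a `None` value, often called tombstones) immediately, whereas a real broker retains them for a configurable period first.

```python
from typing import Dict, List, Optional, Tuple

# (offset, key, value); a value of None marks the key as deleted.
Record = Tuple[int, str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving offset order."""
    latest_offset: Dict[str, int] = {}
    for offset, key, _value in log:
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [
        (offset, key, value)
        for offset, key, value in log
        if latest_offset[key] == offset and value is not None
    ]


log = [
    (0, "k1", "a"),
    (1, "k2", "b"),
    (2, "k1", "c"),   # supersedes offset 0
    (3, "k2", None),  # deletion marker for k2
    (4, "k3", "d"),
]
print(compact(log))  # [(2, 'k1', 'c'), (4, 'k3', 'd')]
```

Surviving records keep their original offsets, so consumers that stored an offset before compaction can still resume from it.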
4. Zookeeper (traditional mode)
Zookeeper stores cluster metadata such as broker registrations, topic and partition information, and partition leadership state. It is also used to elect the broker that acts as the cluster controller.
5. Controller (KRaft mode)
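In KRaft mode, a quorum of controller nodes manages cluster metadata and coordination using the Raft consensus protocol, taking over the role that Zookeeper plays in the traditional mode. A node can act as a broker, a controller, or both.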
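In either mode, one of the coordination tasks is moving partition leadership when a broker fails. The sketch below assumes a hypothetical `PartitionState` record and a `handle_broker_failure` helper, and it picks the new leader from the in‑sync replicas (the replicas that are fully caught up with the leader). It illustrates the idea only and does not mirror Kafka's controller code.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PartitionState:
    replicas: List[int]     # brokers holding a copy, in preference order
    isr: List[int]          # in-sync replicas: copies that are fully caught up
    leader: Optional[int]   # None means the partition is offline


def handle_broker_failure(state: PartitionState, failed_broker: int,
                          live_brokers: Set[int]) -> PartitionState:
    """Remove the failed broker from the ISR and elect a new leader if needed."""
    isr = [broker for broker in state.isr if broker != failed_broker]
    leader = state.leader
    if leader == failed_broker:
        # Prefer the first surviving in-sync replica in the replica list.
        candidates = [b for b in state.replicas if b in isr and b in live_brokers]
        leader = candidates[0] if candidates else None
    return PartitionState(replicas=list(state.replicas), isr=isr, leader=leader)


before = PartitionState(replicas=[1, 2, 3], isr=[1, 2, 3], leader=1)
after = handle_broker_failure(before, failed_broker=1, live_brokers={2, 3})
print(after.leader, after.isr)  # 2 [2, 3]
```

Restricting candidates to in‑sync replicas means the new leader already holds every committed message, so a failover does not lose acknowledged data.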
               +-------------------+
               |    Producer(s)    |
               +-------------------+
                  |             |
                  v             v
+---------------------+     +---------------------+
| Broker1             |     | Broker2             |     ...
| (Leader for P1)     |     | (Follower for P1)   |
| (Follower for P2)   |     | (Leader for P2)     |
+---------------------+     +---------------------+
                  |             |
                  v             v
               +-------------------+
               |    Consumer(s)    |
               +-------------------+
+------------------------------+     +-------------------------+
| Zookeeper (traditional mode) |     | Controller (KRaft mode) |
+------------------------------+     +-------------------------+
The design revolves around distribution, partitioning, and replication, enabling horizontal scaling, high throughput, and reliable data delivery.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!