Kafka Architecture and Implementation Principles – Part 2
This article explains Kafka's overall architecture in depth: the roles of producers, consumers, topics, partitions, and replication, Zookeeper coordination, controller election, and the NIO-based network model. The goal is to help readers understand both the concepts and their practical configuration implications.
This is the second article in the "Kafka Series" by "Code Brother", focusing on the architecture and implementation principles of Kafka. It continues the series that includes a theory part, a practice part, and a source‑code part, and this piece belongs to the theory section.
Readers may refer to the previous article "Kafka Performance: Why Is Kafka So Fast?" before diving into the detailed architecture explanation presented with vivid diagrams.
Opening Message
Creating real products is essential for showcasing your abilities; maintaining a blog or public account to record daily thoughts can be valuable even if the early entries are messy.
Architecture
Understanding Kafka's architecture means grasping the concepts of its components and their relationships.
Do Not Memorize Blindly
Producer – the entity that creates and sends messages to Kafka.
Consumer – the entity that receives messages from Kafka and processes them.
Consumer Group – a group of one or more consumers that share the load of consuming partitions; consumers in the same group do not duplicate work, while different groups operate independently.
Broker – a Kafka server node that hosts partitions.
Topic – a logical category for messages; producers publish to a topic, consumers subscribe to it.
Partition – a topic is split into multiple ordered logs; each partition is an append-only file with a unique offset for each record.
Offset – the unique identifier of a record within a partition, guaranteeing order only within that partition.
Replication – copies of a partition stored on multiple brokers to provide high availability; one replica is the leader, others are followers.
Record – the actual message stored in Kafka, consisting of key, value, and timestamp.
Understand to Remember
Memorization should stem from comprehension of these concepts.
Producer‑Consumer Model
Producer-Consumer is a design pattern in which a producer generates data and a consumer processes it, with an intermediate component (often a queue) providing decoupling, asynchronous handling, and buffering.
In code, a Queue typically serves as this intermediate component, allowing multiple producer threads to enqueue data and consumer threads to dequeue it for processing.
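The in-process version of this pattern can be sketched in a few lines. This is a minimal illustration using Python's standard `queue.Queue`, not Kafka code; the function name `run_pipeline`, the `SENTINEL` stop marker, and the "double the item" processing step are all hypothetical.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def run_pipeline(items, num_consumers=2):
    """Push items through a bounded queue and return what the consumers processed."""
    buffer = queue.Queue(maxsize=4)  # bounded: a full queue blocks the producer
    results = []
    results_lock = threading.Lock()

    def producer():
        for item in items:
            buffer.put(item)
        for _ in range(num_consumers):
            buffer.put(SENTINEL)  # one stop signal per consumer

    def consumer():
        while True:
            item = buffer.get()
            if item is SENTINEL:
                break
            with results_lock:
                results.append(item * 2)  # stand-in for real processing

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The producer never calls a consumer directly, which is the decoupling the pattern is after. The bounded queue provides the buffering, and it also slows the producer down when consumers fall behind.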
Distributed Queue Analogy
The article uses a narrative analogy to illustrate how producers, consumers, and an intermediate “mailbox” (queue) form a distributed system.
Topic
A Kafka topic functions like a mailbox address; any producer can send messages to a topic, and any consumer can subscribe to that topic.
Partition
When a single broker cannot handle the load, multiple brokers are added and partitions are distributed across them to balance traffic; however, too many partitions can cause management overhead.
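One way to picture the distribution is a sketch that maps record keys to partitions and partitions to brokers. Both functions are hypothetical and simplified: Kafka's default partitioner hashes keys with murmur2, whereas `zlib.crc32` is used here only as a readily available stand-in, and real replica placement is more involved than plain round-robin.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (crc32 is a stand-in hash, not murmur2)."""
    return zlib.crc32(key) % num_partitions


def assign_partitions_to_brokers(num_partitions, broker_ids):
    """Spread partitions across brokers round-robin (simplified placement)."""
    return {p: broker_ids[p % len(broker_ids)] for p in range(num_partitions)}
```

Because the partition is derived from the key, records with the same key always land in the same partition, which is what preserves per-key ordering.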
Replication
Kafka replicates partition data across multiple brokers to ensure high availability; one replica becomes the leader (handling reads/writes) while followers sync from the leader.
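The leader/follower relationship can be illustrated with a toy model. This is a sketch under strong assumptions, not Kafka's replication protocol: the class `PartitionReplicas` and its methods are hypothetical, followers copy the whole missing suffix in one step, and failover simply promotes the most caught-up live replica rather than consulting an in-sync replica set.

```python
class PartitionReplicas:
    """Toy model of one partition's replicas; not Kafka's actual protocol."""

    def __init__(self, broker_ids):
        self.logs = {b: [] for b in broker_ids}  # broker id -> local copy of the log
        self.leader = broker_ids[0]
        self.alive = set(broker_ids)

    def append(self, record):
        """Writes go to the leader only; returns the record's offset."""
        log = self.logs[self.leader]
        log.append(record)
        return len(log) - 1

    def sync_followers(self):
        """Each live follower copies whatever it is missing from the leader."""
        leader_log = self.logs[self.leader]
        for broker in self.alive - {self.leader}:
            follower_log = self.logs[broker]
            follower_log.extend(leader_log[len(follower_log):])

    def fail_broker(self, broker):
        """Mark a broker dead; if it led the partition, promote a live follower."""
        self.alive.discard(broker)
        if broker == self.leader:
            if not self.alive:
                raise RuntimeError("no live replica left")
            # Simplification: pick the live replica with the longest log.
            self.leader = max(sorted(self.alive), key=lambda b: len(self.logs[b]))
```

After `sync_followers()` a follower holds the same records as the leader, so promoting it on failure loses nothing. A follower that has not synced yet would lose the unsynced tail.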
Multiple Consumers
Consumer groups enable parallel consumption of partitions, increasing throughput while ensuring each message is processed by only one consumer within the same group.
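The "one consumer per partition within a group" rule comes down to an assignment function. The sketch below is a simplified round-robin assignment; the function name `assign_round_robin` is hypothetical, and Kafka's built-in assignors handle more cases, such as multiple topics and rebalances.

```python
def assign_round_robin(partitions, consumers):
    """Give every partition to exactly one consumer in the group, round-robin."""
    assignment = {c: [] for c in consumers}
    ordered = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment
```

With five partitions and two consumers, one consumer gets three partitions and the other two. With more consumers than partitions, the extra consumers receive nothing and sit idle, so adding consumers beyond the partition count does not increase throughput.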
Broadcast Messaging
Kafka supports broadcast (pub‑sub) semantics via separate consumer groups; messages are delivered to all groups, but each group receives each message only once.
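Broadcast across groups follows from each group tracking its own read position on the same log. A minimal single-partition sketch, with the hypothetical class `TopicLog` and group names chosen only for illustration:

```python
class TopicLog:
    """Single-partition toy log where each consumer group keeps its own offset."""

    def __init__(self):
        self.records = []
        self.group_offsets = {}  # group id -> next offset to read

    def publish(self, record):
        self.records.append(record)

    def poll(self, group_id):
        """Return records the group has not seen yet and advance its offset."""
        start = self.group_offsets.get(group_id, 0)
        batch = self.records[start:]
        self.group_offsets[group_id] = len(self.records)
        return batch
```

Reading does not remove records from the log, so a second group polling later still sees everything, while a repeated poll from the same group returns only what is new.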
Overall Kafka Architecture Diagram:
Zookeeper
Zookeeper provides distributed configuration, synchronization, and naming services. Kafka stores metadata about brokers, topics, and partitions in Zookeeper and relies on it for leader election, cluster membership, topic configuration, and replica management.
Kafka Controller leader election
Cluster member management
Topic configuration management
Partition replica management
Controller
The controller is a broker elected via Zookeeper to manage partition leaders and followers, handle broker join/leave events, and coordinate topic creation and partition reassignment.
When a broker starts, it checks the Zookeeper /controller node. If the node exists and holds a valid broker ID, the broker does not compete for the controller role. If the node is absent, brokers race to create it, and the one that succeeds becomes the controller.
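The "first to create the node wins" rule can be sketched with an in-memory stand-in for Zookeeper. Everything here is hypothetical: `FakeZooKeeper` models only an atomic create-if-absent on a path and is not a real client API, and `try_become_controller` is an illustrative name. In a real cluster the node is ephemeral, so it disappears when the controller's session ends and the remaining brokers compete again.

```python
import threading


class FakeZooKeeper:
    """In-memory stand-in for Zookeeper; models only create-if-absent on a path."""

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()

    def create_if_absent(self, path, value):
        """Atomically create the node; True only for the caller that created it."""
        with self._lock:
            if path in self._nodes:
                return False
            self._nodes[path] = value
            return True

    def get(self, path):
        with self._lock:
            return self._nodes.get(path)

    def delete(self, path):
        with self._lock:
            self._nodes.pop(path, None)


def try_become_controller(zk, broker_id):
    """Run the startup check; returns the broker id of the current controller."""
    current = zk.get("/controller")
    if current is not None:
        return current  # a controller already exists, do not compete
    zk.create_if_absent("/controller", broker_id)
    return zk.get("/controller")
```

Calling `zk.delete("/controller")` in this sketch plays the role of the controller's session expiring: the next broker to run the check takes over.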
Election Process
Implementation
The controller reads Zookeeper nodes, builds a Controller Context, listens for changes, and propagates updates to other brokers via a LinkedBlockingQueue and consumer threads, ensuring ordered processing.
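The ordering guarantee comes from funnelling every change through one queue. The sketch below uses Python's `queue.Queue` as an analogue of Java's `LinkedBlockingQueue` and assumes a single worker thread draining it; the function name and the event names in the usage note are hypothetical.

```python
import queue
import threading


def process_events_in_order(events):
    """Feed events through a queue drained by one worker; returns the handling order."""
    event_queue = queue.Queue()  # Python analogue of Java's LinkedBlockingQueue
    handled = []
    stop = object()  # private stop marker

    def worker():
        while True:
            event = event_queue.get()
            if event is stop:
                break
            handled.append(event)  # stand-in for updating the controller context

    thread = threading.Thread(target=worker)
    thread.start()
    for event in events:
        event_queue.put(event)
    event_queue.put(stop)
    thread.join()
    return handled
```

Passing `["BrokerChange", "TopicChange", "IsrChange"]` returns the events in the same order. Because only one thread touches the shared state, no extra locking is needed around the context updates.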
Responsibilities
Handle broker online/offline events and update cluster metadata.
Create topics, assign partition replicas, and elect a leader for each partition.
Manage state machines for partitions and replicas, reacting to ISR changes.
"State machine sounds complex, but it is simply a model with defined states and transitions triggered by events."
Partition State Machine
Four states: NonExistentPartition, NewPartition, OnlinePartition, OfflinePartition.
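A state machine of this kind reduces to a table of allowed transitions plus a check. In the sketch below the four state names come from the text, but the transition table is an assumption about which moves are legal and may not match every Kafka version; the class and its method are hypothetical.

```python
# Target state -> states it may be entered from (assumed, version-dependent).
VALID_PREVIOUS_STATES = {
    "NewPartition": {"NonExistentPartition"},
    "OnlinePartition": {"NewPartition", "OnlinePartition", "OfflinePartition"},
    "OfflinePartition": {"NewPartition", "OnlinePartition", "OfflinePartition"},
    "NonExistentPartition": {"OfflinePartition"},
}


class PartitionStateMachine:
    """Tracks one partition's state and rejects transitions not in the table."""

    def __init__(self):
        self.state = "NonExistentPartition"

    def transition(self, target):
        if self.state not in VALID_PREVIOUS_STATES[target]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target
```

A partition that is created, gets a leader, loses it, and recovers would move through NewPartition, OnlinePartition, OfflinePartition, and back to OnlinePartition. Trying to jump from OnlinePartition straight to NonExistentPartition raises an error in this model.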
Replica State Machine
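Four main states: NewReplica, OnlineReplica, OfflineReplica, NonExistentReplica. Kafka also defines three states used during topic deletion: ReplicaDeletionStarted, ReplicaDeletionSuccessful, and ReplicaDeletionIneligible.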
Network
Kafka uses a NIO‑based Reactor model with an Acceptor thread for new connections, multiple Processor threads for selecting and reading sockets, and Handler threads for business logic. The diagram below shows the KafkaServer model.
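The division of labour between the three thread roles can be sketched without real sockets. This is a simulation under stated assumptions: connections are plain `(conn_id, requests)` tuples instead of socket channels, the function name `serve` is hypothetical, upper-casing stands in for business logic, and the path that returns responses through the processors is omitted.

```python
import queue
import threading

STOP = object()  # private stop marker


def serve(connections, num_processors=2, num_handlers=2):
    """connections: list of (conn_id, [request, ...]); returns {conn_id: [response, ...]}."""
    processor_inboxes = [queue.Queue() for _ in range(num_processors)]
    request_queue = queue.Queue()  # shared between processors and handlers
    responses = {conn_id: [] for conn_id, _ in connections}
    lock = threading.Lock()

    def acceptor():
        # Hand each new connection to a processor, round-robin.
        for i, conn in enumerate(connections):
            processor_inboxes[i % num_processors].put(conn)
        for inbox in processor_inboxes:
            inbox.put(STOP)

    def processor(inbox):
        # "Read" requests from assigned connections and enqueue them for handlers.
        while True:
            conn = inbox.get()
            if conn is STOP:
                break
            conn_id, requests = conn
            for request in requests:
                request_queue.put((conn_id, request))

    def handler():
        # Run the business logic off the I/O threads.
        while True:
            item = request_queue.get()
            if item is STOP:
                break
            conn_id, request = item
            with lock:
                responses[conn_id].append(request.upper())

    handlers = [threading.Thread(target=handler) for _ in range(num_handlers)]
    processors = [threading.Thread(target=processor, args=(inbox,))
                  for inbox in processor_inboxes]
    accept_thread = threading.Thread(target=acceptor)
    for t in handlers + processors + [accept_thread]:
        t.start()
    accept_thread.join()
    for t in processors:
        t.join()
    for _ in handlers:
        request_queue.put(STOP)  # all requests are queued; tell handlers to finish
    for t in handlers:
        t.join()
    return responses
```

Keeping slow business logic on separate handler threads means the processors stay free to service socket I/O. In Kafka the sizes of the two pools are configured with `num.network.threads` and `num.io.threads`.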
The upcoming source‑code part of the series will dive into how these principles are realized in Kafka's implementation.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.