Kafka Architecture and Implementation Principles – Part 2

This article provides an in‑depth, English‑language explanation of Kafka's overall architecture, including the roles of producers, consumers, topics, partitions, replication, Zookeeper coordination, controller election, and the NIO‑based network model, helping readers understand both concepts and practical configuration implications.

Sohu Tech Products

This is the second article in the "Kafka Series" by "Code Brother", focusing on the architecture and implementation principles of Kafka. It continues the series that includes a theory part, a practice part, and a source‑code part, and this piece belongs to the theory section.

Readers may refer to the previous article "Kafka Performance: Why Is Kafka So Fast?" before diving into the detailed architecture explanation presented with vivid diagrams.

Opening Message

Creating real products is essential for showcasing your abilities; maintaining a blog or public account to record daily thoughts can be valuable even if the early entries are messy.

Architecture

Understanding Kafka's architecture means grasping the concepts of its components and their relationships.

Do Not Memorize Blindly

Producer – the entity that creates and sends messages to Kafka.

Consumer – the entity that receives messages from Kafka and processes them.

Consumer Group – a group of one or more consumers that share the load of consuming partitions; consumers in the same group do not duplicate work, while different groups operate independently.

Broker – a Kafka server node that hosts partitions.

Topic – a logical category for messages; producers publish to a topic, consumers subscribe to it.

Partition – a topic is split into multiple ordered logs; each partition is an append‑only file with a unique offset for each record.

Offset – the unique identifier of a record within a partition, guaranteeing order only within that partition.

Replication – copies of a partition stored on multiple brokers to provide high availability; one replica is the leader, others are followers.

Record – the actual message stored in Kafka, consisting of key, value, and timestamp.
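To make the partition, offset, and record concepts concrete, here is a minimal sketch of a partition as an in-memory append-only log. The class names PartitionLog and Record are illustrative only and are not Kafka APIs; a real partition is a set of segment files on disk.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Record:
    key: str
    value: str
    timestamp: float


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of records."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        # The offset is simply the record's position in the log.
        offset = len(self.records)
        self.records.append(Record(key, value, time.time()))
        return offset

    def read(self, offset, max_records=10):
        # Reading never removes data; a consumer only advances its own offset.
        return self.records[offset:offset + max_records]


log = PartitionLog()
first = log.append("user-1", "login")
second = log.append("user-1", "logout")
```

Because offsets are positions in a single log, ordering is only defined inside that one partition, which matches the definition of Offset above.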

Understand to Remember

Memorization should stem from comprehension of these concepts.

Producer‑Consumer Model

Producer‑Consumer is a design pattern in which a producer generates data and a consumer processes it, with an intermediate component (often a queue) providing decoupling, asynchronous handling, and buffering.

In code, a Queue typically serves as this intermediate component, allowing multiple producer threads to enqueue data and consumer threads to dequeue it for processing.
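As a rough illustration of the pattern, the sketch below uses Python's standard queue.Queue as the intermediate component. The thread counts, the bounded size of 8, and the STOP sentinel are arbitrary choices made for this example.

```python
import queue
import threading

buffer = queue.Queue(maxsize=8)   # bounded queue: producers block when it is full
results = []
results_lock = threading.Lock()
STOP = object()                   # sentinel telling a consumer to exit


def producer(producer_id, count):
    for i in range(count):
        buffer.put((producer_id, i))


def consumer():
    while True:
        item = buffer.get()
        if item is STOP:
            break
        with results_lock:
            results.append(item)


producers = [threading.Thread(target=producer, args=(p, 5)) for p in range(2)]
consumers = [threading.Thread(target=consumer) for _ in range(3)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    buffer.put(STOP)   # one sentinel per consumer, sent after all data
for t in consumers:
    t.join()
```

Neither side knows about the other: producers only see the queue filling up, and consumers only see items arriving, which is the decoupling and buffering described above.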

Distributed Queue Analogy

The article uses a narrative analogy to illustrate how producers, consumers, and an intermediate “mailbox” (queue) form a distributed system.

Topic

A Kafka topic functions like a mailbox address; any producer can send messages to a topic, and any consumer can subscribe to that topic.

Partition

When a single broker cannot handle the load, multiple brokers are added and partitions are distributed across them to balance traffic; however, too many partitions can cause management overhead.
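One way to picture how records and partitions are spread out is the sketch below. It makes two simplifying assumptions: CRC32 stands in for the hash (Kafka's Java client uses murmur2 for keyed records), and partitions are placed on brokers in plain round-robin order. The broker names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 6
BROKERS = ["broker-0", "broker-1", "broker-2"]  # hypothetical broker names


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def broker_for(partition: int) -> str:
    # Simplistic round-robin placement of partitions on brokers.
    return BROKERS[partition % len(BROKERS)]


placement = {p: broker_for(p) for p in range(NUM_PARTITIONS)}
```

Since a key always hashes to the same partition, all records for that key stay in order, while different keys spread the load over every broker.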

Replication

Kafka replicates partition data across multiple brokers to ensure high availability; one replica becomes the leader (handling reads/writes) while followers sync from the leader.
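The leader and follower roles can be sketched with a small in-memory model. This is a simplification under stated assumptions: followers copy everything they are missing in one step, and failover promotes the surviving replica with the longest log, whereas Kafka chooses a new leader from the in-sync replica set. The class and method names are illustrative only.

```python
class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    @property
    def log_end_offset(self):
        return len(self.log)


class ReplicatedPartition:
    def __init__(self, broker_ids):
        self.replicas = [Replica(b) for b in broker_ids]
        self.leader = self.replicas[0]

    @property
    def followers(self):
        return [r for r in self.replicas if r is not self.leader]

    def produce(self, value):
        # Only the leader accepts writes.
        self.leader.log.append(value)

    def sync_followers(self):
        # Each follower pulls whatever it is missing from the leader.
        for f in self.followers:
            f.log.extend(self.leader.log[f.log_end_offset:])

    def fail_leader(self):
        failed = self.leader
        self.replicas.remove(failed)
        # Simplification: promote the surviving replica with the longest log.
        self.leader = max(self.replicas, key=lambda r: r.log_end_offset)
        return failed.broker_id


partition = ReplicatedPartition([1, 2, 3])
partition.produce("m0")
partition.produce("m1")
partition.sync_followers()
failed_broker = partition.fail_leader()
```

After the leader on broker 1 fails, a caught-up follower takes over with the full log, so no acknowledged record is lost in this scenario.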

Multiple Consumers

Consumer groups enable parallel consumption of partitions, increasing throughput while ensuring each message is processed by only one consumer within the same group.

Broadcast Messaging

Kafka supports broadcast (pub‑sub) semantics via separate consumer groups; messages are delivered to all groups, but each group receives each message only once.
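The difference between sharing work inside a group and broadcasting across groups can be sketched as a partition assignment. The round-robin function below is only an illustration; Kafka ships several assignment strategies, and the group and consumer names here are hypothetical.

```python
from collections import defaultdict


def assign(partitions, consumers):
    """Round-robin the partitions over the consumers of one group."""
    assignment = defaultdict(list)
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)


partitions = [0, 1, 2, 3]
groups = {
    "billing": ["billing-a", "billing-b"],  # hypothetical group and consumer names
    "audit": ["audit-a"],
}
# Every group is assigned all partitions; members of a group split them.
plan = {g: assign(partitions, members) for g, members in groups.items()}
```

Within "billing" each partition has exactly one owner, so no message is processed twice by that group, while "audit" independently receives every partition as well.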

Overall Kafka Architecture Diagram:

Zookeeper

Zookeeper provides distributed configuration, synchronization, and naming services. Kafka stores metadata about brokers, topics, and partitions in Zookeeper and relies on it for:

Kafka Controller leader election

Cluster member management

Topic configuration management

Partition replica management

Controller

The controller is a broker elected via Zookeeper to manage partition leaders and followers, handle broker join/leave events, and coordinate topic creation and partition reassignment.

When a broker starts, it checks the Zookeeper /controller node; if the node exists with a valid broker ID, the broker does not compete for controller role. If the node is absent, brokers attempt to create it; the one that succeeds becomes the controller.
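The election rule amounts to "first successful create wins". The sketch below imitates it with an in-memory stand-in rather than a real Zookeeper client; FakeZooKeeper and try_become_controller are hypothetical names, and deleting the node by hand stands in for the node disappearing when the controller's session ends.

```python
import threading


class FakeZooKeeper:
    """In-memory stand-in for the one behaviour used here:
    creating a node fails if the node already exists."""

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()

    def create(self, path, data):
        with self._lock:
            if path in self._nodes:
                return False
            self._nodes[path] = data
            return True

    def get(self, path):
        with self._lock:
            return self._nodes.get(path)

    def delete(self, path):
        with self._lock:
            self._nodes.pop(path, None)


def try_become_controller(zk, broker_id):
    # Returns the id of whichever broker ends up holding /controller.
    zk.create("/controller", broker_id)
    return zk.get("/controller")


zk = FakeZooKeeper()
threads = [threading.Thread(target=try_become_controller, args=(zk, b))
           for b in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
winner = zk.get("/controller")
```

Whichever broker's create call lands first becomes the controller, and every later attempt simply observes the existing node and backs off.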

Election Process

Implementation

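The controller reads Zookeeper nodes to build a Controller Context and registers listeners for changes. Each change is placed as an event on a LinkedBlockingQueue and taken off by a consumer thread, which ensures the events are processed in order; the controller then propagates the resulting updates to the other brokers.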
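The value of funnelling events through one queue can be sketched as follows. The example assumes a single thread drains the queue, uses Python's queue.Queue as an analogue of LinkedBlockingQueue, and invents the event names broker_up and broker_down purely for illustration.

```python
import queue
import threading

events = queue.Queue()   # Python analogue of LinkedBlockingQueue
live_brokers = set()     # tiny stand-in for the controller context
SHUTDOWN = object()


def handle(event):
    kind, broker_id = event
    if kind == "broker_up":
        live_brokers.add(broker_id)
    elif kind == "broker_down":
        live_brokers.discard(broker_id)


def event_loop():
    # One thread drains the queue, so events are applied strictly in
    # arrival order and the context needs no extra locking.
    while True:
        event = events.get()
        if event is SHUTDOWN:
            break
        handle(event)


worker = threading.Thread(target=event_loop)
worker.start()
for e in [("broker_up", 1), ("broker_up", 2), ("broker_down", 1)]:
    events.put(e)
events.put(SHUTDOWN)
worker.join()
```

If the "down" event for broker 1 were applied before its "up" event, the context would wrongly list broker 1 as alive, which is why ordered processing matters here.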

Responsibilities

Handle broker online/offline events and update cluster metadata.

Create topics, assign partition replicas, and drive leader election among the replicas.

Manage state machines for partitions and replicas, reacting to ISR changes.

"State machine sounds complex, but it is simply a model with defined states and transitions triggered by events."

Partition State Machine

Four states: NonExistentPartition, NewPartition, OnlinePartition, OfflinePartition.
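A state machine of this kind can be sketched as a table of allowed transitions. The table below reflects one reading of the controller's rules and should be treated as an assumption that may differ between Kafka versions; the Python class is illustrative and is not Kafka's own implementation.

```python
# For each target state, the set of states it may be entered from (assumed).
VALID_PREVIOUS = {
    "NewPartition": {"NonExistentPartition"},
    "OnlinePartition": {"NewPartition", "OnlinePartition", "OfflinePartition"},
    "OfflinePartition": {"NewPartition", "OnlinePartition", "OfflinePartition"},
    "NonExistentPartition": {"OfflinePartition"},
}


class PartitionStateMachine:
    def __init__(self):
        self.state = "NonExistentPartition"

    def transition(self, target):
        # Reject any move that the table does not allow.
        if self.state not in VALID_PREVIOUS[target]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target


psm = PartitionStateMachine()
for target in ("NewPartition", "OnlinePartition",
               "OfflinePartition", "OnlinePartition"):
    psm.transition(target)
```

The sequence above models a partition being created, getting a leader, losing it when a broker goes offline, and coming back online after a new leader is elected.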

Replica State Machine

Four states: NewReplica, OnlineReplica, OfflineReplica, NonExistentReplica.

Network

Kafka uses a NIO‑based Reactor model with an Acceptor thread for new connections, multiple Processor threads for selecting and reading sockets, and Handler threads for business logic. The diagram below shows the KafkaServer model.
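The division of labour between the three thread roles can be sketched with Python's selectors module. This is a toy under stated assumptions, not Kafka's server: the names Processor, handler, acceptor, and send_request are made up for the example, requests are assumed to fit in a single read, and the handler writes the reply directly, whereas Kafka hands responses back to the processor for writing. In a real broker the sizes of the two thread pools are set by the num.network.threads and num.io.threads settings.

```python
import queue
import selectors
import socket
import threading
import time

request_queue = queue.Queue()  # processors hand completed reads to handlers


class Processor(threading.Thread):
    """Owns one selector and reads from the sockets assigned to it."""

    def __init__(self):
        super().__init__(daemon=True)
        self.selector = selectors.DefaultSelector()
        self.new_connections = queue.Queue()

    def run(self):
        while True:
            # Register sockets handed over by the acceptor.
            while not self.new_connections.empty():
                conn = self.new_connections.get()
                conn.setblocking(False)
                self.selector.register(conn, selectors.EVENT_READ)
            if not self.selector.get_map():
                time.sleep(0.01)  # nothing to poll yet
                continue
            for key, _ in self.selector.select(timeout=0.05):
                conn = key.fileobj
                try:
                    data = conn.recv(1024)
                except OSError:
                    data = b""
                if data:
                    request_queue.put((conn, data))
                else:
                    self.selector.unregister(conn)  # peer closed the connection
                    conn.close()


def handler():
    while True:
        conn, data = request_queue.get()
        conn.sendall(b"ack:" + data)  # stand-in for real request handling


def acceptor(server, processors):
    i = 0
    while True:
        conn, _ = server.accept()
        # Round-robin new connections over the processors.
        processors[i % len(processors)].new_connections.put(conn)
        i += 1


server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen()
processors = [Processor() for _ in range(2)]
for p in processors:
    p.start()
for _ in range(2):
    threading.Thread(target=handler, daemon=True).start()
threading.Thread(target=acceptor, args=(server, processors), daemon=True).start()


def send_request(payload: bytes) -> bytes:
    with socket.create_connection(server.getsockname(), timeout=5) as client:
        client.sendall(payload)
        return client.recv(1024)
```

Calling send_request(b"ping") walks through all three stages: the acceptor takes the connection, a processor reads the bytes, and a handler produces the reply. Keeping socket I/O and business logic on separate threads means a slow request does not stall reads on other connections.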

The upcoming source‑code part of the series will dive into how these principles are realized in Kafka's implementation.

