Kafka Architecture and Implementation Principles – Theory Part
This article provides a comprehensive, diagram‑driven explanation of Kafka’s architecture, covering producers, consumers, topics, partitions, replication, Zookeeper coordination, controller election, state machines, and the NIO‑based network model, helping readers understand the design philosophy and practical configuration implications.
This is the second article in the author's "Kafka Series". It focuses on the theoretical architecture and implementation principles of Kafka, follows the previous article on Kafka performance, and introduces Kafka's core components and how they relate to one another.
Opening Note
The author encourages building real products, maintaining a public account or blog to record daily insights, and emphasizes that consistent effort yields great value.
Architecture
Understanding Kafka’s architecture means grasping what each component is and how the components interact. The main components are described below; the aim is to understand them, not to memorize them.
Key Components
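Producer: Sends messages to Kafka.
Consumer: Receives messages from Kafka.
Consumer Group: A set of consumers that share the work of consuming a topic. Each partition is read by only one consumer in the group, so no message is consumed twice within the group. Putting all consumers in one group gives point-to-point (queue) consumption, while giving each consumer its own group gives broadcast (publish-subscribe) consumption.
Broker: A Kafka server node that stores data.
Topic: A logical channel for messages; producers write to topics and consumers subscribe to them.
Partition: A subdivision of a topic, stored as an append-only log in which each message gets a unique offset.
Offset: The sequential identifier of a message within a partition. Ordering is guaranteed only within a single partition, not across the partitions of a topic.
Replication: The high-availability mechanism. Each partition has multiple replicas; one of them is the leader, which serves reads and writes, and the others follow it by copying its data.
Record: The actual message stored in Kafka, consisting of a key, a value, and a timestamp (plus optional headers).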
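To make partitions and offsets concrete, here is a minimal in-memory sketch in Java (16+, for the record type). It is an illustration, not how Kafka stores data: each partition is a plain list used as an append-only log, and a record's offset is simply its index in that list. The class ToyTopic and its methods are hypothetical, and the key-to-partition rule uses String.hashCode as a stand-in; Kafka's default partitioner hashes the serialized key with murmur2 instead.

```java
import java.util.ArrayList;
import java.util.List;

class ToyTopic {
    // A record as described in the text: key, value and timestamp
    record Rec(String key, String value, long timestamp) {}

    private final List<List<Rec>> partitions = new ArrayList<>();

    ToyTopic(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
    }

    // Same key -> same partition (String.hashCode is a stand-in for Kafka's murmur2).
    // Math.floorMod keeps the result non-negative for negative hash codes.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Append-only: returns the offset the record received within its partition
    long append(String key, String value) {
        List<Rec> log = partitions.get(partitionFor(key));
        log.add(new Rec(key, value, System.currentTimeMillis()));
        return log.size() - 1;
    }

    // Reads one partition from a given offset; ordering is only defined inside a partition
    List<Rec> read(int partition, long fromOffset) {
        List<Rec> log = partitions.get(partition);
        return new ArrayList<>(log.subList((int) fromOffset, log.size()));
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic(3);
        topic.append("order-42", "created");
        topic.append("order-42", "paid");
        topic.append("order-7", "created");
        int p = topic.partitionFor("order-42");
        for (Rec r : topic.read(p, 0)) {
            System.out.println("partition " + p + ": " + r.key() + " " + r.value());
        }
    }
}
```

Both "order-42" records land in the same partition, so reading it from offset 0 returns them in the order they were written; records for other keys may sit in other partitions, where no ordering relative to them is promised.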
Producer‑Consumer Pattern
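In the producer-consumer pattern, producers and consumers do not talk to each other directly. An intermediate component (for example, a queue) sits between them and decouples the two sides, so each can work asynchronously while the intermediate component buffers messages.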
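As an in-process illustration of that decoupling, the sketch below puts a bounded ArrayBlockingQueue between one producer thread and one consumer thread. Neither thread references the other; put() blocks when the buffer is full and take() blocks when it is empty. The class name and the end-of-stream marker are our own assumptions, and Kafka plays the queue's role across processes and machines rather than inside one JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueDecoupling {
    private static final String POISON = "__done__"; // hypothetical end-of-stream marker

    // Producer and consumer share only the queue; neither knows about the other
    static List<String> run(int messages) {
        // Bounded buffer: put() waits when full, take() waits when empty
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        List<String> received = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) queue.put("msg-" + i);
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (String m = queue.take(); !m.equals(POISON); m = queue.take()) {
                    received.add(m);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received; // safe to read here: join() makes the consumer's writes visible
    }

    public static void main(String[] args) {
        System.out.println(run(10));
    }
}
```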
Distributed Queue Analogy
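An analogy helps explain distributed queuing: producers are the people who send letters, consumers are the recipients, and the mailbox is the intermediate component. Senders drop letters into the mailbox without waiting for the recipients, who collect them when they are ready.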
Topic, Partition, and Replication
Topics act like mailboxes, partitions like multiple post offices to increase concurrency, and replication like copying letters to multiple offices for fault tolerance.
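Spreading replicas across brokers is what makes the "copies in several post offices" idea protect against failures. The sketch below uses a plain round-robin placement under our own simplifying assumptions; it is not Kafka's exact assignment algorithm, which also randomizes the starting broker and shifts follower placement. The class and method names are hypothetical. As in Kafka, the first replica in each list is treated as the preferred leader.

```java
import java.util.ArrayList;
import java.util.List;

class ReplicaPlacement {
    // Returns, for each partition, the broker ids holding its replicas (first = preferred leader)
    static List<List<Integer>> assign(int partitions, int replicationFactor, List<Integer> brokers) {
        if (replicationFactor > brokers.size()) {
            throw new IllegalArgumentException("replication factor exceeds broker count");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Consecutive brokers, starting one further along for each partition,
                // so the replicas of one partition never share a broker
                replicas.add(brokers.get((p + r) % brokers.size()));
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<List<Integer>> a = assign(3, 2, List.of(101, 102, 103));
        for (int p = 0; p < a.size(); p++) {
            System.out.println("partition " + p + " -> replicas " + a.get(p) + ", leader " + a.get(p).get(0));
        }
    }
}
```

With brokers 101, 102 and 103, three partitions and a replication factor of 2, each partition gets a different leader, and losing any single broker still leaves one copy of every partition.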
Zookeeper
Zookeeper provides distributed configuration, synchronization, and naming services. Kafka stores metadata for brokers, topics, and partitions in Zookeeper and uses it for controller leader election, cluster membership, topic configuration, and replica management.
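Cluster membership relies on Zookeeper's ephemeral nodes: each live broker registers a node under /brokers/ids that is deleted automatically when the broker's session ends, and anyone watching that path is notified. The toy in-memory registry below imitates only that behaviour; it is not the Zookeeper client API, and ToyRegistry and its methods are hypothetical.

```java
import java.util.*;
import java.util.function.Consumer;

class ToyRegistry {
    private static final String BROKERS = "/brokers/ids/";
    private final Map<String, Long> ephemeralOwners = new TreeMap<>(); // path -> owning session id
    private final List<Consumer<Set<String>>> watchers = new ArrayList<>();

    // A broker registers itself; the node lives only as long as the session
    void createEphemeral(String path, long sessionId) {
        if (ephemeralOwners.putIfAbsent(path, sessionId) != null) {
            throw new IllegalStateException("node already exists: " + path);
        }
        notifyWatchers();
    }

    // Session expiry (e.g. the broker crashed) deletes every ephemeral node it owned
    void expireSession(long sessionId) {
        if (ephemeralOwners.values().removeIf(owner -> owner == sessionId)) notifyWatchers();
    }

    // Simplification: real Zookeeper watches fire once and must be re-registered
    void watchBrokers(Consumer<Set<String>> watcher) { watchers.add(watcher); }

    Set<String> liveBrokers() {
        Set<String> ids = new TreeSet<>();
        for (String path : ephemeralOwners.keySet()) {
            if (path.startsWith(BROKERS)) ids.add(path.substring(BROKERS.length()));
        }
        return ids;
    }

    private void notifyWatchers() {
        Set<String> snapshot = liveBrokers();
        for (Consumer<Set<String>> w : watchers) w.accept(snapshot);
    }

    public static void main(String[] args) {
        ToyRegistry zk = new ToyRegistry();
        zk.watchBrokers(ids -> System.out.println("live brokers: " + ids));
        zk.createEphemeral("/brokers/ids/1", 1001L);
        zk.createEphemeral("/brokers/ids/2", 1002L);
        zk.expireSession(1001L); // broker 1's session ends, so its node disappears
    }
}
```

Running main prints "live brokers: [1]", then "live brokers: [1, 2]", then "live brokers: [2]" once broker 1's session expires.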
Controller
The controller is a broker elected via Zookeeper that manages partition leaders, replica assignments, and cluster metadata. It reacts to broker failures, topic creation, partition expansion, and changes to each partition's ISR (in-sync replicas: the set of replicas that are fully caught up with the leader).
Election Process
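Every broker tries to create an ephemeral /controller node in Zookeeper. Only one creation can succeed, and the broker that succeeds becomes the controller. The other brokers watch the node; when it disappears (for example, because the controller's Zookeeper session expired), they race to create it again.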
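The election rule can be simulated in a few lines. In this sketch a ConcurrentHashMap stands in for Zookeeper: putIfAbsent plays the part of creating the ephemeral /controller node (it is atomic, so exactly one concurrent attempt succeeds), and removing the entry stands in for the controller's session expiring. The class and method names are hypothetical; a real broker uses the Zookeeper client and a watch on /controller.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ControllerElection {
    static final String CONTROLLER_PATH = "/controller";
    // Stand-in for Zookeeper: path -> id of the broker that created the node
    final ConcurrentMap<String, Integer> znodes = new ConcurrentHashMap<>();

    // "Create the ephemeral node": atomic, so only one broker can ever win
    boolean tryElect(int brokerId) {
        return znodes.putIfAbsent(CONTROLLER_PATH, brokerId) == null;
    }

    // The controller's session expires and its ephemeral node disappears
    void expire(int brokerId) {
        znodes.remove(CONTROLLER_PATH, brokerId);
    }

    // All candidate brokers race concurrently; returns whoever holds the node afterwards
    int electAmong(List<Integer> brokers) {
        brokers.parallelStream().forEach(this::tryElect);
        return znodes.get(CONTROLLER_PATH);
    }

    public static void main(String[] args) {
        ControllerElection zk = new ControllerElection();
        int first = zk.electAmong(List.of(1, 2, 3));
        System.out.println("controller: " + first);
        zk.expire(first); // the watchers fire and the survivors race again
        List<Integer> survivors = List.of(1, 2, 3).stream().filter(id -> id != first).toList();
        System.out.println("new controller: " + zk.electAmong(survivors));
    }
}
```

Which broker wins the first race is nondeterministic, which mirrors the real election: any live broker can become the controller.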
Implementation
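The controller loads the cluster metadata from Zookeeper into an in-memory context (ControllerContext) and registers watchers for node changes. Each change is wrapped as an event and put on a LinkedBlockingQueue, which a single controller thread drains in order; the resulting metadata changes are then sent to the other brokers as requests such as LeaderAndIsr and UpdateMetadata.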
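The single-threaded event queue can be sketched as follows, loosely modelled on what Kafka's ControllerEventManager does. Other threads only enqueue events; one thread takes them from a LinkedBlockingQueue and applies them to the in-memory state one at a time, so that state needs no locks. The event types and the map standing in for the controller context are hypothetical simplifications (Java 16+ for records and pattern matching).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ControllerEventLoop {
    // Hypothetical event types; a real controller handles many more kinds of change
    interface Event {}
    record BrokerUp(int id) implements Event {}
    record BrokerDown(int id) implements Event {}
    record Shutdown() implements Event {}

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    // Stand-in for the controller context; only the event thread reads or writes it
    private final Map<Integer, String> brokerStates = new TreeMap<>();
    private final Thread eventThread = new Thread(this::processEvents, "controller-event-thread");

    void start() { eventThread.start(); }

    // Safe to call from any thread, e.g. a Zookeeper watcher callback
    void put(Event event) { queue.add(event); }

    private void processEvents() {
        try {
            while (true) {
                Event event = queue.take(); // one event at a time, in arrival order
                if (event instanceof Shutdown) return;
                if (event instanceof BrokerUp up) brokerStates.put(up.id(), "online");
                if (event instanceof BrokerDown down) brokerStates.put(down.id(), "offline");
                // a real controller would now send metadata updates to the brokers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Enqueues a shutdown marker behind earlier events, waits for the thread, returns its view
    Map<Integer, String> shutdownAndGet() {
        put(new Shutdown());
        try {
            eventThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return brokerStates;
    }

    public static void main(String[] args) {
        ControllerEventLoop controller = new ControllerEventLoop();
        controller.start();
        controller.put(new BrokerUp(1));
        controller.put(new BrokerUp(2));
        controller.put(new BrokerDown(1));
        System.out.println(controller.shutdownAndGet()); // {1=offline, 2=online}
    }
}
```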
Responsibilities
Handle broker online/offline events and update cluster metadata.
Create topics, assign partition replicas to brokers, and elect a leader for each partition.
Manage state machines for partitions and replicas, reacting to state changes.
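For the leader election step, Kafka's usual rule, as we understand its controller code, is to pick the first replica in the partition's assigned replica list that is both alive and in the ISR. The sketch encodes that rule. The class and method names are hypothetical, and the uncleanAllowed flag mirrors the unclean.leader.election.enable setting, which permits falling back to any live replica at the cost of possibly losing messages that only the old leader had.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

class PartitionLeaderElection {
    // First replica (in assignment order) that is alive and in the ISR;
    // optionally falls back to any live replica ("unclean" election)
    static Optional<Integer> electLeader(List<Integer> assignment, Set<Integer> isr,
                                         Set<Integer> liveBrokers, boolean uncleanAllowed) {
        Optional<Integer> clean = assignment.stream()
                .filter(id -> liveBrokers.contains(id) && isr.contains(id))
                .findFirst();
        if (clean.isPresent() || !uncleanAllowed) return clean;
        return assignment.stream().filter(liveBrokers::contains).findFirst();
    }

    public static void main(String[] args) {
        List<Integer> assignment = List.of(1, 2, 3);
        // Broker 1 (the old leader) is down; broker 3 has fallen out of the ISR
        System.out.println(electLeader(assignment, Set.of(1, 2), Set.of(2, 3), false)); // Optional[2]
        // Only broker 3 is alive, and it is not in the ISR
        System.out.println(electLeader(assignment, Set.of(1, 2), Set.of(3), false));    // Optional.empty
        System.out.println(electLeader(assignment, Set.of(1, 2), Set.of(3), true));     // Optional[3]
    }
}
```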
"State machine sounds complicated, but it is just a model with defined states and transitions."
Partition State Machine
Four states: NonExistentPartition, NewPartition, OnlinePartition, OfflinePartition.
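A state machine like this reduces to a set of states plus the legal transitions between them. The enum below encodes the four partition states, with each state's valid previous states taken from our reading of Kafka's PartitionStateMachine; treat the table as an approximation to check against the Kafka version you run. The enum and method names are our own (Java 14+ for the switch expression).

```java
import java.util.EnumSet;
import java.util.Set;

enum PartitionState {
    NonExistentPartition, NewPartition, OnlinePartition, OfflinePartition;

    // States from which this state may be entered
    Set<PartitionState> validPreviousStates() {
        return switch (this) {
            case NewPartition -> EnumSet.of(NonExistentPartition);
            case OnlinePartition, OfflinePartition -> EnumSet.of(NewPartition, OnlinePartition, OfflinePartition);
            case NonExistentPartition -> EnumSet.of(OfflinePartition);
        };
    }

    // Returns the target state, or throws if the transition is illegal
    PartitionState transitionTo(PartitionState target) {
        if (!target.validPreviousStates().contains(this)) {
            throw new IllegalStateException(this + " -> " + target + " is not allowed");
        }
        return target;
    }

    public static void main(String[] args) {
        PartitionState s = NonExistentPartition.transitionTo(NewPartition) // partition created
                .transitionTo(OnlinePartition)                             // leader elected
                .transitionTo(OfflinePartition);                           // leader's broker went down
        System.out.println("now " + s);
        try {
            s.transitionTo(NewPartition);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // OfflinePartition -> NewPartition is not allowed
        }
    }
}
```

Walking a partition through NewPartition, OnlinePartition and OfflinePartition succeeds, while asking an offline partition to become NewPartition again is rejected, which is the kind of illegal transition a state machine exists to catch.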
Replica State Machine
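Four main states: NewReplica, OnlineReplica, OfflineReplica, NonExistentReplica. Kafka's ReplicaStateMachine also defines three intermediate states used during topic deletion: ReplicaDeletionStarted, ReplicaDeletionSuccessful, and ReplicaDeletionIneligible.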
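The replica state machine follows the same idea. As a variation, this sketch keeps the transition table in an EnumMap and shows one typical trigger: when a broker fails, every replica hosted on it moves to OfflineReplica. It models only the four main states; Kafka's own ReplicaStateMachine also has intermediate states for topic deletion, which the sketch collapses into a direct OfflineReplica to NonExistentReplica step. The class, the method names and the "topic-partition@broker" key format are hypothetical.

```java
import java.util.*;

class ReplicaStates {
    enum State { NewReplica, OnlineReplica, OfflineReplica, NonExistentReplica }

    // target state -> states it may be entered from (deletion states collapsed)
    private static final Map<State, Set<State>> VALID_PREVIOUS = new EnumMap<>(State.class);
    static {
        VALID_PREVIOUS.put(State.NewReplica, EnumSet.of(State.NonExistentReplica));
        VALID_PREVIOUS.put(State.OnlineReplica, EnumSet.of(State.NewReplica, State.OnlineReplica, State.OfflineReplica));
        VALID_PREVIOUS.put(State.OfflineReplica, EnumSet.of(State.NewReplica, State.OnlineReplica, State.OfflineReplica));
        VALID_PREVIOUS.put(State.NonExistentReplica, EnumSet.of(State.OfflineReplica));
    }

    // Key format "topic-partition@brokerId" is a hypothetical naming scheme
    private final Map<String, State> replicas = new TreeMap<>();

    State stateOf(String replica) {
        return replicas.getOrDefault(replica, State.NonExistentReplica);
    }

    void transition(String replica, State target) {
        State current = stateOf(replica);
        if (!VALID_PREVIOUS.get(target).contains(current)) {
            throw new IllegalStateException(replica + ": " + current + " -> " + target);
        }
        replicas.put(replica, target);
    }

    // Typical trigger: a broker fails, so every replica it hosts goes offline
    void onBrokerFailure(int brokerId) {
        for (String replica : List.copyOf(replicas.keySet())) {
            if (replica.endsWith("@" + brokerId)) transition(replica, State.OfflineReplica);
        }
    }

    public static void main(String[] args) {
        ReplicaStates rsm = new ReplicaStates();
        for (String r : new String[] {"orders-0@1", "orders-0@2", "orders-1@2"}) {
            rsm.transition(r, State.NewReplica);
            rsm.transition(r, State.OnlineReplica);
        }
        rsm.onBrokerFailure(2);
        // {orders-0@1=OnlineReplica, orders-0@2=OfflineReplica, orders-1@2=OfflineReplica}
        System.out.println(rsm.replicas);
    }
}
```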
Network
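Kafka uses an NIO‑based Reactor model. An Acceptor thread (one per listener) accepts new connections and hands them to a group of Processor threads; each Processor runs its own selector to read requests from, and write responses to, its connections. The Processors place decoded requests in a shared request queue, from which a pool of Handler threads takes them and runs the business logic. The number of Processor threads is configured with num.network.threads and the number of Handler threads with num.io.threads.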
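The thread layout can be sketched with plain java.nio. This is a toy under our own assumptions, not Kafka's SocketServer: one acceptor thread hands accepted connections round-robin to processor threads, each processor runs its own Selector and turns size-prefixed frames into requests on one shared queue, and a pool of handler threads drains that queue. To stay short it omits responses (in Kafka a handler's response travels back through the processor that owns the connection); MiniReactor, demo and the upper-casing "business logic" are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.concurrent.*;

class MiniReactor {
    final BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>(); // shared by all processors
    final List<String> handled = new CopyOnWriteArrayList<>();              // what the handlers produced
    final ServerSocketChannel server;
    final Processor[] processors;
    MiniReactor(int numProcessors, int numHandlers) throws IOException {
        server = ServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", 0));
        processors = new Processor[numProcessors];
        for (int i = 0; i < numProcessors; i++) startDaemon(processors[i] = new Processor(), "processor-" + i);
        for (int i = 0; i < numHandlers; i++) startDaemon(this::handleLoop, "handler-" + i);
        startDaemon(this::acceptLoop, "acceptor");
    }
    static void startDaemon(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setDaemon(true);
        t.start();
    }
    void acceptLoop() { // Acceptor: deals accepted connections out to processors round-robin
        try {
            for (int next = 0; ; next = (next + 1) % processors.length) processors[next].assign(server.accept());
        } catch (IOException e) { /* server socket closed: stop accepting */ }
    }
    void handleLoop() { // Handler: business logic only, never touches a socket
        try {
            while (true) handled.add(requestQueue.take().toUpperCase());
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    class Processor implements Runnable {
        final Selector selector;
        final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
        Processor() throws IOException { selector = Selector.open(); }
        void assign(SocketChannel ch) {
            pending.add(ch);
            selector.wakeup(); // make select() return so the new channel gets registered
        }
        public void run() {
            try {
                while (true) {
                    selector.select();
                    SocketChannel ch;
                    while ((ch = pending.poll()) != null) { // register on this thread, not the acceptor's
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                    }
                    for (SelectionKey key : selector.selectedKeys()) read(key);
                    selector.selectedKeys().clear();
                }
            } catch (IOException e) { /* selector failure: this sketch just stops the processor */ }
        }
        // Requests carry a 4-byte size prefix; frames larger than the 1 KiB buffer are not handled
        void read(SelectionKey key) throws IOException {
            SocketChannel ch = (SocketChannel) key.channel();
            ByteBuffer buf = (ByteBuffer) key.attachment();
            int n;
            try { n = ch.read(buf); } catch (IOException e) { n = -1; }
            if (n < 0) { ch.close(); return; } // peer gone; closing also cancels the key
            buf.flip();
            while (buf.remaining() >= 4 && buf.remaining() >= 4 + buf.getInt(buf.position())) {
                byte[] payload = new byte[buf.getInt()];
                buf.get(payload);
                requestQueue.add(new String(payload, StandardCharsets.UTF_8));
            }
            buf.compact(); // keep any partial frame for the next read
        }
    }

    // Sends each message as one framed request, then waits up to 5 s for the handlers
    static List<String> demo(String... messages) {
        try {
            MiniReactor reactor = new MiniReactor(2, 2);
            int port = ((InetSocketAddress) reactor.server.getLocalAddress()).getPort();
            for (String m : messages) {
                try (SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                    byte[] p = m.getBytes(StandardCharsets.UTF_8);
                    ByteBuffer frame = ByteBuffer.allocate(4 + p.length).putInt(p.length).put(p).flip();
                    while (frame.hasRemaining()) client.write(frame);
                }
            }
            long deadline = System.currentTimeMillis() + 5_000;
            while (reactor.handled.size() < messages.length && System.currentTimeMillis() < deadline) Thread.sleep(10);
            reactor.server.close();
            return List.copyOf(reactor.handled);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
    public static void main(String[] args) { System.out.println(demo("produce", "fetch", "metadata")); }
}
```

Calling demo("produce", "fetch", "metadata") returns the three upper-cased payloads, in whatever order the two handler threads finished them. Keeping socket I/O on a few processor threads and business logic on a separate pool means slow request handling does not stall the selectors.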
The upcoming source‑code series will dive into how these principles are realized in Kafka’s codebase.
Recommended Reading
Kafka from an Interview Perspective
Database and Cache Dual‑Write Consistency
Redis High Availability – Sentinel Principles
Kafka Performance – Why Kafka Is So Fast
Follow the public account "Internet Full‑Stack Architecture" for more valuable information.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
