
Kafka Architecture and Implementation Principles – Theory Part

This article provides a comprehensive, diagram‑driven explanation of Kafka’s architecture, covering producers, consumers, topics, partitions, replication, ZooKeeper coordination, controller election, state machines, and the NIO‑based network model, helping readers understand the design philosophy and practical configuration implications.

Full-Stack Internet Architecture

This is the second article in the "Kafka Series" by the author, focusing on the theoretical architecture and implementation principles of Kafka. It follows the previous performance article and introduces the core components and their relationships.

Opening Note

The author encourages building real products, maintaining a public account or blog to record daily insights, and emphasizes that consistent effort yields great value.

Architecture

Understanding Kafka’s architecture means grasping what each component is and how the components interact. The main components are described below; the aim is to understand them rather than memorize them.

Key Components

Producer: Sends messages to Kafka.

Consumer: Receives messages from Kafka.

Consumer Group: A set of consumers that share the load of a topic. Each partition is consumed by only one member of the group, so a message is not processed twice within the group. A single group gives point-to-point (queue) semantics, while several groups subscribed to the same topic each receive every message, which gives broadcast semantics.

Broker: A Kafka server node that stores data.

Topic: Logical channel for messages; producers write to topics and consumers subscribe to them.

Partition: Subdivision of a topic, stored as an append-only log in which every message has a unique offset.

Offset: Position of a message within its partition. It is unique per partition, and ordering is guaranteed only within a single partition.

Replication: Mechanism for high availability; each partition has multiple replicas, one of which is the leader that serves reads and writes.

Record: The actual message stored in Kafka, containing a key, a value, and a timestamp.
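To make the consumer group semantics concrete, here is a minimal sketch of how partitions could be spread over the members of a group. The helper `assign_partitions` and the consumer names are hypothetical, and the plain round-robin rule is an assumption for illustration; Kafka's real assignors are more elaborate.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of ONE group, round-robin.

    Hypothetical helper for illustration only, not Kafka's assignor.
    """
    assignment = defaultdict(list)
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


partitions = [0, 1, 2, 3]

# Group A has two members: each partition has exactly one owner,
# so the group as a whole sees every message once (queue semantics).
group_a = assign_partitions(partitions, ["a1", "a2"])

# Group B is independent and receives all partitions again,
# which is what gives broadcast semantics across groups.
group_b = assign_partitions(partitions, ["b1"])

print(group_a)
print(group_b)
```

Within a group the partitions are divided; across groups they are duplicated. That is the whole trick behind supporting both consumption models with one mechanism.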

Producer‑Consumer Pattern

The pattern is illustrated with a producer, a consumer, and an intermediate component (e.g., a queue) that decouples them, enabling asynchronous processing and buffering.
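The pattern can be sketched in a few lines with a bounded in-process queue as the intermediate component. This is only a single-process illustration under assumed names (`run_pipeline`, `capacity`); it is not how Kafka itself is implemented.

```python
import queue
import threading


def run_pipeline(messages, capacity=2):
    """Run one producer and one consumer decoupled by a bounded queue."""
    buffer = queue.Queue(maxsize=capacity)  # the intermediate component
    consumed = []
    done = object()  # sentinel that tells the consumer to stop

    def producer():
        for message in messages:
            buffer.put(message)  # blocks while the buffer is full
        buffer.put(done)

    def consumer():
        while True:
            message = buffer.get()  # blocks while the buffer is empty
            if message is done:
                break
            consumed.append(message.upper())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed


print(run_pipeline(["a", "b", "c"]))
```

Neither side calls the other directly: the producer only knows the queue, and so does the consumer, which is what allows them to run at different speeds.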

Distributed Queue Analogy

An analogy describes producers, consumers, and a mailbox (the intermediate component) to explain distributed queuing.

Topic, Partition, and Replication

Topics act like mailboxes, partitions like multiple post offices to increase concurrency, and replication like copying letters to multiple offices for fault tolerance.
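As a rough sketch of how these three ideas fit together, the toy `Topic` class below keeps one append-only list per partition, routes records by key, and places replicas on consecutive brokers. The class, the CRC32-based routing, and the placement rule are simplifying assumptions for illustration; Kafka uses its own partitioner and replica assignment logic.

```python
import zlib


class Topic:
    """Toy model of a topic: partitioned logs plus replica placement."""

    def __init__(self, name, num_partitions, brokers, replication_factor):
        if replication_factor > len(brokers):
            raise ValueError("replication factor exceeds broker count")
        self.name = name
        self.logs = [[] for _ in range(num_partitions)]
        # Assumed placement: partition p lives on consecutive brokers
        # starting at index p; the first replica is treated as the leader.
        self.replicas = {
            p: [brokers[(p + r) % len(brokers)]
                for r in range(replication_factor)]
            for p in range(num_partitions)
        }

    def partition_for(self, key):
        # CRC32 stands in for the real hash; same key, same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.logs)

    def append(self, key, value):
        """Append a record and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.logs[partition]
        log.append((key, value))
        return partition, len(log) - 1


topic = Topic("letters", num_partitions=3,
              brokers=["b0", "b1", "b2"], replication_factor=2)
first = topic.append("alice", "hello")
second = topic.append("alice", "hello again")
print(first, second, topic.replicas)
```

Records with the same key land in the same partition with increasing offsets, which is why ordering holds per partition but not across the whole topic.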

ZooKeeper

ZooKeeper provides distributed configuration, synchronization, and naming services. Kafka stores metadata for brokers, topics, and partitions in ZooKeeper and uses it for controller leader election, cluster membership, topic configuration, and replica management.

Controller

The controller is a broker elected via ZooKeeper that manages partition leaders, replica assignments, and cluster metadata. It reacts to broker failures, topic creation, partition expansion, and ISR changes.

Election Process

Brokers attempt to create an ephemeral /controller node; the broker that succeeds becomes the controller.
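The "first to create the node wins" rule can be sketched with an in-memory stand-in for ZooKeeper. `FakeZooKeeper` and `elect` are hypothetical names, not a real client API; the only behaviour modelled is that an ephemeral node can be created once and disappears when its owner's session ends.

```python
class FakeZooKeeper:
    """In-memory stand-in for ephemeral nodes; not a real ZooKeeper client."""

    def __init__(self):
        self.nodes = {}  # path -> id of the session that owns the node

    def create_ephemeral(self, path, owner):
        """Return True if the node was created, False if it already exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def expire_session(self, owner):
        # Ephemeral nodes vanish together with their owner's session.
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


def elect(zk, broker_ids):
    """Every broker tries to create /controller; return the winner's id."""
    for broker_id in broker_ids:
        zk.create_ephemeral("/controller", broker_id)
    return zk.nodes["/controller"]


zk = FakeZooKeeper()
first_controller = elect(zk, [2, 0, 1])   # broker 2 happens to try first
zk.expire_session(first_controller)       # the controller broker dies
second_controller = elect(zk, [0, 1])     # survivors race again
print(first_controller, second_controller)
```

Because the node is ephemeral, a crashed controller frees the path automatically and the remaining brokers can run the same race again.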

Implementation

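The controller loads ZooKeeper data into an in-memory context and watches for node changes. Each change is wrapped as an event and placed on a LinkedBlockingQueue, where a single event thread processes the events in order; the resulting metadata updates are then propagated to the other brokers.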
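A minimal sketch of that single-threaded event loop follows, with Python's `queue.Queue` playing the role of the LinkedBlockingQueue. The class name, the event names, and the `sent` list that records "updates pushed to brokers" are all assumptions made for illustration.

```python
import queue
import threading


class ControllerEventLoop:
    """Toy controller: events go into a queue, one thread applies them."""

    def __init__(self):
        self.events = queue.Queue()  # stands in for LinkedBlockingQueue
        self.live_brokers = set()    # the in-memory "context"
        self.sent = []               # snapshots "propagated" to brokers
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def put(self, kind, broker_id=None):
        self.events.put((kind, broker_id))

    def _run(self):
        while True:
            kind, broker_id = self.events.get()  # blocks until an event
            if kind == "shutdown":
                break
            if kind == "broker_up":
                self.live_brokers.add(broker_id)
            elif kind == "broker_down":
                self.live_brokers.discard(broker_id)
            # Record the metadata that would be sent to the other brokers.
            self.sent.append(sorted(self.live_brokers))

    def close(self):
        self.put("shutdown")
        self.thread.join()


loop = ControllerEventLoop()
loop.put("broker_up", 1)
loop.put("broker_up", 2)
loop.put("broker_down", 1)
loop.close()
print(loop.sent)
```

Funnelling every change through one queue and one thread means the context is never modified concurrently, so no locking is needed around it.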

Responsibilities

Handle broker online/offline events and update cluster metadata.

Create topics, assign partition replicas to brokers, and drive leader election among those replicas.

Manage state machines for partitions and replicas, reacting to state changes.

"State machine sounds complicated, but it is just a model with defined states and transitions."

Partition State Machine

Four states: NonExistentPartition, NewPartition, OnlinePartition, OfflinePartition.
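A state machine of this kind is little more than a table of allowed transitions. The sketch below uses the four partition states named above; the specific transition table is an assumption based on one reading of the controller's behaviour and should be checked against the Kafka source before relying on it.

```python
# Assumed table: target state -> states it may be entered from.
VALID_PREVIOUS = {
    "NewPartition": {"NonExistentPartition"},
    "OnlinePartition": {"NewPartition", "OnlinePartition", "OfflinePartition"},
    "OfflinePartition": {"NewPartition", "OnlinePartition", "OfflinePartition"},
    "NonExistentPartition": {"OfflinePartition"},
}


class PartitionStateMachine:
    """Tracks one partition's state and rejects illegal transitions."""

    def __init__(self):
        self.state = "NonExistentPartition"

    def transition(self, target):
        if self.state not in VALID_PREVIOUS[target]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target


machine = PartitionStateMachine()
for step in ["NewPartition", "OnlinePartition",
             "OfflinePartition", "OnlinePartition"]:
    machine.transition(step)  # created, leader elected, lost, re-elected
print(machine.state)
```

Skipping a step, such as going straight from NonExistentPartition to OnlinePartition, raises an error instead of silently corrupting the controller's view.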

Replica State Machine

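Four states are covered here: NewReplica, OnlineReplica, OfflineReplica, NonExistentReplica. Kafka's controller code also defines deletion-related replica states (ReplicaDeletionStarted, ReplicaDeletionSuccessful, ReplicaDeletionIneligible), which are left out of this overview.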

Network

Kafka uses an NIO‑based Reactor model with an Acceptor thread for new connections, multiple Processor threads for select/read, and Handler threads for business logic.
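The thread layout can be sketched with Python's `selectors` module. Everything here (class names, the `ack:` reply, polling for new connections every 50 ms) is an assumption for illustration. One deliberate simplification: the handler writes the reply directly, whereas the model described above hands responses back to the Processor.

```python
import queue
import selectors
import socket
import threading


class Processor(threading.Thread):
    """Owns one selector and reads requests from its connections."""

    def __init__(self, requests):
        super().__init__(daemon=True)
        self.selector = selectors.DefaultSelector()
        self.new_conns = queue.Queue()  # filled by the Acceptor
        self.requests = requests        # shared with the handlers

    def run(self):
        while True:
            while not self.new_conns.empty():
                self.selector.register(self.new_conns.get(),
                                       selectors.EVENT_READ)
            for key, _ in self.selector.select(timeout=0.05):
                conn = key.fileobj
                try:
                    data = conn.recv(1024)
                except OSError:
                    data = b""
                if data:
                    self.requests.put((conn, data))
                else:  # peer closed the connection
                    self.selector.unregister(conn)
                    conn.close()


class Acceptor(threading.Thread):
    """Accepts connections and hands them to processors round-robin."""

    def __init__(self, server, processors):
        super().__init__(daemon=True)
        self.server, self.processors = server, processors

    def run(self):
        count = 0
        while True:
            conn, _ = self.server.accept()
            self.processors[count % len(self.processors)].new_conns.put(conn)
            count += 1


def handler(requests):
    """Business logic: take a request off the queue and answer it."""
    while True:
        conn, data = requests.get()
        try:
            conn.sendall(b"ack:" + data)
        except OSError:
            pass  # client went away before the reply was written


def start_server(num_processors=2, num_handlers=2):
    """Start all threads and return the port the server listens on."""
    requests = queue.Queue()
    server = socket.create_server(("127.0.0.1", 0))
    processors = [Processor(requests) for _ in range(num_processors)]
    for processor in processors:
        processor.start()
    for _ in range(num_handlers):
        threading.Thread(target=handler, args=(requests,), daemon=True).start()
    Acceptor(server, processors).start()
    return server.getsockname()[1]


def send_request(port, payload):
    """Tiny client: send one payload and read the full reply."""
    expected = len(b"ack:") + len(payload)
    reply = b""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(payload)
        while len(reply) < expected:
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk
    return reply
```

Calling `start_server()` and then `send_request(port, b"ping")` sends one request through all three stages. The point of the split is that accepting connections, network I/O, and request processing can each be sized independently.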

The upcoming source‑code series will dive into how these principles are realized in Kafka’s codebase.

Recommended Reading

Kafka from an Interview Perspective

Database and Cache Dual‑Write Consistency

Redis High Availability – Sentinel Principles

Kafka Performance – Why Kafka Is So Fast


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
