Understanding Kafka Architecture: Topics, Partitions, Consumption Model, Network and Storage
This article explains Kafka's core architecture, covering how topics and partitions are stored, the advantages of its consumption model, the internal network and threading design, and the high‑reliability distributed log storage and replication mechanisms that ensure data durability and scalability.
Key Questions about Kafka Architecture
1. How are Kafka topics and partitions stored internally?
2. What advantages does Kafka's consumption model have over traditional messaging systems?
3. How does Kafka achieve distributed data storage and retrieval?
Kafka Architecture Diagram
Terminology
Broker: a Kafka node; multiple brokers form a cluster.
Topic: a logical channel for categorizing messages.
Producer: client that sends messages to brokers.
Consumer: client that reads messages from brokers.
ConsumerGroup: a set of consumers sharing the same group ID; each partition is assigned to exactly one consumer in the group, so a given message is processed by only one member of the group.
Partition: an ordered, append-only log holding a subset of a topic's messages; a topic is split into one or more partitions, and each partition is stored on disk as a sequence of log segments.
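To make the ConsumerGroup entry concrete, here is a minimal sketch of how partitions might be divided among the members of one group. The function name assign_partitions and the plain round-robin strategy are assumptions chosen for illustration; Kafka's real assignors are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of the group (illustrative only)."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        # Each partition goes to a single member, so no message is processed twice.
        assignment[members[i % len(members)]].append(partition)
    return assignment


print(assign_partitions(range(5), ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}
print(assign_partitions(range(2), ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why adding more consumers than partitions does not help: the extra member has nothing to read.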
Topic and Partition Details
Each message belongs to a topic; a topic can have many partitions. Partitions store messages as an append‑only log, assigning a monotonically increasing offset to each record. Ordering is guaranteed only within a partition.
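The append-only behavior described above can be sketched with a toy in-memory log. PartitionLog is a hypothetical class written for this article, not part of any Kafka client; it only shows how offsets grow by one per record and how reads start from an offset.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition's append-only log."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset assigned to it."""
        offset = len(self._records)  # next offset equals the current log length
        self._records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Return (offset, value) pairs starting at the given offset."""
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, value) for i, value in enumerate(chunk)]


log = PartitionLog()
offsets = [log.append(v) for v in ("a", "b", "c")]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # [(1, 'b'), (2, 'c')]
```

Because each partition keeps its own counter, offsets from two different partitions cannot be compared, which is why ordering holds only within a partition.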
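Producer routing to partitions: without a key, the producer spreads messages across partitions (round‑robin in older clients; newer clients default to a sticky strategy that fills a batch for one partition before moving to the next). With a key, the key is hashed and the result modulo the partition count determines the target partition, so the same key always lands in the same partition as long as the partition count does not change.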
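The routing rule can be sketched as follows. This is an illustration under stated assumptions: make_partitioner is a hypothetical helper, and zlib.crc32 stands in for the hash function (the Java client uses murmur2, so real partition numbers will differ).

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for an optional bytes key."""
    counter = itertools.count()

    def choose_partition(key=None):
        if key is None:
            # No key: cycle through the partitions in turn.
            return next(counter) % num_partitions
        # Keyed: hash modulo partition count (crc32 is a stand-in hash).
        return zlib.crc32(key) % num_partitions

    return choose_partition


choose = make_partitioner(4)
print([choose() for _ in range(5)])                # [0, 1, 2, 3, 0]
print(choose(b"user-42") == choose(b"user-42"))    # True
```

Since the result depends on the partition count, adding partitions to a topic later can move existing keys to a different partition.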
Consumption Model
Kafka uses a pull‑based model: consumers poll the broker at their own pace, choose the offset to read from, and can rewind to reprocess messages. Because the consumer sets the rate, a slow consumer is not overwhelmed the way it can be when a broker pushes messages to it.
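A minimal sketch of the pull model, using hypothetical Broker and PullConsumer classes rather than a real client library: the consumer owns its position, advances it only when it polls, and can seek backward to replay.

```python
class Broker:
    """Holds the records of one partition (illustrative stand-in)."""

    def __init__(self, records):
        self.records = list(records)

    def fetch(self, offset, max_records):
        return self.records[offset:offset + max_records]


class PullConsumer:
    """Tracks its own position and asks the broker for data when ready."""

    def __init__(self, broker, start_offset=0):
        self.broker = broker
        self.position = start_offset

    def poll(self, max_records=2):
        batch = self.broker.fetch(self.position, max_records)
        self.position += len(batch)  # advance only past what was received
        return batch

    def seek(self, offset):
        self.position = offset  # rewind or skip ahead


consumer = PullConsumer(Broker(["m0", "m1", "m2", "m3", "m4"]))
print(consumer.poll())               # ['m0', 'm1']
print(consumer.poll())               # ['m2', 'm3']
consumer.seek(1)                     # rewind to reprocess
print(consumer.poll(max_records=3))  # ['m1', 'm2', 'm3']
```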
Network Model
Client side: a single thread with one selector handles all connections, which is sufficient because a client maintains only a small number of connections.
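Server side: a multi‑threaded selector design. An Acceptor thread accepts new connections and hands them to a pool of Processor (network) threads, each with its own selector, which read requests and write responses; a separate pool of request handler (I/O) threads does the actual processing. Keeping slow work off the network threads prevents blocking and improves scalability.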
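The server-side hand-off can be sketched with plain threads and queues. This is a simplified model, not broker code: there are no sockets or selectors, run_broker_sketch and its parameters are hypothetical, uppercasing the payload stands in for request processing, and responses are collected in one queue instead of being routed back through the network threads.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker thread to exit


def run_broker_sketch(requests, num_processors=2, num_handlers=3):
    """Pass (conn_id, payload) requests through acceptor, processors, handlers."""
    inboxes = [queue.Queue() for _ in range(num_processors)]
    request_queue = queue.Queue()  # shared by all handler threads
    responses = queue.Queue()

    def acceptor():
        # Distribute incoming connections across processors round-robin.
        for i, request in enumerate(requests):
            inboxes[i % num_processors].put(request)
        for inbox in inboxes:
            inbox.put(STOP)

    def processor(inbox):
        # Network thread: only moves requests, never does slow work itself.
        for request in iter(inbox.get, STOP):
            request_queue.put(request)

    def handler():
        # Worker thread: the potentially slow processing happens here.
        for conn_id, payload in iter(request_queue.get, STOP):
            responses.put((conn_id, payload.upper()))

    network = [threading.Thread(target=acceptor)]
    network += [threading.Thread(target=processor, args=(q,)) for q in inboxes]
    workers = [threading.Thread(target=handler) for _ in range(num_handlers)]
    for t in network + workers:
        t.start()
    for t in network:
        t.join()                 # every request is now on request_queue
    for _ in workers:
        request_queue.put(STOP)  # one sentinel per handler
    for t in workers:
        t.join()
    return sorted(responses.get() for _ in range(responses.qsize()))


print(run_broker_sketch([(1, "a"), (2, "b"), (3, "c")]))
# [(1, 'A'), (2, 'B'), (3, 'C')]
```

The results are sorted before returning because the handler threads may finish in any order.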
High‑Reliability Distributed Storage Model
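Messages are stored in per‑partition log directories, each partition consisting of multiple LogSegments. A segment is a .log data file plus its index files (.index, and .timeindex in newer versions), named after the base offset of the segment's first message. The index is sparse: it holds an entry for only some of the messages, which keeps it small and fast to search.
Reading a specific offset takes three steps: binary‑search the segment base offsets to find the LogSegment containing the offset, binary‑search that segment's sparse index for the nearest entry at or below the offset, then scan the .log file forward from that entry's physical position.
Writes within a partition are sequential appends, and reads are largely served from the OS page cache. With a very large number of partitions on one broker, however, appends are spread over many files and disk access becomes more random, so a moderate number of partitions is recommended.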
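The lookup can be sketched with Python's bisect module. The SEGMENTS data below is invented for illustration (base offset mapped to a sparse list of relative offset and byte position pairs), and locate is a hypothetical helper; only the 20-digit zero-padded file naming follows Kafka's on-disk convention.

```python
import bisect

# base offset -> sparse index of (relative offset, byte position) pairs
SEGMENTS = {
    0: [(0, 0), (40, 4096), (85, 8192)],
    1000: [(0, 0), (37, 4096)],
    2000: [(0, 0)],
}


def locate(target_offset, segments=SEGMENTS):
    """Return (log file, offset to start scanning at, byte position)."""
    base_offsets = sorted(segments)
    # Step 1: last segment whose base offset is <= target.
    seg = bisect.bisect_right(base_offsets, target_offset) - 1
    if seg < 0:
        raise KeyError(f"offset {target_offset} is before the first segment")
    base = base_offsets[seg]
    # Step 2: last index entry whose relative offset is <= target - base.
    index = segments[base]
    relative = target_offset - base
    entry = bisect.bisect_right([rel for rel, _ in index], relative) - 1
    rel, position = index[entry] if entry >= 0 else (0, 0)
    # Step 3 (not shown): scan the .log file forward from `position`.
    return f"{base:020d}.log", base + rel, position


print(locate(1050))  # ('00000000000000001000.log', 1037, 4096)
print(locate(39))    # ('00000000000000000000.log', 0, 0)
```

For offset 1050 the scan starts at indexed offset 1037, so at most the messages between two index entries are read sequentially.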
Replication Mechanism
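Each partition has one leader replica and zero or more follower replicas. The ISR (In‑Sync Replicas) set contains the leader and the followers that are keeping up with it. By default only ISR members can be elected as the new leader; allowing an out‑of‑sync replica to take over (unclean.leader.election.enable) trades possible data loss for availability.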
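A minimal sketch of choosing a new leader when the current one fails, assuming the rule "first surviving replica in assignment order that is also in the ISR". The function elect_leader and the broker IDs are hypothetical; the real controller logic handles more cases.

```python
def elect_leader(replicas, isr, failed_leader):
    """Return the new leader's broker ID, or None if no ISR member survives."""
    for broker_id in replicas:  # replicas are checked in assignment order
        if broker_id != failed_leader and broker_id in isr:
            return broker_id
    return None  # no eligible replica: the partition has no leader


print(elect_leader([1, 2, 3], isr={1, 3}, failed_leader=1))  # 3
print(elect_leader([1, 2, 3], isr={1}, failed_leader=1))     # None
```

In the first call broker 2 is skipped because it has fallen out of the ISR and may be missing records.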
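LEO (Log End Offset) is the offset that the next record appended to a replica will receive, that is, the last offset in the log plus one. HW (High Watermark) is the smallest LEO among the ISR; consumers can read only records with offsets below the HW. Producers set acks to control durability: 0 does not wait for any acknowledgment, 1 waits for the leader to write the record, and -1 (all) waits until every ISR member has the record.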
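The relationship between LEO, HW, and acks can be worked through with a small example. This sketch treats LEO as the offset the next record will receive and HW as the minimum LEO across the ISR; the function names and the sample LEO values are assumptions for illustration.

```python
def high_watermark(leo_by_replica, isr):
    """HW is the smallest LEO among in-sync replicas."""
    return min(leo_by_replica[replica] for replica in isr)


def is_acknowledged(acks, record_offset, leader, leo_by_replica, isr):
    """Would a produce request for record_offset be acknowledged yet?"""
    if acks == 0:
        return True  # producer does not wait for the broker at all
    if acks == 1:
        return leo_by_replica[leader] > record_offset  # leader has written it
    if acks == -1:
        # Every in-sync replica must have the record.
        return all(leo_by_replica[r] > record_offset for r in isr)
    raise ValueError("acks must be 0, 1, or -1")


leo = {1: 10, 2: 8, 3: 5}  # broker 1 is the leader; broker 3 has fallen behind
isr = {1, 2}

print(high_watermark(leo, isr))            # 8 -> consumers see offsets 0..7
print(is_acknowledged(1, 9, 1, leo, isr))   # True: leader holds offset 9
print(is_acknowledged(-1, 9, 1, leo, isr))  # False: broker 2 has not fetched it
```

Offsets 8 and 9 exist on the leader but stay invisible to consumers until broker 2 catches up and the HW advances.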