Kafka Architecture and Core Concepts: Brokers, Producers, Consumers, Topics, Partitions, Replicas, and Reliability
This article provides a comprehensive overview of Kafka's architecture and fundamental concepts: brokers, producers, consumers, topics, partitions, replicas, leader-follower synchronization, offsets, message storage at both the logical and physical layers, producer and consumer workflows, partition assignment and rebalancing, log management, zero-copy I/O, and reliability mechanisms.
Kafka can decouple systems, smooth traffic spikes, buffer data, and enable asynchronous communication, making it ideal for activity tracking, messaging, metrics, logging, and stream processing. This article introduces Kafka's basic concepts.
Kafka Overall Structure
The following diagram shows many of Kafka's moving parts; you can skim past the details for now.
The diagram displays several important Kafka components, which are introduced one by one below.
(1) Broker
A broker is a service node; multiple brokers form a Kafka cluster.
(2) Producer
A producer writes messages to the broker.
(3) Consumer
A consumer reads messages from the broker.
(4) Consumer Group
A consumer group consists of one or more consumers; different groups can subscribe to the same topic independently.
(5) ZooKeeper
Kafka uses ZooKeeper to manage cluster metadata and elect the controller (newer versions can run without ZooKeeper in KRaft mode).
(6) Topic
Each message belongs to a topic; topics logically categorize messages.
(7) Partition
A topic can be split into multiple partitions; at any given time each partition has exactly one leader replica living on a single broker.
(8) Replica
Each partition can have multiple replicas for fault tolerance.
(9) Leader and Follower
Kafka uses a leader‑follower model for replication: the leader handles reads/writes, followers act as backups.
(10) Offset
Each message in a partition has a unique offset that identifies its position and guarantees order within the partition.
In summary, Kafka, as a data system, must solve two fundamental problems: how to store data and how to retrieve it.
When we hand data to Kafka, how does Kafka store it?
When we request data back, how does Kafka return it?
Message Storage (Logical Layer)
Most data systems store data on disk using either an append‑log or B‑tree format. Kafka adopts an append‑log format, as shown below.
Append‑log improves write performance but is less friendly for reads; Kafka adds extra mechanisms to boost read performance.
(1) Topic + Partition Two-Level Structure
Kafka classifies messages by topic and partition. Partitioning provides scalability and reliability; partitions can be placed on different brokers.
(2) Offset
When appending to a log, each new message is added to the end of a file. Each message in a partition receives a unique offset, which is used for ordering within the partition.
Key characteristics:
Partitions improve write performance and data reliability.
Messages are ordered within a partition but not across partitions.
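The per-key ordering guarantee can be sketched in Python. This is a toy model: Kafka's default partitioner actually hashes keys with murmur2, while `crc32` stands in here, and the in-memory `partitions` dict is an invented stand-in for per-partition logs.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3

def partition_for(key: bytes, num_partitions: int) -> int:
    # Kafka's default partitioner uses murmur2; crc32 stands in here.
    return zlib.crc32(key) % num_partitions

# partition number -> ordered list of (offset, key, value)
partitions = defaultdict(list)

def append(key: bytes, value: str) -> int:
    p = partition_for(key, NUM_PARTITIONS)
    log = partitions[p]
    offset = len(log)  # offsets are dense and scoped to one partition
    log.append((offset, key, value))
    return offset

for i in range(5):
    append(b"user-42", f"event-{i}")  # same key -> same partition, in order

p = partition_for(b"user-42", NUM_PARTITIONS)
print([v for (_, _, v) in partitions[p]])
# → ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```

Because every record with key `user-42` lands in the same partition and offsets only grow, a consumer reading that partition sees the events in exactly the order they were appended.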
How to Write Data
(1) Overall Flow
The producer’s overall flow is illustrated below.
Two main threads exist on the producer side: the main thread and the sender thread, which communicate via a shared RecordAccumulator.
KafkaProducer creates a record.
Producer interceptors can filter or modify the record before sending.
The serializer converts the record to a byte array.
The partitioner determines the target partition and stores the record in the RecordAccumulator.
The sender thread fetches records from the accumulator and creates requests.
Requests that cannot be sent immediately are cached in a per-connection in-flight request queue.
Buffered requests are sent to the Kafka cluster.
The producer receives responses.
Acknowledged batches are removed from the accumulator and the in-flight queue.
The RecordAccumulator buffers data; its size is controlled by buffer.memory (default 32 MB). Records are grouped into ProducerBatch objects (size controlled by batch.size, default 16 KB). The linger.ms setting (default 0) controls how long the producer waits to fill a batch before sending it.
When the buffer is full, the producer may block or throw an exception, controlled by max.block.ms (default 60 s).
Requests are grouped per broker; the number of in‑flight requests per connection is limited by max.in.flight.requests.per.connection (default 5), and request size by max.request.size (default 1 MB).
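The interaction between batch.size and linger.ms can be modelled with a few lines of Python. This is a minimal sketch, not the real accumulator: the `BatchAccumulator` class and its methods are invented for illustration, though the two thresholds mirror the actual config names.

```python
import time

class BatchAccumulator:
    """Toy model of the producer's RecordAccumulator (illustrative only)."""

    def __init__(self, batch_size: int, linger_ms: float):
        self.batch_size = batch_size
        self.linger_s = linger_ms / 1000.0
        self.batch = []
        self.bytes = 0
        self.first_append = None

    def append(self, record: bytes) -> None:
        if self.first_append is None:
            self.first_append = time.monotonic()
        self.batch.append(record)
        self.bytes += len(record)

    def ready(self) -> bool:
        # A batch is sendable when it is full OR has lingered long enough.
        full = self.bytes >= self.batch_size
        expired = (self.first_append is not None and
                   time.monotonic() - self.first_append >= self.linger_s)
        return full or expired

    def drain(self) -> list:
        batch, self.batch, self.bytes, self.first_append = self.batch, [], 0, None
        return batch

acc = BatchAccumulator(batch_size=32, linger_ms=50)
acc.append(b"x" * 16)
assert not acc.ready()  # under batch.size and linger.ms not yet elapsed
acc.append(b"y" * 16)
assert acc.ready()      # 32 bytes reached -> send without waiting for linger.ms
print(len(acc.drain()))  # → 2
```

The trade-off is visible in the two conditions: a larger batch.size improves throughput via bigger requests, while linger.ms bounds how long a record can sit waiting for batch-mates.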
(2) Sending Modes
Fire‑and‑forget: send without waiting for a result (highest throughput, lowest reliability).
Synchronous: wait for the cluster to acknowledge the write (highest reliability, lowest throughput).
Asynchronous: provide a callback that is invoked when the broker responds.
The acks parameter determines when a write is considered successful:
acks=1 (the long-standing default; since Kafka 3.0 the producer defaults to acks=all ): the leader’s write is enough.
acks=0 : the producer does not wait for any acknowledgment (possible data loss).
acks=-1 or acks=all : all in‑sync replicas must acknowledge (strongest durability).
(3) Important Producer Parameters
To recap the parameters discussed above: buffer.memory (accumulator size, default 32 MB), batch.size (per-batch size, default 16 KB), linger.ms (batch wait time, default 0), max.block.ms (blocking timeout when the buffer is full, default 60 s), max.in.flight.requests.per.connection (default 5), max.request.size (default 1 MB), and acks (durability level).
How to Read Messages
(1) Consuming Messages
Kafka uses a pull-based model: consumers repeatedly call poll() to fetch batches of records.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // process each record
    }
}

(2) Offset Commit
Each consumed message has an offset. After processing, the consumer must commit the offset so that the next poll starts from the next message. For example, if the last processed offset is 9527, the committed offset becomes 9528.
Kafka’s default is automatic offset commit every 5 seconds (controlled by enable.auto.commit and auto.commit.interval.ms ), which can cause duplicate consumption or data loss if a consumer crashes before processing the polled batch.
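The commit arithmetic and both failure modes can be shown with a small Python model. This is a sketch under simplified assumptions: the `resume` function is invented, and a partition is modelled as a plain list indexed by offset.

```python
def resume(committed: int, log: list) -> list:
    """On restart, consumption resumes at the committed offset."""
    return log[committed:]

log = [f"m{i}" for i in range(10)]  # one partition, offsets 0..9

# Normal case: after processing offset 5, commit 6 (offset of the NEXT message).
committed = 5 + 1
assert resume(committed, log) == ["m6", "m7", "m8", "m9"]

# Duplicate consumption: crash after processing m0..m5 but before any commit;
# the restarted consumer re-reads from the old committed offset 0.
assert resume(0, log)[:6] == ["m0", "m1", "m2", "m3", "m4", "m5"]

# Message loss: auto-commit advanced to 6 while only m0..m2 had been processed;
# after the restart, m3..m5 are skipped and never processed.
assert "m3" not in resume(6, log)
```

This is why committing manually after processing (at the cost of possible duplicates) is usually preferred over auto-commit when losing messages is unacceptable.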
(3) Partition Assignment Strategies
Consumers in a group are assigned partitions based on the partition.assignment.strategy configuration. Common strategies:
Range: divides each topic’s partitions into contiguous ranges per consumer; when the counts don’t divide evenly, the first consumers receive extra partitions (may be uneven across topics).
RoundRobin: distributes partitions across consumers in a round‑robin fashion.
Sticky: tries to keep previous assignments while balancing load.
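The Range and RoundRobin strategies are easy to reproduce in Python. These are simplified single-topic sketches (the function names are mine, and the real assigners also handle multiple topics and subscription differences):

```python
def range_assign(partitions: list, consumers: list) -> dict:
    """Per-topic range assignment: earlier consumers get the extra partitions."""
    consumers = sorted(consumers)
    per, extra = divmod(len(partitions), len(consumers))
    out, start = {}, 0
    for i, c in enumerate(consumers):
        take = per + (1 if i < extra else 0)
        out[c] = partitions[start:start + take]
        start += take
    return out

def round_robin_assign(partitions: list, consumers: list) -> dict:
    """Deal partitions out one at a time, like cards."""
    consumers = sorted(consumers)
    out = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        out[consumers[i % len(consumers)]].append(p)
    return out

parts = [f"t0-p{i}" for i in range(5)]
print(range_assign(parts, ["c1", "c0"]))
# → {'c0': ['t0-p0', 't0-p1', 't0-p2'], 'c1': ['t0-p3', 't0-p4']}
print(round_robin_assign(parts, ["c1", "c0"]))
# → {'c0': ['t0-p0', 't0-p2', 't0-p4'], 'c1': ['t0-p1', 't0-p3']}
```

With 5 partitions and 2 consumers, Range gives c0 one extra partition; repeated across many topics, those extras pile onto the same consumers, which is the unevenness noted above.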
(4) Rebalancing
Rebalancing occurs when consumers join or leave a group, when the group coordinator changes, or when topic/partition counts change. The steps are:
FindCoordinator – locate the group coordinator.
JoinGroup – request to join the group.
SyncGroup – receive the partition assignment.
Heartbeat – maintain membership.
Message Storage (Physical Layer)
(1) Log Files
Kafka stores data as an append-only log. The log is divided into segments (LogSegment) to allow efficient cleanup.
Two log‑cleanup policies exist:
Log Retention – delete old segments based on time or size.
Log Compaction – keep only the latest record for each key.
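Log Compaction can be sketched in a few lines of Python. This is a simplified model (the `compact` function is mine): it keeps the latest record per key and drops tombstones, whereas real compaction works segment by segment and retains tombstones for a grace period first.

```python
def compact(segment: list) -> list:
    """Keep only the latest value for each key; value None is a tombstone."""
    latest = {}
    for offset, key, value in segment:
        latest[key] = (offset, value)  # later records supersede earlier ones
    # Tombstoned keys disappear once compaction completes.
    return sorted((off, k, v) for k, (off, v) in latest.items() if v is not None)

segment = [
    (0, "k1", "v1"),
    (1, "k2", "v2"),
    (2, "k1", "v3"),   # supersedes offset 0
    (3, "k2", None),   # tombstone: delete k2
]
print(compact(segment))  # → [(2, 'k1', 'v3')]
```

Note that surviving records keep their original offsets; compaction removes entries but never renumbers the log.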
(2) Log Indexes
To speed up reads, Kafka maintains two sparse indexes per segment:
Offset Index – maps message offsets to file positions.
Timestamp Index – maps timestamps to offsets, enabling time‑based lookups.
When an offset is not directly indexed, a binary search over the sparse index finds the nearest preceding entry, and Kafka scans forward from that file position.
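The sparse-index lookup amounts to a "floor" binary search, sketched here in Python (the index entries and the `locate` helper are illustrative, not Kafka's on-disk format):

```python
import bisect

# Sparse offset index: (offset, file position), one entry every N messages.
index = [(0, 0), (40, 4096), (80, 8192), (120, 12288)]

def locate(target_offset: int) -> tuple:
    """Find the greatest indexed offset <= target; scan forward from there."""
    offsets = [o for o, _ in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    return index[i]

print(locate(40))   # → (40, 4096)   exact hit
print(locate(75))   # → (40, 4096)   not indexed: scan forward from byte 4096
print(locate(130))  # → (120, 12288)
```

Keeping the index sparse trades a short forward scan for a dramatically smaller index file that can be memory-mapped in full.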
(3) Zero-Copy
Kafka uses zero‑copy I/O to transfer data from disk to the network directly in kernel space, reducing user‑kernel context switches.
Non-zero-copy flow (for illustration): data is copied from disk into the kernel page cache, then into a user-space buffer, then into the kernel socket buffer, and finally to the NIC, incurring four copies and several user/kernel context switches.
Zero-copy flow: with sendfile, data moves from the page cache toward the NIC without ever entering user space, eliminating the extra copies and context switches.
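The underlying syscall can be exercised from Python via os.sendfile. This is a Linux-specific sketch copying between two ordinary files (Kafka uses the same syscall to send a segment to a socket); the file names are temporaries invented for the demo.

```python
import os
import tempfile

# Create a source file standing in for a log segment.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"kafka log segment bytes")
src.close()
dst_path = src.name + ".out"

size = os.path.getsize(src.name)
with open(src.name, "rb") as fin, open(dst_path, "wb") as fout:
    # os.sendfile copies inside the kernel: the payload never passes
    # through a user-space buffer.
    sent = 0
    while sent < size:
        sent += os.sendfile(fout.fileno(), fin.fileno(), sent, size - sent)

with open(dst_path, "rb") as f:
    copied = f.read()
print(copied)  # → b'kafka log segment bytes'
```

In Kafka's case the destination descriptor is the consumer's socket, so a fetch response is served straight from the page cache.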
Kafka Reliability
(1) Key Concepts
For a single partition, the following sets are defined:
AR (Assigned Replicas): all replicas of the partition.
ISR (In‑Sync Replicas): subset of AR that are fully caught up with the leader.
OSR (Out‑of‑Sync Replicas): AR – ISR.
LEO (Log End Offset): the offset of the next message to be written; each replica has its own LEO.
HW (High Watermark): the smallest LEO among the ISR; messages up to HW are considered committed and can be consumed.
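The HW definition above reduces to a one-line computation, sketched here in Python (the `high_watermark` helper and the replica names are illustrative):

```python
def high_watermark(isr_leos: dict) -> int:
    """HW is the smallest LEO across the in-sync replicas."""
    return min(isr_leos.values())

# LEO is the offset of the NEXT message to be written on each replica.
leos = {"leader": 10, "follower-1": 10, "follower-2": 8}
hw = high_watermark(leos)
print(hw)  # → 8   offsets 0..7 are committed and visible to consumers

leos["follower-2"] = 10  # follower-2 catches up on its next fetch
print(high_watermark(leos))  # → 10
```

The gap between a replica's LEO and the HW is exactly the set of messages that are written somewhere but not yet safe to expose to consumers.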
(2) Leader HW and LEO Updates
The leader receives writes, updates its LEO, and replicates data to followers. Followers request data, receive the leader’s HW, and update their own HW as the minimum of their LEO and the received HW. Only after a follower’s next fetch does the leader’s HW advance, ensuring that a message is considered committed only after being written to all ISR replicas.
(3) Leader Epoch
Each time a new leader is elected, the partition’s epoch (a monotonically increasing version number) is incremented. After a leader change, a follower asks the new leader where the follower’s last epoch ended and truncates to that point instead of truncating to the high watermark, which prevents both data loss and log divergence.
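The truncation decision can be modelled with a small epoch cache in Python. This is a simplified sketch of the leader-epoch mechanism (the epoch table, offsets, and `end_offset_for_epoch` function are invented for illustration):

```python
# Each replica keeps an epoch cache: (epoch, start offset of that epoch).
leader_epochs = [(0, 0), (1, 100), (2, 180)]  # the new leader's history

def end_offset_for_epoch(epochs: list, requested_epoch: int,
                         log_end_offset: int) -> int:
    """The leader's answer: the first offset of the epoch AFTER the requested
    one, i.e. where the requested epoch ended on the leader."""
    for epoch, start in epochs:
        if epoch > requested_epoch:
            return start
    return log_end_offset

# A follower whose last records are from epoch 1, with LEO 150, asks the new
# leader where epoch 1 ended and truncates to min(own LEO, leader's answer).
safe = min(150, end_offset_for_epoch(leader_epochs, 1, 200))
print(safe)  # → 150   nothing to truncate: all of its data is valid

# A divergent follower with epoch-1 records up to offset 190: epoch 1 actually
# ended at 180, so offsets 180..189 were never committed and must be discarded.
print(min(190, end_offset_for_epoch(leader_epochs, 1, 200)))  # → 180
```

Truncating to the high watermark instead could throw away committed records (or keep divergent ones); the epoch lookup computes the exact safe point.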
In summary, this article uses simple language and diagrams to explain Kafka’s essential concepts, from architecture and data flow to storage mechanisms and reliability guarantees. Kafka is a complex system that requires further study for deep mastery.
FunTester