
Kafka Architecture and Core Concepts: Brokers, Producers, Consumers, Topics, Partitions, Replicas, and Reliability

This article provides a comprehensive overview of Kafka's architecture and fundamental concepts. It covers the overall structure; key components such as brokers, producers, consumers, topics, partitions, and replicas; leader-follower synchronization and offset handling; message storage at both the logical and physical layers; producer and consumer workflows; partition assignment strategies and rebalancing; log management; zero-copy I/O; and reliability mechanisms.

FunTester

Kafka can decouple systems, smooth traffic spikes, buffer data, and enable asynchronous communication, making it ideal for activity tracking, messaging, metrics, logging, and stream processing. This article introduces Kafka's basic concepts.

Kafka Overall Structure

The following diagram shows Kafka's overall architecture. It contains many details, so don't worry about absorbing everything at once; each of the important components it displays is introduced below, one by one.

(1) Broker

A broker is a service node; multiple brokers form a Kafka cluster.

(2) Producer

A producer writes messages to the broker.

(3) Consumer

A consumer reads messages from the broker.

(4) Consumer Group

A consumer group consists of one or more consumers; different groups can subscribe to the same topic independently.

(5) ZooKeeper

Kafka uses ZooKeeper to manage cluster metadata and elect the controller (recent Kafka versions can instead run in KRaft mode, without ZooKeeper).

(6) Topic

Each message belongs to a topic; topics logically categorize messages.

(7) Partition

A topic can be split into multiple partitions; each partition is an ordered, append-only log with exactly one leader replica at a time.

(8) Replica

Each partition can have multiple replicas for fault tolerance.

(9) Leader and Follower

Kafka uses a leader‑follower model for replication: the leader handles reads/writes, followers act as backups.

(10) Offset

Each message in a partition has a unique offset that identifies its position and guarantees order within the partition.

In summary, Kafka, as a data system, must solve two fundamental problems: how to store data and how to retrieve it.

When we hand data to Kafka, how does Kafka store it?

When we request data back, how does Kafka return it?

Message Storage (Logical Layer)

Most data systems store data on disk using either an append‑log or B‑tree format. Kafka adopts an append‑log format, as shown below.

Append‑log improves write performance but is less friendly for reads; Kafka adds extra mechanisms to boost read performance.

(1) Topic + Partition Two-Level Structure

Kafka classifies messages by topic and partition. Partitioning provides scalability and reliability; partitions can be placed on different brokers.

(2) Offset

When appending to a log, each new message is added to the end of a file. Each message in a partition receives a unique offset, which is used for ordering within the partition.

Key characteristics:

Partitions improve write performance and data reliability.

Messages are ordered within a partition but not across partitions.
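A minimal sketch illustrates why per-key ordering holds: the same key always hashes to the same partition. Kafka's default partitioner actually uses a murmur2 hash; the plain hashCode() below is a simplification for illustration only.

```java
public class KeyPartitioning {
    // Simplified stand-in for Kafka's default partitioner (which uses murmur2):
    // hash the key, then take it modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key -> same partition, so per-key order holds within that partition.
        System.out.println(partitionFor("user-42", 6));
        System.out.println(partitionFor("user-42", 6)); // identical to the line above
    }
}
```

Because different keys can land on different partitions, there is no ordering guarantee across partitions, only within each one.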

How to Write Data

(1) Overall Flow

The producer’s overall flow is illustrated below.

Two main threads exist on the producer side: the main thread and the sender thread, which communicate via a shared RecordAccumulator.

1. KafkaProducer creates a record.

2. Producer interceptors may filter or modify the record before it is sent.

3. The serializer converts the record's key and value into byte arrays.

4. The partitioner determines the target partition, and the record is appended to a batch in the RecordAccumulator.

5. The sender thread fetches ready batches from the accumulator and builds requests.

6. Requests that have been sent but not yet acknowledged are tracked as in-flight requests.

7. Requests are transmitted to the Kafka cluster.

8. The producer receives responses from the brokers.

9. Acknowledged batches are removed from the accumulator and from in-flight tracking.

The RecordAccumulator buffers data; its size is controlled by buffer.memory (default 32 MB). Records are grouped into ProducerBatch objects (size controlled by batch.size , default 16 KB). The linger.ms setting (default 0) controls how long the producer waits for a batch to fill before sending it.

When the buffer is full, the producer may block or throw an exception, controlled by max.block.ms (default 60 s).

Requests are grouped per broker; the number of in‑flight requests per connection is limited by max.in.flight.requests.per.connection (default 5), and request size by max.request.size (default 1 MB).
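The batching and request parameters above can be collected into a producer configuration. A sketch using the raw property names follows; the bootstrap address is a placeholder, and the values shown mix defaults with illustrative tuning choices.

```java
import java.util.Properties;

public class ProducerTuning {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("buffer.memory", "33554432");   // 32 MB accumulator (default)
        props.put("batch.size", "16384");         // 16 KB per ProducerBatch (default)
        props.put("linger.ms", "5");              // wait up to 5 ms to fill a batch
        props.put("max.block.ms", "60000");       // block up to 60 s when the buffer is full
        props.put("max.in.flight.requests.per.connection", "5"); // default
        props.put("max.request.size", "1048576"); // 1 MB (default)
        props.put("acks", "all");                 // wait for all in-sync replicas
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("batch.size"));
    }
}
```

Raising linger.ms trades a little latency for better batching and throughput; the other values here keep Kafka's defaults.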

(2) Sending Modes

Fire‑and‑forget: send without waiting for a result (highest throughput, lowest reliability).

Synchronous: wait for the cluster to acknowledge the write (highest reliability, lowest throughput).

Asynchronous: provide a callback that is invoked when the broker responds.
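The three modes can be modeled with a CompletableFuture standing in for the broker's acknowledgment. This is a hypothetical stand-in, not the Kafka API; in the real client, KafkaProducer.send() returns a Future<RecordMetadata> and optionally accepts a Callback.

```java
import java.util.concurrent.CompletableFuture;

public class SendModes {
    // Hypothetical stand-in for a broker acknowledging a write; the returned
    // value fakes an "offset" (here, just the message length).
    static CompletableFuture<Long> send(String message) {
        return CompletableFuture.supplyAsync(() -> (long) message.length());
    }

    public static void main(String[] args) throws Exception {
        // 1. Fire-and-forget: ignore the future entirely (fastest, least reliable).
        send("a");

        // 2. Synchronous: block until the acknowledgment arrives (most reliable).
        long offset = send("hello").join();
        System.out.println("sync offset: " + offset);

        // 3. Asynchronous: register a callback invoked when the response arrives.
        send("world").thenAccept(o -> System.out.println("async offset: " + o)).join();
    }
}
```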

The acks parameter determines when a write is considered successful:

acks=1 : the leader’s write is enough (the client default before Kafka 3.0; 3.0+ defaults to acks=all ).

acks=0 : the producer does not wait for any acknowledgment (possible data loss).

acks=-1 or acks=all : all in‑sync replicas must acknowledge (strongest durability).

(3) Important Producer Parameters

How to Read Messages

(1) Consuming Messages

Kafka uses a pull-based model: consumers repeatedly call poll() to fetch batches of records.

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // process each record
    }
}

(2) Offset Commit

Each consumed message has an offset. After processing, the consumer must commit the offset so that the next poll starts from the next message. For example, if the last processed offset is 9527, the committed offset becomes 9528.
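This convention — the committed offset is always one past the last processed message — can be captured in a trivial sketch:

```java
public class OffsetCommit {
    // Kafka commits the offset of the NEXT message to read,
    // not the offset of the last message processed.
    static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    public static void main(String[] args) {
        System.out.println(offsetToCommit(9527)); // prints 9528
    }
}
```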

Kafka’s default is automatic offset commit every 5 seconds (controlled by enable.auto.commit and auto.commit.interval.ms ), which can cause duplicate consumption or data loss if a consumer crashes before processing the polled batch.

(3) Partition Assignment Strategies

Consumers in a group are assigned partitions based on the partition.assignment.strategy configuration. Common strategies:

Range: for each topic, assigns contiguous ranges of partitions to consumers; when the partition count does not divide evenly, the first consumers repeatedly receive the extras, so load may be uneven across topics.

RoundRobin: distributes partitions across consumers in a round‑robin fashion.

Sticky: tries to keep previous assignments while balancing load.
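The range strategy for a single topic can be sketched as follows. The consumer indices and assignment map are illustrative, not Kafka's actual implementation; with 7 partitions and 3 consumers, the first consumer receives one extra partition.

```java
import java.util.*;

public class RangeAssignment {
    // Sketch of range assignment for one topic: each consumer gets a contiguous
    // block; the first (partitions % consumers) consumers each get one extra.
    static Map<Integer, List<Integer>> assign(int partitions, int consumers) {
        Map<Integer, List<Integer>> result = new HashMap<>();
        int base = partitions / consumers;
        int extra = partitions % consumers;
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) owned.add(next++);
            result.put(c, owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(7, 3)); // consumer 0 gets 3 partitions, others get 2
    }
}
```

Applied independently per topic, this is exactly why range assignment can become uneven: the same early consumers collect the extra partition of every topic.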

(4) Rebalancing

Rebalancing occurs when consumers join or leave a group, when the group coordinator changes, or when topic/partition counts change. The steps are:

FindCoordinator – locate the group coordinator.

JoinGroup – request to join the group.

SyncGroup – receive the partition assignment.

Heartbeat – maintain membership.

Message Storage (Physical Layer)

(1) Log Files

Kafka stores data as an append‑only log. The log is divided into segments ( LogSegment ) to allow efficient cleanup.

Two log‑cleanup policies exist:

Log Retention – delete old segments based on time or size.

Log Compaction – keep only the latest record for each key.
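Log compaction's keep-latest-per-key behavior can be modeled with a map, where later writes overwrite earlier ones. A toy sketch (the key/value pairs are illustrative):

```java
import java.util.*;

public class CompactionSketch {
    // Compaction keeps only the latest value per key; a LinkedHashMap models
    // this because re-putting a key replaces its earlier value.
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] kv : log) {
            latest.put(kv[0], kv[1]); // later records win
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = Arrays.asList(
                new String[]{"k1", "v1"},
                new String[]{"k2", "v1"},
                new String[]{"k1", "v2"}); // supersedes {"k1", "v1"}
        System.out.println(compact(log)); // k1 -> v2, k2 -> v1
    }
}
```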

(2) Log Indexes

To speed up reads, Kafka maintains two sparse indexes per segment:

Offset Index – maps message offsets to file positions.

Timestamp Index – maps timestamps to offsets, enabling time‑based lookups.

When a target offset is not directly indexed, Kafka binary-searches the index for the largest entry not exceeding the target, then scans the log forward from that file position.
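A sketch of that floor lookup over a sparse offset index; the {offset, filePosition} entries here are hypothetical, and a real lookup would continue with a sequential scan from the returned position.

```java
public class SparseIndexLookup {
    // Each entry maps a message offset to a byte position in the segment file.
    // The index is sparse, so we binary-search for the greatest indexed offset
    // that is <= the target; Kafka then scans forward from that file position.
    static long floorPosition(long[][] index, long targetOffset) {
        int lo = 0, hi = index.length - 1, best = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid][0] <= targetOffset) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return index[best][1];
    }

    public static void main(String[] args) {
        long[][] idx = {{0L, 0L}, {100L, 4096L}, {200L, 8192L}};
        System.out.println(floorPosition(idx, 150)); // nearest entry is offset 100
    }
}
```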

(3) Zero-Copy

Kafka uses zero‑copy I/O to transfer data from disk to the network directly in kernel space, reducing user‑kernel context switches.

Non‑zero‑copy flow (for illustration):

Zero‑copy flow:

Kafka Reliability

(1) Key Concepts

For a single partition, the following sets are defined:

AR (Assigned Replicas): all replicas of the partition.

ISR (In‑Sync Replicas): subset of AR that are fully caught up with the leader.

OSR (Out‑of‑Sync Replicas): AR – ISR.

LEO (Log End Offset): the offset of the next message to be written; each replica has its own LEO.

HW (High Watermark): the smallest LEO among the ISR; messages up to HW are considered committed and can be consumed.
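Since the HW is the minimum LEO across the ISR, it can be computed directly; a one-line sketch (the LEO values are illustrative):

```java
import java.util.Arrays;

public class HighWatermark {
    // HW = smallest LEO among the in-sync replicas: everything below it is
    // fully replicated and therefore safe to expose to consumers.
    static long highWatermark(long[] isrLeos) {
        return Arrays.stream(isrLeos).min().getAsLong();
    }

    public static void main(String[] args) {
        // Leader LEO 10, followers at 8 and 9: only offsets below 8 are committed.
        System.out.println(highWatermark(new long[]{10, 8, 9})); // prints 8
    }
}
```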

(2) Leader HW and LEO Updates

The leader receives writes, updates its LEO, and replicates data to followers. Followers request data, receive the leader’s HW, and update their own HW as the minimum of their LEO and the received HW. Only after a follower’s next fetch does the leader’s HW advance, ensuring that a message is considered committed only after being written to all ISR replicas.

(3) Leader Epoch

Each newly elected leader increments the partition’s epoch (a version number). Replicas use the epoch, rather than the high watermark, to decide where to truncate their logs after a leader change, which avoids data loss and log divergence.

In summary, this article uses simple language and diagrams to explain Kafka’s essential concepts, from architecture and data flow to storage mechanisms and reliability guarantees. Kafka is a complex system that requires further study for deep mastery.

Tags: Distributed Systems, Message Queues, Streaming, Kafka, Data Replication, Log Management, Consumer Groups