
Understanding Kafka: Core Concepts, Architecture, and Reliability Explained

This article provides a comprehensive overview of Kafka: its overall architecture; key components such as brokers, producers, consumers, topics, partitions, replicas, and ZooKeeper; logical and physical storage mechanisms; producer and consumer workflows and their configuration parameters; partition assignment strategies and rebalancing; and the replication model that ensures data reliability.

Sanyou's Java Diary

Kafka Overall Architecture

Kafka decouples systems, smooths traffic spikes, and enables asynchronous communication, making it ideal for activity tracking, messaging, metrics, logging, and stream processing.

Kafka overall structure

Key Components

Broker : A Kafka instance; multiple brokers form a cluster.

Producer : Writes messages to brokers.

Consumer : Reads messages from brokers.

Consumer Group : A set of consumers that jointly consume a topic; each partition is assigned to at most one consumer in the group.

ZooKeeper : Manages cluster metadata and controller election (replaced by the built‑in KRaft consensus layer in newer Kafka versions).

Topic : Logical categorization of messages.

Partition : Subdivision of a topic for scalability and fault tolerance.

Replica : Copies of a partition for durability.

Leader and Follower : Leader handles reads/writes; followers replicate the leader.

Offset : Unique position of a message within a partition.

Logical Storage Model

Kafka stores data as an append‑only log, improving write performance. Each partition consists of multiple log segments, enabling efficient cleanup. Offsets guarantee ordering within a partition but not across partitions.

Kafka logical storage
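The model above can be sketched with a toy append‑only log in plain Java (an illustration, not Kafka's actual storage code): each partition is an independent list, and a record's offset is simply the position at which it is appended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-partition append-only logs: offsets are assigned
// sequentially within a partition, never across partitions.
public class ToyLog {
    private final Map<Integer, List<String>> partitions = new HashMap<>();

    // Append a record to a partition and return its offset.
    public long append(int partition, String record) {
        List<String> log = partitions.computeIfAbsent(partition, p -> new ArrayList<>());
        log.add(record);       // records are only ever appended
        return log.size() - 1; // offset = position within this partition
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        System.out.println(log.append(0, "a")); // offset 0 in partition 0
        System.out.println(log.append(0, "b")); // offset 1 in partition 0
        System.out.println(log.append(1, "c")); // offset 0 in partition 1: an independent sequence
    }
}
```

Note how partition 1 starts again at offset 0: the two partitions carry no ordering relationship to each other.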

Writing Data (Producer Flow)

The producer workflow includes creating a record, applying interceptors, serialization, partition selection, batching in RecordAccumulator , sending requests, handling back‑pressure, and cleaning up resources. Important parameters include buffer.memory , batch.size , linger.ms , and max.block.ms .

Producer workflow
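The batching‑related parameters are plain producer configuration entries. A minimal sketch using java.util.Properties — the broker address and all values here are illustrative, not recommendations:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties batchingConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("buffer.memory", "33554432"); // 32 MB RecordAccumulator buffer
        props.put("batch.size", "16384");       // max bytes per per-partition batch
        props.put("linger.ms", "5");            // wait up to 5 ms to fill a batch
        props.put("max.block.ms", "60000");     // how long send() may block when the buffer is full
        return props;
    }

    public static void main(String[] args) {
        System.out.println(batchingConfig());
    }
}
```

Raising linger.ms trades a little latency for larger batches and better throughput; max.block.ms is the back‑pressure knob mentioned above.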

Send Modes

Fire‑and‑forget (lowest latency, lowest reliability).

Sync (wait for broker acknowledgment, highest reliability).

Async (callback after acknowledgment).
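The three modes differ only in how the caller treats the future returned by send(). A sketch using CompletableFuture as a stand‑in for the producer's Future<RecordMetadata> — stubSend is a hypothetical stub, not a Kafka API:

```java
import java.util.concurrent.CompletableFuture;

public class SendModes {
    // Hypothetical stand-in for producer.send(record): completes with an "offset".
    static CompletableFuture<Long> stubSend(String record) {
        return CompletableFuture.completedFuture((long) record.length());
    }

    public static void main(String[] args) throws Exception {
        // 1. Fire-and-forget: ignore the future; fastest, no delivery guarantee.
        stubSend("msg-1");

        // 2. Sync: block on the future; an exception here means the send failed.
        long offset = stubSend("msg-2").get();
        System.out.println("sync offset: " + offset);

        // 3. Async: register a callback invoked once the broker acknowledges.
        stubSend("msg-3").whenComplete((off, err) -> {
            if (err != null) System.err.println("send failed: " + err);
            else System.out.println("async offset: " + off);
        });
    }
}
```

The async style keeps the producer thread busy batching while still surfacing per‑record failures in the callback.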

Acknowledgment Settings (acks)

acks=0 : No acknowledgment; lowest latency, but messages can be lost silently.

acks=1 (default before Kafka 3.0): Leader acknowledgment only; data is lost if the leader fails before followers replicate.

acks=all or -1 (default since Kafka 3.0): All in‑sync replicas must acknowledge; strongest durability.
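As a sketch, a reliability‑oriented producer configuration might combine acks=all with retries; the values are illustrative, and since acks=all is only as strong as the current ISR, it is usually paired with a topic‑level min.insync.replicas:

```java
import java.util.Properties;

public class AcksConfigSketch {
    public static Properties reliableConfig() {
        Properties props = new Properties();
        props.put("acks", "all");  // every in-sync replica must acknowledge
        props.put("retries", "3"); // retry transient failures (illustrative value)
        // acks=all only guarantees as much as the ISR contains: set the
        // topic-level min.insync.replicas (e.g. 2) so the leader rejects
        // writes when too few replicas are in sync.
        return props;
    }

    public static void main(String[] args) {
        System.out.println(reliableConfig().getProperty("acks"));
    }
}
```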

Reading Data (Consumer Flow)

Consumers use a pull‑based model, repeatedly invoking poll() to fetch records.

```java
while (true) {
    // poll() blocks up to the given timeout waiting for new records.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // process record
    }
}
```

Offset Commit

After processing, consumers commit the next offset to be read (e.g., 9528 after processing up to 9527 ). Automatic commits can cause duplicate processing (the consumer crashes after processing but before the commit) or data loss (the commit happens before processing finishes).

Offset commit illustration
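The commit‑the‑next‑offset convention can be shown with a toy tracker in plain Java (an illustration, not the consumer API):

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetTracker {
    private final Map<Integer, Long> committed = new HashMap<>();

    // Record that everything up to and including lastProcessed is done:
    // the committed value is the NEXT offset to read, hence the +1.
    public void commit(int partition, long lastProcessed) {
        committed.put(partition, lastProcessed + 1);
    }

    // Where a restarted consumer resumes reading this partition.
    public long resumeFrom(int partition) {
        return committed.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        tracker.commit(0, 9527);                   // processed up to offset 9527
        System.out.println(tracker.resumeFrom(0)); // prints 9528
    }
}
```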

Partition Assignment Strategies

Range: Assigns contiguous partitions per consumer.

RoundRobin: Distributes partitions evenly in a round‑robin fashion.

Sticky: Tries to keep previous assignments while balancing load.
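Range and round‑robin are easy to sketch for a single topic (toy versions; the real assignors handle multiple topics and, in the sticky case, prior assignments):

```java
import java.util.ArrayList;
import java.util.List;

// Toy single-topic versions of the range and round-robin assignors.
public class Assignors {
    // Range: contiguous chunks; the first (partitions % consumers)
    // consumers each receive one extra partition.
    public static List<List<Integer>> range(int partitions, int consumers) {
        List<List<Integer>> out = new ArrayList<>();
        int base = partitions / consumers, extra = partitions % consumers, p = 0;
        for (int c = 0; c < consumers; c++) {
            int take = base + (c < extra ? 1 : 0);
            List<Integer> mine = new ArrayList<>();
            for (int i = 0; i < take; i++) mine.add(p++);
            out.add(mine);
        }
        return out;
    }

    // Round-robin: deal partitions out one at a time.
    public static List<List<Integer>> roundRobin(int partitions, int consumers) {
        List<List<Integer>> out = new ArrayList<>();
        for (int c = 0; c < consumers; c++) out.add(new ArrayList<>());
        for (int p = 0; p < partitions; p++) out.get(p % consumers).add(p);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(range(7, 3));      // [[0, 1, 2], [3, 4], [5, 6]]
        System.out.println(roundRobin(7, 3)); // [[0, 3, 6], [1, 4], [2, 5]]
    }
}
```

With 7 partitions and 3 consumers, range leaves the first consumer with one extra partition on every topic it owns, which is why round‑robin or sticky is usually preferred when consumers subscribe to many topics.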

Rebalancing

Triggered by consumer joins/leaves, group coordinator changes, or topic/partition count changes. Steps: FindCoordinator → JoinGroup → SyncGroup → Heartbeat.

Rebalancing process

Physical Storage

Log Files and Segments

Data is stored in append‑only log files split into segments; a new segment is rolled when the active one reaches a size limit (1 GB by default). Retention policies include time‑based deletion (default 7 days) and size‑based deletion (disabled by default). Log compaction instead retains only the latest value for each key.

Log segments
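Retention and compaction are controlled per topic. A sketch of the relevant configuration keys as a Properties fragment — the values shown are the commonly cited defaults, and cleanup.policy selects deletion versus compaction:

```java
import java.util.Properties;

public class TopicConfigSketch {
    public static Properties retentionConfig() {
        Properties props = new Properties();
        props.put("retention.ms", "604800000");   // time-based deletion: 7 days
        props.put("segment.bytes", "1073741824"); // roll a new segment at 1 GB
        props.put("cleanup.policy", "delete");    // or "compact" to keep only the
                                                  // latest value per key
        return props;
    }

    public static void main(String[] args) {
        System.out.println(retentionConfig().getProperty("retention.ms"));
    }
}
```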

Indexes

Kafka maintains a sparse offset index and a timestamp index to locate messages quickly without scanning entire logs.

Offset index
Timestamp index
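A sparse index keeps an entry for only a fraction of offsets; a lookup finds the greatest indexed offset at or below the target, then scans the log forward from that file position. A toy sketch using a TreeMap floor lookup:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy sparse offset index: maps a subset of offsets to byte positions.
public class SparseIndex {
    private final TreeMap<Long, Long> entries = new TreeMap<>();

    public void add(long offset, long filePosition) {
        entries.put(offset, filePosition);
    }

    // Greatest indexed offset <= target (O(log n) floor lookup);
    // the consumer then scans the log linearly from that position.
    public long scanStart(long targetOffset) {
        Map.Entry<Long, Long> floor = entries.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue();
    }

    public static void main(String[] args) {
        SparseIndex idx = new SparseIndex();
        idx.add(0, 0);       // real Kafka adds roughly one entry per 4 KB of log
        idx.add(100, 4096);
        idx.add(200, 8192);
        System.out.println(idx.scanStart(150)); // prints 4096: start at offset 100's position
    }
}
```

Sparseness keeps the index small enough to memory‑map while still bounding the linear scan to one index interval.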

Zero‑Copy Transfer

Kafka uses zero‑copy to move data from disk to network directly in kernel space, reducing CPU overhead and latency.

Non‑zero‑copy
Zero‑copy
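On the JVM this technique surfaces as FileChannel.transferTo, which Kafka uses to stream segment files to sockets. A self‑contained sketch — the in‑memory sink here merely stands in for a socket channel:

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Write a small "segment" file, then transfer it via transferTo.
    public static String transferDemo() throws Exception {
        Path file = Files.createTempFile("segment", ".log");
        Files.write(file, "hello kafka".getBytes());
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ);
             WritableByteChannel dst = Channels.newChannel(sink)) {
            // transferTo asks the kernel to move bytes directly from the page
            // cache to the target channel -- when the target is a socket, no
            // copy through a user-space buffer is needed.
            src.transferTo(0, src.size(), dst);
        }
        Files.delete(file);
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transferDemo()); // prints "hello kafka"
    }
}
```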

Reliability and Replication

Each partition has a set of replicas (AR). The in‑sync replica set (ISR) contains replicas that have caught up to the leader. Out‑of‑sync replicas (OSR) are lagging. The leader’s Log End Offset (LEO) marks the next write position; the High Watermark (HW) is the smallest LEO among ISR replicas, indicating the offset up to which all ISR replicas have persisted data and can be safely consumed.

Replication diagram
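The high‑watermark rule — HW is the minimum LEO across the ISR — comes down to one line; a toy sketch with hypothetical replica names:

```java
import java.util.Map;

public class HighWatermark {
    // HW = smallest Log End Offset among in-sync replicas: everything below
    // it is persisted on every ISR member and safe to expose to consumers.
    public static long highWatermark(Map<String, Long> isrLeos) {
        return isrLeos.values().stream().mapToLong(Long::longValue).min().orElse(0L);
    }

    public static void main(String[] args) {
        // Leader has written up to 10; followers have replicated up to 8 and 9.
        Map<String, Long> leos = Map.of("leader", 10L, "follower-1", 8L, "follower-2", 9L);
        System.out.println(highWatermark(leos)); // prints 8: consumers may read offsets < 8
    }
}
```

Offsets between HW and the leader's LEO exist only on a subset of replicas and are invisible to consumers until the ISR catches up.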

Leader Epoch

Leader epoch is a monotonically increasing version number for the leader. Followers include the epoch in sync requests, preventing log truncation after leader changes and avoiding data loss.

Leader epoch

This concise guide introduces Kafka’s essential concepts, architecture, storage layers, producer and consumer mechanics, configuration knobs, and reliability guarantees, providing a solid foundation for deeper exploration.

Tags: distributed systems, message queues, Kafka, reliability, data streaming
Written by Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tire of learning!