
Understanding Kafka: Core Architecture, Storage, and Reliability Explained

This article provides a comprehensive overview of Kafka: its overall structure and key components (brokers, producers, consumers, topics, partitions, replicas, and leader‑follower mechanics); its logical and physical storage models, including log retention, compaction, indexing, and zero‑copy transmission; producer and consumer workflows with their key configuration parameters, partition assignment strategies, and rebalancing; and the reliability concepts that ensure data durability.

Sanyou's Java Diary

Kafka Overall Structure

Kafka can decouple systems, smooth traffic spikes, and provide asynchronous communication between services. It is suitable for activity tracking, messaging, metrics, logging, and stream processing.

1) Broker

A broker is a Kafka instance; multiple brokers form a Kafka cluster.

2) Producer

The producer writes messages to brokers.

3) Consumer

The consumer reads messages from brokers.

4) Consumer Group

A consumer group consists of one or more consumers; different groups can subscribe to the same topic independently.

5) ZooKeeper

ZooKeeper manages cluster metadata and controller election.

6) Topic

A topic is a logical classification for messages.

7) Partition

Each topic can be split into multiple partitions; each partition is an ordered log, and its leader replica lives on a single broker.

8) Replica

Partitions can have multiple replicas for fault tolerance.

9) Leader and Follower

Kafka uses a leader‑follower model for replication: the leader handles all reads and writes, while followers passively replicate its log and stand ready to take over if it fails.

10) Offset

Each message in a partition has a unique offset that preserves order within the partition.

Kafka, as a data system, must solve two fundamental problems:

How does Kafka store incoming data?

How does Kafka return that data to readers?

Message Storage (Logical Layer)

Kafka stores messages using an append‑only log format, which improves write performance. The logical model consists of a two‑level hierarchy: topics and partitions. Partitioning provides scalability and reliability, and partitions can be placed on different brokers.
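The two-level model above can be sketched in a few lines of Java. This is an illustrative in-memory toy, not Kafka's implementation: a topic becomes a map of partition id to an append-only list, and a message's offset is simply its position in that list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of Kafka's logical storage model (illustrative only):
// a topic is a map of partition id -> append-only message list,
// and a message's offset is its position within that partition.
public class TopicSketch {
    private final Map<Integer, List<String>> partitions = new HashMap<>();

    public TopicSketch(int partitionCount) {
        for (int p = 0; p < partitionCount; p++) {
            partitions.put(p, new ArrayList<>());
        }
    }

    // Append a message to a partition and return its offset.
    public long append(int partition, String message) {
        List<String> log = partitions.get(partition);
        log.add(message);      // append-only: existing entries are never modified
        return log.size() - 1; // offset = position within the partition
    }

    // Read the message stored at a given offset within a partition.
    public String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }
}
```

Note how offsets are meaningful only within a partition; ordering across partitions is not defined, which is exactly the guarantee real Kafka makes.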

Message Storage (Physical Layer)

Data is stored in log segments. Kafka supports two cleanup strategies: log retention (deleting old segments) and log compaction (keeping only the latest value for each key).

Log Deletion

Segments are deleted based on age (log.retention.hours, default 7 days) or total partition size (log.retention.bytes, disabled by default); individual segment files roll over at log.segment.bytes (default 1 GB).

Log Compaction

Compaction retains the newest record for each key and merges small files.
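The core of compaction, keeping only the newest value per key, can be sketched with a keyed in-memory log. Kafka performs this per segment on disk; the sketch below only illustrates the retention rule.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of log compaction: given a keyed log (oldest first),
// keep only the most recent value for each key.
public class CompactionSketch {
    public static Map<String, String> compact(List<SimpleEntry<String, String>> log) {
        // Re-putting the same key replaces the old value, so after one pass
        // only the newest value per key survives; LinkedHashMap keeps order.
        Map<String, String> compacted = new LinkedHashMap<>();
        for (SimpleEntry<String, String> record : log) {
            compacted.put(record.getKey(), record.getValue());
        }
        return compacted;
    }
}
```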

Log Indexes

Kafka maintains a sparse offset index and a timestamp index to locate messages efficiently.

Offset Index

Maps selected message offsets to physical file positions. Because the index is sparse, a lookup binary-searches for the largest indexed offset not greater than the target, then scans forward from that position.

Timestamp Index

Maps timestamps to offsets, allowing time‑based lookups.
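The sparse-index lookup described above can be sketched as follows. A TreeMap's floorEntry gives the same answer as Kafka's binary search over the index file; the timestamp index works analogously, mapping a timestamp to an offset which is then resolved through the offset index.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a sparse offset index: only some offsets have entries mapping
// them to file positions. A lookup finds the largest indexed offset <= the
// target and returns its file position as the place to start scanning.
public class SparseIndexSketch {
    private final TreeMap<Long, Long> offsetToPosition = new TreeMap<>();

    public void addEntry(long offset, long filePosition) {
        offsetToPosition.put(offset, filePosition);
    }

    // Returns the file position to start scanning from for the target offset.
    public long lookup(long targetOffset) {
        Map.Entry<Long, Long> floor = offsetToPosition.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue();
    }
}
```

Sparseness is the point: the index stays small enough to search quickly, at the cost of a short forward scan in the segment file.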

Zero‑Copy Transfer

Kafka uses zero‑copy to send data directly from disk to the network interface, reducing CPU overhead.
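The zero-copy primitive Kafka relies on is Java's FileChannel.transferTo, which on Linux maps to sendfile(2) and moves bytes from the page cache to the destination channel without copying them through user space. Kafka transfers log segments to a socket channel; in the sketch below the destination is another file so the example runs without a network peer.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of zero-copy transfer: bytes flow from the source channel to the
// destination channel inside the kernel, never entering a user-space buffer.
public class ZeroCopySketch {
    public static long transfer(Path source, Path destination) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(destination,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // transferTo returns the number of bytes actually transferred.
            return in.transferTo(0, in.size(), out);
        }
    }
}
```

Compare this with the conventional path (read into a user buffer, then write out), which costs two extra copies and two extra context switches per chunk.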

How to Write Data

The producer workflow includes creating a KafkaProducer, applying interceptors, serializing the record, determining the target partition, buffering in the RecordAccumulator, creating a request, sending it, handling responses, and cleaning up. Records are batched per partition into a ProducerBatch (sized by batch.size, default 16 KB) and may be held back for up to linger.ms to improve throughput. The total send buffer is controlled by buffer.memory, and blocking behavior by max.block.ms. Requests are grouped per broker and limited by max.in.flight.requests.per.connection and max.request.size (default 1 MB).

Sending Modes

Fire‑and‑forget: fastest, lowest reliability.

Synchronous: calling get() on the Future returned by send() blocks until the broker acknowledges; the strength of that acknowledgment is governed by acks (0, 1, or all), and higher reliability reduces throughput.

Asynchronous with callback: non‑blocking confirmation.

Important Producer Parameters

Key parameters include acks , linger.ms , batch.size , buffer.memory , max.block.ms , and max.in.flight.requests.per.connection .
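These parameters are normally set through a Properties object passed to the KafkaProducer constructor. The keys below are standard producer settings; the values (and the broker address) are example choices for illustration, not recommendations.

```java
import java.util.Properties;

// Illustrative producer configuration. With a kafka-clients dependency on
// the classpath, this Properties object would be passed to
// new KafkaProducer<>(props).
public class ProducerConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // example address
        props.put("acks", "all");          // wait for all in-sync replicas
        props.put("linger.ms", "5");       // wait up to 5 ms to fill a batch
        props.put("batch.size", "16384");  // 16 KB per-partition batch (the default)
        props.put("buffer.memory", "33554432"); // 32 MB total send buffer
        props.put("max.block.ms", "60000");     // how long send() may block
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }
}
```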

How to Read Data

Consumers use a pull model by repeatedly calling poll() to fetch records.

Offset Commit

After processing, consumers commit the next offset (e.g., offset 9528 after processing offset 9527). Auto‑commit can cause duplicate processing or data loss if a consumer crashes before committing.
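The commit arithmetic and its failure mode can be sketched directly. This is an illustrative model, not consumer API code: the committed offset is that of the next record to read, so a crash after processing but before committing makes the replacement consumer reprocess everything from the last commit (at-least-once delivery).

```java
// Sketch of consumer offset-commit semantics.
public class OffsetCommitSketch {
    // The committed offset is the offset of the NEXT record to read,
    // i.e. last processed + 1 (e.g. commit 9528 after processing 9527).
    public static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    // Records processed twice if the consumer crashes after processing up to
    // lastProcessedOffset while the group had only committed committedOffset:
    // the replacement consumer resumes at committedOffset and re-reads
    // committedOffset..lastProcessedOffset inclusive.
    public static long duplicatesAfterCrash(long committedOffset, long lastProcessedOffset) {
        return lastProcessedOffset - committedOffset + 1;
    }
}
```

Committing before processing inverts the trade-off: a crash then loses the uncommitted-but-unprocessed records instead of duplicating them.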

Partition Assignment Strategies

Range: assigns contiguous partition ranges to consumers.

RoundRobin: distributes partitions evenly in a round‑robin fashion.

Sticky: aims for balanced and stable assignments.
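The Range strategy is simple enough to sketch for a single topic. This is an illustrative reimplementation of the rule, not Kafka's RangeAssignor: partitions are divided into contiguous runs, and when the division is uneven the first (partitions % consumers) consumers each take one extra partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of Range assignment for one topic: contiguous partition ranges,
// with the remainder spread over the first consumers in order.
public class RangeAssignorSketch {
    public static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        int quota = partitionCount / consumers.size();
        int extra = partitionCount % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = quota + (i < extra ? 1 : 0); // first `extra` consumers get one more
            List<Integer> range = new ArrayList<>();
            for (int p = 0; p < count; p++) {
                range.add(next++);
            }
            assignment.put(consumers.get(i), range);
        }
        return assignment;
    }
}
```

With many topics this per-topic bias toward the first consumers compounds, which is why RoundRobin or Sticky often balances better across a subscription of several topics.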

Rebalancing

Triggered by consumer joins/leaves, group coordinator changes, or topic/partition count changes. Steps: FindCoordinator → JoinGroup → SyncGroup → Heartbeat.

Kafka Reliability

Reliability is achieved through replication. Key concepts per partition:

AR (Assigned Replicas): all replicas.

ISR (In‑Sync Replicas): replicas that are up‑to‑date with the leader.

OSR (Out‑of‑Sync Replicas): replicas lagging behind.

LEO (Log End Offset): the next offset to be written on each replica.

HW (High Watermark): the smallest LEO among the ISR. Every in-sync replica has persisted all data below this offset, and only messages below the HW are visible to consumers.

Leader updates HW only after a follower reports its LEO, ensuring data is replicated before being visible to consumers.
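The HW rule above reduces to a minimum over the ISR's log end offsets, sketched here as a standalone calculation:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the high-watermark rule: each replica reports its LEO (the next
// offset it will write), and the partition's HW is the minimum LEO across
// the in-sync replicas. Consumers may only read offsets below the HW.
public class HighWatermarkSketch {
    public static long highWatermark(List<Long> isrLogEndOffsets) {
        return Collections.min(isrLogEndOffsets);
    }

    public static boolean isConsumable(long offset, List<Long> isrLogEndOffsets) {
        return offset < highWatermark(isrLogEndOffsets);
    }
}
```

The consequence: a message acknowledged by the leader but not yet fetched by every ISR member sits above the HW and stays invisible to consumers until replication catches up.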

Leader Epoch

Each leader election increments a leader epoch. Followers include the epoch in sync requests, preventing log truncation after failures and avoiding data loss.
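The truncation decision can be sketched with the epoch-to-start-offset map the leader maintains. This is an illustrative model of the lookup, not broker code: a recovering follower asks for the end offset of its last known epoch and truncates only beyond that point, rather than blindly truncating to its high watermark as pre-epoch Kafka versions did.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of the leader-epoch cache: each epoch maps to the first offset
// written under it. The end offset of an epoch is the start of the next
// epoch, or the log end offset if it is still the latest epoch.
public class LeaderEpochSketch {
    private final NavigableMap<Integer, Long> epochStartOffsets = new TreeMap<>();
    private long logEndOffset;

    public void recordEpoch(int epoch, long startOffset) {
        epochStartOffsets.put(epoch, startOffset);
    }

    public void setLogEndOffset(long leo) {
        this.logEndOffset = leo;
    }

    // A follower truncates its log to this offset before resuming fetches.
    public long endOffsetForEpoch(int epoch) {
        Map.Entry<Integer, Long> next = epochStartOffsets.higherEntry(epoch);
        return next == null ? logEndOffset : next.getValue();
    }
}
```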

Overall, this article provides a concise yet thorough introduction to Kafka’s essential concepts, architecture, data flow, storage mechanisms, and reliability guarantees.

Tags: Distributed Systems, Architecture, Kafka, Message Queue, Reliability, Data Streaming
Written by

Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
