
Why Kafka Stores Data the Way It Does: A Deep Dive into Its Log Architecture

This article thoroughly examines Kafka's storage system, explaining why it uses sequential log writes combined with sparse indexing, how different log formats evolved, and the mechanisms for log retention and compaction that enable high‑throughput, fault‑tolerant streaming at massive scale.

dbaplus Community

1. Kafka Storage Scenario Analysis

Kafka is an open‑source distributed event streaming platform originally created at LinkedIn to handle real‑time log streams at the scale of hundreds of billions of events per day. Its storage must support massive data volume, high concurrency, high availability, and high performance.

Real‑time data generation

Massive data storage and processing, which brings the classic "three-high" challenges of distributed systems: high concurrency, high availability, and high performance

From this background we derive the core storage requirements:

Store only the message stream (Kafka does not care about the payload format).

Provide efficient, durable storage that survives broker restarts.

Enable fast retrieval by offset or timestamp.

Guarantee data safety, stability, and fault‑tolerant failover.

2. Kafka Storage Options

Kafka cannot simply reuse a relational database such as MySQL because the write‑intensive workload would be overwhelmed by index maintenance and page‑splitting overhead. Instead, Kafka adopts a log‑structured storage model.

2.1 Basic Storage Knowledge

On disk, random I/O is far slower than sequential I/O, while memory handles random access much better than disk does. One frequently cited benchmark reports roughly 53.2 M values/s for sequential HDD access versus 36.7 M values/s for random memory access, which suggests that sequential disk I/O can match or even outperform random memory access for large batches.

Two fundamental approaches exist:

Improve read speed – use indexes, which degrade write throughput.

Improve write speed – use append‑only logs without indexes, sacrificing random read efficiency.

2.2 Kafka Storage Design

Kafka stores messages as an append-only log and builds a sparse index to locate them efficiently. Because offsets are ordered, a lookup proceeds in three steps: find the segment with the largest base offset ≤ the target offset, binary-search that segment's sparse index for the largest indexed offset ≤ the target, then scan the .log file sequentially from the indexed position until the target offset is reached.
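The lookup described above can be sketched in a few lines. This is a minimal in-memory model, not Kafka's implementation: the `Segment` class, the `index_every` parameter, and `find_record` are hypothetical names, and a list slot stands in for the byte position that a real index entry would hold.

```python
from bisect import bisect_right


class Segment:
    """Hypothetical in-memory stand-in for one log segment."""

    def __init__(self, base_offset, records, index_every=4):
        self.base_offset = base_offset
        self.records = records  # list of (offset, value), sorted by offset
        # Sparse index: one entry per `index_every` records, mapping an
        # offset to its slot in `records` (a stand-in for a byte position).
        self.index = [
            (off, slot)
            for slot, (off, _) in enumerate(records)
            if slot % index_every == 0
        ]


def find_record(segments, target):
    """Return the value stored at `target`, or None if it is absent.

    `segments` must be sorted by base offset.
    """
    # Step 1: the segment with the largest base offset <= target.
    bases = [s.base_offset for s in segments]
    seg_i = bisect_right(bases, target) - 1
    if seg_i < 0:
        return None
    seg = segments[seg_i]

    # Step 2: the sparse index entry with the largest offset <= target.
    indexed = [off for off, _ in seg.index]
    entry_i = bisect_right(indexed, target) - 1
    start = seg.index[entry_i][1] if entry_i >= 0 else 0

    # Step 3: a short sequential scan from the indexed position.
    for off, value in seg.records[start:]:
        if off == target:
            return value
        if off > target:
            break
    return None
```

The index stays small because it only records every few messages; the price is a bounded sequential scan at the end of each lookup.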

Kafka sparse index diagram

3. Kafka Storage Architecture Design

The final design combines sequential log writes with sparse indexing. The log hierarchy is topic → partition → replica → segment → index:

Messages are grouped logically by topic and stored physically per partition.

Partitions enable horizontal scaling and fault tolerance.

Each partition is split into multiple LogSegment files to keep individual files manageable.

Every segment contains a .log file and associated index files (.index, .timeindex, optional .snapshot, etc.). Files are named after the segment's base offset, zero-padded to 20 digits (e.g., 00000000000000000000.log).
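As a small illustration of the naming rule, the sketch below derives file names from a base offset and recovers the base offset from a file name. The helper names `segment_file_names` and `base_offset_of` are hypothetical.

```python
def segment_file_names(base_offset):
    """Build the file names of one segment from its base offset."""
    prefix = f"{base_offset:020d}"  # zero-padded to 20 digits
    return [prefix + suffix for suffix in (".log", ".index", ".timeindex")]


def base_offset_of(file_name):
    """Recover the base offset encoded in a segment file name."""
    return int(file_name.split(".", 1)[0])
```

Because the names are fixed-width, sorting the file names lexically also sorts the segments by base offset.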

Log directory layout

4. Kafka Log System Architecture

Kafka provides two log-cleaning strategies, selected through broker parameters such as log.cleanup.policy:

Log Retention (deletion) – removes segments older than a time or size threshold.

Log Compaction – retains only the latest record for each key.

Retention can be based on:

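Time (log.retention.ms, log.retention.minutes, log.retention.hours).

Size (log.retention.bytes).

Log start offset: segments whose records all lie below the partition's log start offset are removed. This is an internal property of the log (logStartOffset) rather than a configuration parameter.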
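The three retention criteria above can be sketched as a single eligibility check. This is a simplified model under stated assumptions, not Kafka's code: `SegmentInfo` and `deletable_segments` are hypothetical names, the active (last) segment is never deleted here, and deletion walks from the oldest segment and stops at the first one that survives.

```python
from dataclasses import dataclass


@dataclass
class SegmentInfo:
    base_offset: int
    next_offset: int           # offset just past the last record
    size_bytes: int
    largest_timestamp_ms: int  # newest record timestamp in the segment


def deletable_segments(segments, now_ms, retention_ms=None,
                       retention_bytes=None, log_start_offset=0):
    """Return base offsets of segments eligible for deletion, oldest first."""
    total = sum(s.size_bytes for s in segments)
    doomed = []
    for seg in segments[:-1]:  # assumption: keep the active segment
        too_old = (retention_ms is not None
                   and now_ms - seg.largest_timestamp_ms > retention_ms)
        # Size rule: delete only if the log still meets the limit afterwards.
        too_big = (retention_bytes is not None
                   and total - seg.size_bytes >= retention_bytes)
        below_start = seg.next_offset <= log_start_offset
        if not (too_old or too_big or below_start):
            break  # stop at the first surviving segment
        doomed.append(seg.base_offset)
        total -= seg.size_bytes
    return doomed
```

Note that the time check uses the newest timestamp in a segment, so a segment is only dropped once every record in it has aged out.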

Deletion proceeds in three steps: the segment references are removed from the log's skip-list, the files are renamed with a .deleted suffix, and a delayed task (file.delete.delay.ms) finally deletes them.

Time‑based retention diagram

Compaction is loosely comparable to a Redis RDB snapshot: for each key, only the most recent value is kept, so the latest state can be recovered quickly without replaying every historical update.
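A minimal sketch of the idea, assuming records are modelled as (offset, key, value) tuples; the function name `compact` is hypothetical, and details of real compaction such as delete markers and cleaner scheduling are not modelled.

```python
def compact(records):
    """Keep only the latest record per key.

    `records` is a list of (offset, key, value) tuples in offset order.
    Surviving records keep their original offsets and relative order.
    """
    latest = {}
    for offset, key, _ in records:
        latest[key] = offset  # later offsets overwrite earlier ones
    return [rec for rec in records if latest[rec[1]] == rec[0]]
```

For example, compacting `[(0, "a", "1"), (1, "b", "2"), (2, "a", "3")]` drops offset 0 and leaves offsets 1 and 2, so the compacted log has gaps in its offset sequence.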

Log compaction diagram

5. Disk I/O Optimizations

Kafka relies on the operating system's page cache, so most reads and writes are served from memory rather than hitting the disk directly, which dramatically increases throughput. Writes are appended to the active segment; when it reaches its size limit (log.segment.bytes), the segment is closed and a new active segment is created.
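The append-and-roll behaviour can be sketched with an in-memory model. The class `AppendOnlyLog` and its `segment_bytes` parameter are hypothetical; real segments are files on disk, and time-based rolling is not modelled here.

```python
class AppendOnlyLog:
    """Toy append-only log that rolls to a new segment at a size limit."""

    def __init__(self, segment_bytes):
        self.segment_bytes = segment_bytes
        self.segments = [{"base_offset": 0, "size": 0, "records": []}]
        self.next_offset = 0

    def append(self, payload: bytes) -> int:
        active = self.segments[-1]
        # Roll when the payload would push a non-empty segment past the limit.
        if active["records"] and active["size"] + len(payload) > self.segment_bytes:
            active = {"base_offset": self.next_offset, "size": 0, "records": []}
            self.segments.append(active)
        offset = self.next_offset
        active["records"].append((offset, payload))
        active["size"] += len(payload)
        self.next_offset += 1
        return offset
```

Each new segment takes the next offset to be written as its base offset, which is what makes the segment-level binary search possible.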

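Zero-copy techniques (sendfile on Linux, reached from Java through FileChannel.transferTo) are also employed when serving data to consumers: bytes move from the page cache to the network socket inside the kernel, avoiding unnecessary copies between kernel space and user space and further boosting performance.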
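The same idea can be tried out from Python, whose `socket.sendfile` method uses the operating system's sendfile call where available and falls back to ordinary sends otherwise. This is only an illustration of the technique, not Kafka's code path; the helper name `send_log_slice` is hypothetical.

```python
import socket


def send_log_slice(sock: socket.socket, path: str, position: int, count: int) -> int:
    """Send `count` bytes of the file at `path`, starting at `position`.

    The file contents are handed to the kernel rather than being read
    into a user-space buffer first. Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f, offset=position, count=count)
```

A broker answering a fetch request does something similar in spirit: it knows the file, the start position, and the length, so it never needs to inspect the message bytes itself.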

Zero‑copy illustration

6. Summary

Starting from the real-time log streaming scenario, we examined Kafka's storage requirements, compared possible storage mechanisms, and arrived at the final design: sequential append-only logs combined with sparse indexing. The article also covered log segment layout, format evolution (V0, V1, V2), and the two log-cleaning strategies, retention and compaction, that keep Kafka performant and reliable at massive scale.
