How Kafka Achieves High‑Performance Storage: Log Segments, Indexes, and Retention

This article explains Kafka's storage architecture, including its partition‑based log files, sequential append writes, log segment management, index files, and configurable time‑ and size‑based retention policies that together enable ultra‑high write throughput while controlling disk usage.

Mike Chen's Internet Architecture

Kafka is critical middleware for large-scale architectures, and its high-performance storage layer is what allows it to sustain over 100,000 writes per second.

The storage model is built on log files organized by partition: each topic is divided into partitions, and each partition maps to a directory on disk containing all its messages.

Messages are not stored in a single massive file; instead, each partition's log is split into multiple segment files, each consisting of a .log data file, a .index file (a sparse index mapping offsets to physical byte positions), and a .timeindex file (mapping timestamps to offsets).

/kafka-logs/
└── my-topic-0/
    ├── 00000000000000000000.log
    ├── 00000000000000000000.index
    ├── 00000000000000000000.timeindex
    └── …
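To illustrate why the .index file can stay sparse, here is a minimal in-memory sketch (the index entries and the `locate` helper are hypothetical, not Kafka's on-disk format): binary-search for the greatest indexed offset at or below the target, then scan the .log forward from that byte position.

```python
import bisect

# Sparse index: one (offset, byte_position) entry per interval, not one
# per message -- mirroring how a .index file maps logical offsets to
# physical positions in the .log file. Values are illustrative.
sparse_index = [(0, 0), (100, 4096), (200, 8192), (300, 12288)]

def locate(target_offset):
    """Return the (offset, byte_position) entry to start scanning from:
    the greatest indexed offset <= target_offset. A short linear scan
    of the .log file from that position covers the un-indexed gap."""
    offsets = [o for o, _ in sparse_index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    return sparse_index[i]

# Offset 150 is not indexed directly; scanning starts at (100, 4096).
```

Because each lookup costs only a binary search plus a bounded scan, the index stays tiny relative to the log while reads remain fast.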

The .log file is written sequentially, which is one of the fastest disk I/O patterns because it avoids seek time and fully utilizes write bandwidth.
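As a tiny illustration of the append-only pattern (the `append_record` helper is hypothetical), every write lands at the current end of the active segment, so positions only grow and the disk head never seeks back into the middle of the file:

```python
import tempfile

# Sequential append: each record is written at the current end of the
# segment file; no in-place updates, no backward seeks. (Sketch only.)
def append_record(segment, payload: bytes) -> int:
    pos = segment.tell()   # current end of the file
    segment.write(payload)
    return pos             # byte position where this record landed

with tempfile.TemporaryFile() as seg:
    p1 = append_record(seg, b"msg-1")
    p2 = append_record(seg, b"msg-2")
    # positions grow monotonically: p1 < p2
```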

Write flow:

1. The broker receives the producer request.

2. After validation, the data is written to an in-memory buffer.

3. Batched data is appended to the active .log segment.

4. Index files are updated at configured intervals.

5. An asynchronous flush to disk occurs based on flush.messages or flush.ms.
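The steps above can be sketched as a toy model. Everything here is illustrative, not Kafka's internals: the class, field names, and threshold values are assumptions, and the "flush" merely records a watermark where a real broker would fsync.

```python
import time

# Toy model of the write path: buffer -> batch append to the active
# segment -> sparse index entry at byte intervals -> flush when either
# a message-count or a time threshold (cf. flush.messages / flush.ms)
# is exceeded. All names and thresholds are hypothetical.
class ActiveSegment:
    def __init__(self, flush_messages=3, flush_ms=1000, index_interval_bytes=8):
        self.log = bytearray()        # stands in for the .log file
        self.index = []               # sparse (offset, byte_pos) entries
        self.next_offset = 0
        self.unflushed = 0
        self.last_flush = time.monotonic()
        self.bytes_since_index = 0
        self.flush_messages = flush_messages
        self.flush_ms = flush_ms
        self.index_interval_bytes = index_interval_bytes
        self.flushed_up_to = 0        # bytes known to be durable

    def append_batch(self, records):
        for payload in records:
            pos = len(self.log)
            # Add a sparse index entry only every index_interval_bytes.
            if self.bytes_since_index >= self.index_interval_bytes:
                self.index.append((self.next_offset, pos))
                self.bytes_since_index = 0
            self.log += payload
            self.bytes_since_index += len(payload)
            self.next_offset += 1
            self.unflushed += 1
        self._maybe_flush()

    def _maybe_flush(self):
        elapsed_ms = (time.monotonic() - self.last_flush) * 1000
        if self.unflushed >= self.flush_messages or elapsed_ms >= self.flush_ms:
            self.flushed_up_to = len(self.log)   # a real broker would fsync here
            self.unflushed = 0
            self.last_flush = time.monotonic()
```

Note how appends stay cheap (an in-memory buffer plus a tail write), while durability is amortized across batches by the deferred flush.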

Kafka manages many log segments instead of one giant file, which simplifies cleanup: expired data can be reclaimed by deleting whole inactive segment files rather than rewriting a single large file in place.

Retention policies prevent unlimited disk growth. A time‑based policy deletes inactive segments older than a configured period (e.g., 7 days), while a size‑based policy removes the oldest inactive segments until the total size stays below a configured limit.
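As a rough sketch of how those two checks might combine (the `apply_retention` helper and its field names are hypothetical, not Kafka's code): the time-based pass drops inactive segments whose newest record has aged out, then the size-based pass drops oldest segments first until the partition fits under the size limit.

```python
# Sketch of the two retention checks applied to inactive (closed)
# segments. `segments` is an oldest-first list of dicts with
# 'last_ts_ms' (timestamp of the newest record) and 'size' in bytes.
def apply_retention(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the segments that survive both policies."""
    kept = list(segments)
    if retention_ms is not None:
        # Time-based: delete segments older than the retention period.
        kept = [s for s in kept if now_ms - s["last_ts_ms"] <= retention_ms]
    if retention_bytes is not None:
        # Size-based: drop the oldest segments until under the limit
        # (the active segment -- the last one -- is never deleted).
        while len(kept) > 1 and sum(s["size"] for s in kept) > retention_bytes:
            kept.pop(0)
    return kept
```

Because both policies operate on whole segments, enforcement is a matter of unlinking files, which is far cheaper than compacting a single monolithic log.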

Through sequential appends, segment management, and flexible expiration, Kafka maintains high write throughput while controlling storage usage.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Kafka, Storage, distributed-systems, Sequential Write, log-segment, Retention Policy
Written by

Mike Chen's Internet Architecture

Over ten years of BAT architecture experience, shared generously!
