How Kafka Achieves High‑Performance Storage: Log Segments, Indexes, and Retention
This article explains Kafka's storage architecture, including its partition‑based log files, sequential append writes, log segment management, index files, and configurable time‑ and size‑based retention policies that together enable ultra‑high write throughput while controlling disk usage.
Kafka is a core piece of middleware in large-scale architectures, and its storage design is a large part of how it sustains more than 100,000 writes per second.
The storage model is built on log files organized by partition: each topic is divided into partitions, and each partition maps to a directory on disk containing all its messages.
Messages are not stored in a single massive file; they are split into multiple smaller log segments. Each segment consists of a .log data file, a .index file (a sparse index that maps message offsets to byte positions in the .log file), and a .timeindex file (an index that maps timestamps to offsets).
/kafka-logs/
└── my-topic-0/
    ├── 00000000000000000000.log
    ├── 00000000000000000000.index
    ├── 00000000000000000000.timeindex
    ├── …

The .log file is written sequentially, which is one of the fastest disk I/O patterns because it avoids seek time and fully utilizes write bandwidth.
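To make the segment layout more concrete, here is a minimal Python sketch of how a reader could locate a message: pick the segment whose base offset is the largest one not exceeding the target, then use the sparse index to find a nearby byte position and scan forward from there. The function names are hypothetical, and the index is simplified to a list of (absolute offset, byte position) pairs; Kafka's on-disk index stores offsets relative to the segment's base offset in a compact binary format, which is not reproduced here.

```python
from bisect import bisect_right


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Segment files are named after their first offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets: list[int], target_offset: int) -> int:
    """Return the base offset of the segment that would contain target_offset.

    base_offsets must be sorted in ascending order.
    """
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is older than the first segment")
    return base_offsets[i]


def lookup_position(index: list[tuple[int, int]], target_offset: int) -> int:
    """Return the byte position of the closest index entry at or before target_offset.

    index is a sorted list of (offset, byte_position) pairs (a simplification).
    The caller scans the .log file forward from the returned position.
    """
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    return index[i][1] if i >= 0 else 0
```

Because the index is sparse, the lookup only narrows the search to a small region of the .log file; a short sequential scan finishes the job, which keeps index files small enough to stay in memory.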
Write flow:
1. The broker receives the producer request.
2. After validation, the data is written to an in-memory buffer.
3. The batch is appended to the end of the active .log segment.
4. Index files are updated at a configured interval (index.interval.bytes, 4096 bytes of appended data by default), which is why the index is sparse.
5. Data is flushed to disk asynchronously, based on flush.messages or flush.ms.
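The write flow above can be sketched as a toy append-only writer. This is an illustration under stated assumptions, not Kafka's implementation: the class name, the record framing (8-byte offset, 4-byte length, payload), and the in-memory index list are all hypothetical, and only the message-count flush trigger (in the spirit of flush.messages) is modeled, not the time-based one.

```python
import os
import struct


class ToySegmentWriter:
    """Append-only writer for one segment file (hypothetical, for illustration)."""

    def __init__(self, path, base_offset=0, index_interval_bytes=4096, flush_messages=1000):
        self.log = open(path, "ab")          # append mode: writes always go to the end
        self.position = self.log.tell()      # current byte position in the .log file
        self.next_offset = base_offset
        self.index = []                      # sparse list of (offset, byte_position)
        self.index_interval_bytes = index_interval_bytes
        self.flush_messages = flush_messages
        self.bytes_since_index = 0
        self.unflushed = 0
        self.flush_count = 0

    def append_batch(self, records):
        """Append a batch of byte payloads; return the offset of the first record."""
        first_offset = self.next_offset
        # Add an index entry only after enough bytes have accumulated (sparse index).
        if not self.index or self.bytes_since_index >= self.index_interval_bytes:
            self.index.append((first_offset, self.position))
            self.bytes_since_index = 0
        buf = bytearray()
        for payload in records:
            # Assumed framing: big-endian 8-byte offset + 4-byte length + payload.
            buf += struct.pack(">qi", self.next_offset, len(payload)) + payload
            self.next_offset += 1
        self.log.write(buf)                  # one sequential write for the whole batch
        self.position += len(buf)
        self.bytes_since_index += len(buf)
        self.unflushed += len(records)
        if self.unflushed >= self.flush_messages:
            self.flush()
        return first_offset

    def flush(self):
        """Force buffered data to disk."""
        self.log.flush()
        os.fsync(self.log.fileno())
        self.unflushed = 0
        self.flush_count += 1

    def close(self):
        self.flush()
        self.log.close()
```

Batching matters here: each call performs a single sequential write regardless of how many records the batch holds, and the expensive fsync happens only once per flush threshold rather than once per message.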
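Kafka manages a partition as a series of log segments rather than a single file. Only the newest (active) segment receives writes; when it reaches the configured size (segment.bytes) or age (segment.ms), Kafka rolls over to a new segment. This simplifies cleanup, because expired data can be removed by deleting whole segment files instead of rewriting one large file.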
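As a rough sketch of segment management, the following Python model rolls to a new segment when the active one would exceed a size limit. The class and field names are hypothetical, sizes are tracked in memory only, and time-based rolling is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int          # offset of the first record in this segment
    size_bytes: int = 0
    record_count: int = 0


@dataclass
class PartitionLog:
    segment_bytes: int        # size limit per segment (assumed parameter name)
    segments: list = field(default_factory=lambda: [Segment(base_offset=0)])

    @property
    def active(self) -> Segment:
        """Only the newest segment accepts writes."""
        return self.segments[-1]

    def next_offset(self) -> int:
        return self.active.base_offset + self.active.record_count

    def append(self, record_size: int) -> int:
        """Account for one record of record_size bytes; return its offset."""
        would_overflow = self.active.size_bytes + record_size > self.segment_bytes
        # Roll only if the active segment already holds data, so an oversized
        # record still lands somewhere instead of rolling forever.
        if self.active.size_bytes > 0 and would_overflow:
            self.segments.append(Segment(base_offset=self.next_offset()))
        offset = self.next_offset()
        self.active.size_bytes += record_size
        self.active.record_count += 1
        return offset
```

Each new segment takes the next offset as its base offset, which is what allows the file name alone to identify which offsets a segment covers.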
Retention policies prevent unlimited disk growth. A time-based policy (retention.ms) deletes inactive segments older than a configured period (e.g., 7 days), while a size-based policy (retention.bytes, applied per partition) removes the oldest inactive segments until the total size stays below a configured limit. The active segment is never deleted by either policy.
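The two retention policies can be sketched as a single selection function over segment metadata. This is a simplified model following the description above: the names are hypothetical, segment age is taken from an assumed largest_timestamp_ms field, and the broker's exact boundary handling for the size limit is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentInfo:
    base_offset: int
    size_bytes: int
    largest_timestamp_ms: int   # assumed: newest record timestamp in the segment


def segments_to_delete(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the segments a retention pass would remove.

    segments must be ordered oldest first; the last one is the active
    segment and is never selected.
    """
    inactive = segments[:-1]
    doomed = []

    # Time-based: drop the oldest segments whose data has expired.
    if retention_ms is not None:
        for seg in inactive:
            if now_ms - seg.largest_timestamp_ms > retention_ms:
                doomed.append(seg)
            else:
                break  # segments are ordered, so the rest are newer

    # Size-based: keep dropping the oldest survivors until under the limit.
    if retention_bytes is not None:
        total = sum(s.size_bytes for s in segments) - sum(s.size_bytes for s in doomed)
        for seg in inactive[len(doomed):]:
            if total <= retention_bytes:
                break
            doomed.append(seg)
            total -= seg.size_bytes

    return doomed
```

Because deletion works on whole segments, the cost of a retention pass is a handful of file removals, independent of how many messages those segments contained.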
Through sequential appends, segment management, and flexible expiration, Kafka maintains high write throughput while controlling storage usage.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!
