How Kafka’s File Storage Mechanism Achieves High Performance

Kafka stores each topic partition as a series of segment files, each made up of a data file and a sparse index file. This layout supports efficient sequential writes, fast deletion of old data, and quick offset-based lookups. The article walks through the broker, topic, partition, and segment structures, the file naming rules, and observations from a real-world performance test.

21CTO

What is Kafka

Kafka is a distributed, partitioned, replicated, multi-subscriber log system coordinated by ZooKeeper, originally developed at LinkedIn. It can be used as a message queue for web/nginx logs, access logs, messaging services, and similar workloads. LinkedIn open-sourced it in early 2011, and it entered the Apache Incubator later that year.

1. Introduction

The performance of a commercial message queue depends heavily on its file storage design, which makes the storage mechanism one of the key technical measures of such a system.

The following sections analyze Kafka’s file storage mechanism and physical structure, explaining how it achieves efficient storage and its practical effects.

2. Kafka File Storage Mechanism

Key terminology:

Broker: a Kafka node; multiple brokers form a cluster.

Topic: a category of messages such as page view or click logs.

Partition: a physical subdivision of a topic; a topic can be split into several partitions, and each partition is an ordered queue.

Segment: a partition consists of multiple segment files.

Offset: a sequential identifier for each message within a partition.

The analysis proceeds in four steps:

Topic‑level partition distribution

Partition file storage method

Segment file structure within a partition

Locating a message by offset

2.1 Topic Partition Distribution

Assume a single‑broker cluster with log.dirs set to xxx/message-folder. Creating two topics (report_push, launch_info) each with 4 partitions yields the following directory layout:

|--report_push-0
|--report_push-1
|--report_push-2
|--report_push-3
|--launch_info-0
|--launch_info-1
|--launch_info-2
|--launch_info-3

Each partition is a directory named {topic}-{index}, starting from 0.
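The naming rule can be illustrated with a short Python sketch. The helper names partition_dirs and parse_partition_dir are hypothetical, chosen only for this example; they mirror the {topic}-{index} convention described above and are not part of Kafka.

```python
def partition_dirs(topic, num_partitions):
    """Return the directory names for a topic's partitions: {topic}-{index}."""
    return [f"{topic}-{i}" for i in range(num_partitions)]


def parse_partition_dir(name):
    """Split a directory name back into (topic, partition index).

    rpartition splits on the last hyphen, so topic names that themselves
    contain hyphens are still handled.
    """
    topic, _, index = name.rpartition("-")
    return topic, int(index)


for directory in partition_dirs("report_push", 4) + partition_dirs("launch_info", 4):
    print("|--" + directory)
```

Run as a script, this prints the same eight directory names as the layout above.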

2.2 Partition File Storage Method

Each partition directory behaves like one very large file that is split into multiple segment files of roughly equal size. The number of messages per segment can differ, because message sizes vary. Splitting the log this way lets old segments be removed quickly as whole files. Segments are read and written sequentially, and their lifecycle is controlled by server configuration, so unused files are removed promptly and disk space is used more effectively.

Partition storage diagram
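The following is a toy in-memory model of this behaviour, not Kafka's implementation. The class name PartitionLog and the parameters segment_bytes and max_segments are assumptions made for the sketch; in particular, retention by segment count is a simplification of the configuration-driven lifecycle mentioned above.

```python
from collections import deque


class PartitionLog:
    """Toy model: a partition as a queue of size-bounded segments."""

    def __init__(self, segment_bytes, max_segments):
        self.segment_bytes = segment_bytes  # hypothetical size limit per segment
        self.max_segments = max_segments    # hypothetical retention limit
        self.segments = deque([[]])         # each segment is a list of messages
        self.active_size = 0                # bytes written to the newest segment

    def append(self, message):
        # Roll to a new segment when the active one would exceed the size limit.
        if self.segments[-1] and self.active_size + len(message) > self.segment_bytes:
            self.segments.append([])
            self.active_size = 0
        # Writes always go to the end of the newest segment (sequential append).
        self.segments[-1].append(message)
        self.active_size += len(message)
        # Retention drops whole old segments, never individual messages.
        while len(self.segments) > self.max_segments:
            self.segments.popleft()
```

Because segments are bounded by bytes rather than by message count, a segment holding a few large messages and one holding many small messages take up similar space, and deleting old data amounts to removing a whole segment.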

2.3 Segment File Structure

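Each segment consists of an index file and a data file, with the extensions .index and .log respectively. The first segment of a partition is named with offset 0; each later segment is named after the offset of the last message in the previous segment. The number is left-padded with zeros to a fixed width of 20 digits, as in 00000000000000368769.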
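A minimal Python sketch of the file naming follows. The function names are hypothetical, and the width of 20 digits is taken from the example filename 00000000000000368769 used later in this article.

```python
def segment_file_names(name_offset):
    """Return the (.index, .log) file names for the offset used as the segment name."""
    stem = f"{name_offset:020d}"  # zero-pad to a fixed width of 20 digits
    return stem + ".index", stem + ".log"


def offset_from_file_name(file_name):
    """Recover the numeric offset from a segment file name."""
    return int(file_name.split(".")[0])


print(segment_file_names(0))
print(segment_file_names(368769))
```

Fixed-width names have a practical benefit: sorting the file names as strings gives the same order as sorting the offsets numerically.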

Example segment list from an experiment (one topic, one partition, 500 MB segment size):

Segment file list

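The index file stores metadata that maps messages to their physical positions in the data file. For example, in the index of segment 00000000000000368769, the entry (3, 497) refers to the third message in the data file, whose offset within the partition is 368,772 (368,769 + 3) and which is stored at physical byte position 497.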
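The mapping from index entries to partition offsets can be sketched in Python as below. This is an illustration under stated assumptions: each entry is modelled as two 4-byte big-endian integers (relative offset, byte position), the function name read_index is hypothetical, and every sample entry other than (3, 497) is made up for the example.

```python
import struct

# Assumed entry layout: 4-byte relative offset followed by 4-byte byte position.
ENTRY = struct.Struct(">ii")


def read_index(raw, name_offset):
    """Decode raw index bytes into (partition offset, byte position) pairs.

    name_offset is the number in the segment file name; relative offsets
    in the index are counted from it.
    """
    return [(name_offset + rel, pos) for rel, pos in ENTRY.iter_unpack(raw)]


# Sample index contents; only the entry (3, 497) comes from the article.
raw = ENTRY.pack(1, 0) + ENTRY.pack(3, 497) + ENTRY.pack(6, 1407)
for offset, position in read_index(raw, 368769):
    print(offset, position)
```

Storing small relative offsets instead of full partition offsets is one reason an index entry can stay compact.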

Message physical structure:

Message layout

Key fields include:

8 byte offset: Sequential ID of the message within the partition.

4 byte message size: Size of the message.

4 byte CRC32: Checksum for integrity.

1 byte “magic”: Protocol version.

1 byte “attributes”: Compression or encoding flags.

4 byte key length: Length of the key; -1 indicates no key.

K byte key: Optional key.

value bytes payload: Actual message payload.
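The field list above can be turned into a small encoder and decoder. This is a sketch of the layout exactly as listed in this article, not a parser for real Kafka log files, whose on-disk formats differ by version (for example, they carry an explicit value length). Two further assumptions are made: the message size counts every byte after the size field, and the CRC32 covers every byte after the CRC field. The function names are hypothetical.

```python
import struct
import zlib

# offset (8), message size (4), CRC32 (4), magic (1), attributes (1), key length (4)
HEADER = struct.Struct(">qiIbbi")


def encode_message(offset, key, payload, magic=0, attributes=0):
    """Serialize one message using the field order listed in the article."""
    key_len = -1 if key is None else len(key)  # -1 indicates no key
    key_bytes = b"" if key is None else key
    body = struct.pack(">bbi", magic, attributes, key_len) + key_bytes + payload
    crc = zlib.crc32(body)   # assumption: CRC covers the bytes after the CRC field
    size = 4 + len(body)     # assumption: size counts the bytes after the size field
    return struct.pack(">qiI", offset, size, crc) + body


def decode_message(buf, pos=0):
    """Parse one message at byte position pos; return (fields, next position)."""
    offset, size, crc, magic, attributes, key_len = HEADER.unpack_from(buf, pos)
    end = pos + 12 + size                      # 12 = offset field + size field
    if zlib.crc32(buf[pos + 16:end]) != crc:   # 16 = first byte after the CRC
        raise ValueError(f"CRC mismatch at position {pos}")
    key_start = pos + HEADER.size
    key = None if key_len < 0 else buf[key_start:key_start + key_len]
    payload = buf[key_start + max(key_len, 0):end]
    fields = {"offset": offset, "magic": magic, "attributes": attributes,
              "key": key, "payload": payload}
    return fields, end
```

Because every message starts with its offset and size, a reader can walk a data file message by message: decode one, jump to the returned position, and repeat.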

2.4 Locating a Message by Offset

To read a message at offset 368,776:

Binary search the sorted list of segment files to find the last file whose start offset is ≤ 368,776. In the example, this is 00000000000000368769.index and its corresponding .log file.

Search the index file for the nearest entry at or before the target offset to obtain a physical position, then scan the data file sequentially from that position until the desired offset is reached.

The index is sparse: it holds entries for only some of the messages. This keeps index files small enough to be mapped into memory with mmap and read directly, at the cost of a short sequential scan that a dense index would not need.
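The lookup steps can be sketched in Python with in-memory stand-ins for the files. All function names and sample data here are hypothetical: index entries are modelled as (relative offset, byte position) pairs counted from the number in the segment file name, and the data file is modelled as a list of (byte position, offset, payload) records.

```python
from bisect import bisect_right


def find_segment(segment_names, target):
    """Step 1: pick the last segment whose name offset is <= target."""
    i = bisect_right(segment_names, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is before the first segment")
    return segment_names[i]


def find_scan_start(index_entries, name_offset, target):
    """Step 2a: nearest sparse index entry at or before the target offset."""
    relative = target - name_offset
    keys = [rel for rel, _ in index_entries]
    i = bisect_right(keys, relative) - 1
    return index_entries[i][1] if i >= 0 else 0  # no entry: scan from file start


def scan_log(records, start_pos, target):
    """Step 2b: sequential scan of the data file from start_pos."""
    for pos, offset, payload in records:
        if pos < start_pos:
            continue  # a real reader would seek directly to start_pos
        if offset == target:
            return payload
    return None


# Made-up sample data for the segment named 00000000000000368769.
segment_names = [0, 368769, 737337]
index_368769 = [(1, 0), (3, 497), (6, 1407), (8, 1686)]
records_368769 = [
    (0, 368770, b"m1"), (210, 368771, b"m2"), (497, 368772, b"m3"),
    (760, 368773, b"m4"), (1100, 368774, b"m5"), (1407, 368775, b"m6"),
    (1550, 368776, b"m7"), (1686, 368777, b"m8"),
]

name_offset = find_segment(segment_names, 368776)
start = find_scan_start(index_368769, name_offset, 368776)
message = scan_log(records_368769, start, 368776)
print(name_offset, start, message)
```

In this sample the index has no entry for offset 368,776, so the scan starts at the entry just before it and reads one extra message. That short scan is the price of keeping the index sparse.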

3. Real‑World Performance

Test environment: a two‑VM Kafka cluster (4‑core CPU, 8 GB RAM, 1 Gbps NIC, JVM heap 4 GB). Detailed server configuration is referenced elsewhere.

Observations show that Kafka rarely performs heavy disk reads at runtime; disk activity consists mainly of periodic batch writes, which keeps I/O efficient. Write path: Java heap → page cache → asynchronous flush to disk. Read path: page cache → socket; on a cache miss, the data is first loaded from disk into the page cache and then sent.
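The write path can be approximated in a few lines of Python. This is only an analogy under stated assumptions: the function append_batches and its flush_every parameter are hypothetical, and an explicit os.fsync every few batches stands in for the asynchronous flush that the broker and operating system perform.

```python
import os


def append_batches(path, batches, flush_every=2):
    """Append message batches to a log file; return how many fsync calls were made."""
    fsyncs = 0
    with open(path, "ab") as log_file:
        for n, batch in enumerate(batches, start=1):
            log_file.write(b"".join(batch))  # bytes sit in a user-space buffer
            log_file.flush()                 # hand the bytes to the OS page cache
            if n % flush_every == 0:
                os.fsync(log_file.fileno())  # force cached pages out to disk
                fsyncs += 1
    return fsyncs
```

Between fsync calls the data lives in the page cache, which matches the read path described above: a consumer that reads recently written messages is typically served from memory rather than from disk.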

4. Summary

Kafka splits each large partition log into many smaller segment files, which makes it easy to delete segments that are no longer needed and keeps disk usage down.

Index information enables fast message location and lets the broker determine the maximum size of a response.

Mapping the index metadata into memory avoids disk I/O on index lookups within a segment.

Sparse index storage significantly reduces index file size.
