How Kafka’s File Storage Mechanism Achieves High Performance
Kafka's distributed log stores each partition's messages as a series of segments, each pairing a data file with an index file. This layout enables efficient sequential writes, quick deletion of old data, and fast offset-based lookups. This article walks through the broker, topic, partition, and segment structures, the file naming rules, and a real-world performance experiment.
What is Kafka
Kafka, originally developed at LinkedIn, is a distributed, partitioned, replicated, multi-subscriber log system coordinated by ZooKeeper. It can serve as a message queue for web/nginx logs, access logs, messaging services, and similar workloads. LinkedIn contributed it to Apache in 2011.
1. Introduction
The performance of a production-grade message queue depends heavily on its file storage design, which is one of the key technical measures of such a system.
The following sections analyze Kafka’s file storage mechanism and physical structure, explaining how it achieves efficient storage and its practical effects.
2. Kafka File Storage Mechanism
Key terminology:
Broker: a Kafka node; multiple brokers form a cluster.
Topic: a category of messages such as page view or click logs.
Partition: a physical grouping of a topic; each partition is an ordered queue.
Segment: a partition consists of multiple segment files.
Offset: a sequential identifier for each message within a partition.
The analysis proceeds in four steps:
Topic‑level partition distribution
Partition file storage method
Segment file structure within a partition
Locating a message by offset
2.1 Topic Partition Distribution
Assume a single‑broker cluster with log.dirs set to xxx/message-folder. Creating two topics (report_push, launch_info) each with 4 partitions yields the following directory layout:
|--report_push-0
|--report_push-1
|--report_push-2
|--report_push-3
|--launch_info-0
|--launch_info-1
|--launch_info-2
|--launch_info-3
Each partition is a directory named {topic}-{index}, with the index starting from 0.
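The naming rule can be illustrated with a minimal sketch. The helper name partition_dirs is hypothetical and is not part of Kafka; it only reproduces the {topic}-{index} pattern described above.

```python
def partition_dirs(topic: str, num_partitions: int) -> list:
    """Return partition directory names of the form {topic}-{index}, index from 0."""
    return [f"{topic}-{i}" for i in range(num_partitions)]


# Rebuild the layout from the example: two topics, four partitions each.
layout = partition_dirs("report_push", 4) + partition_dirs("launch_info", 4)
for name in layout:
    print(f"|--{name}")
```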
2.2 Partition File Storage Method
Each partition directory contains multiple equal-size segment files, although the number of messages per segment may vary. Splitting a partition this way allows old data to be deleted quickly, one whole segment file at a time. Segments are read and written sequentially, and their lifecycle is controlled by server configuration, which enables rapid removal of unused files and better disk utilization.
2.3 Segment File Structure
Each segment consists of an index file and a data file, with the extensions .index and .log respectively. The first segment of a partition is named 0, and each subsequent segment is named after the offset of the last message in the previous segment. Names are zero-padded to 20 digits, as in 00000000000000368769.index.
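As a rough illustration of the naming rule, the sketch below builds and parses segment file names. The function names are hypothetical. The padding width of 20 matches the 20-character name 00000000000000368769 used in this article's lookup example.

```python
def segment_file_names(name_offset: int) -> tuple:
    """Return the (.index, .log) file names for a segment, zero-padded to 20 digits."""
    stem = f"{name_offset:020d}"
    return stem + ".index", stem + ".log"


def offset_from_file_name(file_name: str) -> int:
    """Recover the numeric offset encoded in a segment file name."""
    return int(file_name.split(".", 1)[0])


index_name, log_name = segment_file_names(368769)
print(index_name, log_name)
```

Because every name has the same width, sorting the file names as strings gives the same order as sorting them by offset, which is what makes a binary search over a directory listing straightforward.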
[Figure: segment file list from an experiment with one topic, one partition, and a 500 MB segment size]
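The index file stores metadata that points to the physical position of messages in the data file. Each entry is a pair: the message's number relative to the segment name, and its physical byte offset. For example, in segment 368769 the entry (3, 497) refers to the third message in the data file (global partition offset 368,769 + 3 = 368,772), which starts at physical offset 497.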
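The entry arithmetic can be sketched as follows. The binary layout used here (a 4-byte relative offset followed by a 4-byte position, big-endian) is an assumption of this sketch, since the article does not spell out the on-disk encoding. The entry (1, 0) is made up for illustration; only (3, 497) comes from the example above.

```python
import struct

# Assumed entry layout: 4-byte relative offset, 4-byte physical position.
ENTRY = struct.Struct(">II")


def decode_index(raw: bytes, segment_offset: int) -> list:
    """Decode raw index bytes into (global_offset, physical_position) pairs."""
    entries = []
    for relative, position in ENTRY.iter_unpack(raw):
        entries.append((segment_offset + relative, position))
    return entries


# (1, 0) is a hypothetical entry; (3, 497) is the entry discussed in the text.
raw_index = ENTRY.pack(1, 0) + ENTRY.pack(3, 497)
decoded = decode_index(raw_index, 368769)
print(decoded)
```

Storing offsets relative to the segment name keeps each entry small, because the full 8-byte partition offset never has to be written into the index.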
The physical structure of a message includes the following key fields:
8 byte offset: Sequential ID of the message within the partition.
4 byte message size: Size of the message.
4 byte CRC32: Checksum for integrity.
1 byte “magic”: Protocol version.
1 byte “attributes”: Compression or encoding flags.
4 byte key length: Length of the key; -1 indicates no key.
K byte key: Optional key.
value bytes payload: Actual message payload.
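The field list can be turned into a small encode/decode sketch. It follows the fields above in order and is not Kafka's exact wire format. Three things are assumptions of this sketch: big-endian byte order, a message size that counts every byte after the size field, and a CRC32 computed over every byte after the CRC field.

```python
import struct
import zlib

HEADER = struct.Struct(">qi")   # 8-byte offset, 4-byte message size
FIXED = struct.Struct(">IBBi")  # 4-byte CRC32, 1-byte magic, 1-byte attributes, 4-byte key length


def encode(offset, key, payload, magic=0, attributes=0):
    """Serialize one message; key=None is stored as key length -1."""
    key_bytes = b"" if key is None else key
    key_len = -1 if key is None else len(key)
    body = struct.pack(">BBi", magic, attributes, key_len) + key_bytes + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF  # assumption: CRC covers the bytes after it
    message = struct.pack(">I", crc) + body
    return HEADER.pack(offset, len(message)) + message


def decode(buf, pos=0):
    """Parse one message at pos; return (offset, key, payload, next_pos)."""
    offset, size = HEADER.unpack_from(buf, pos)
    start = pos + HEADER.size
    end = start + size
    crc, magic, attributes, key_len = FIXED.unpack_from(buf, start)
    if zlib.crc32(buf[start + 4:end]) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch: message is corrupt")
    cursor = start + FIXED.size
    key = None
    if key_len >= 0:
        key = buf[cursor:cursor + key_len]
        cursor += key_len
    return offset, key, buf[cursor:end], end


record = encode(368772, b"user-1", b"page_view")
print(decode(record))
```

The returned next_pos is what lets a reader walk a data file sequentially: each decode call yields the position at which the following message begins.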
2.4 Locating a Message by Offset
To read a message at offset 368,776:
Binary search the sorted segment file names for the segment with the largest start offset ≤ 368,776. In the example, this is 00000000000000368769.index and its corresponding .log file.
Search the index file for the nearest entry at or before the target offset to obtain a physical position in the data file, then scan the data file sequentially from that position until offset 368,776 is reached.
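The two steps above can be sketched with in-memory stand-ins for the files. All function names are hypothetical, and the sample data is invented for illustration apart from segment 368769 and the index entry at offset 368,772 and position 497. As in the steps above, the sketch treats each segment name as that segment's start offset.

```python
from bisect import bisect_right


def find_segment(start_offsets, target):
    """Step 1: return the largest start offset <= target (input must be sorted)."""
    i = bisect_right(start_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is before the first segment")
    return start_offsets[i]


def find_scan_position(index_entries, target):
    """Step 2a: return the position of the last sparse index entry <= target.

    index_entries is a sorted list of (global_offset, position) pairs.
    Falls back to position 0 when every entry is past the target.
    """
    offsets = [offset for offset, _ in index_entries]
    i = bisect_right(offsets, target) - 1
    return index_entries[i][1] if i >= 0 else 0


def scan_log(records, start_position, target):
    """Step 2b: scan records sequentially from start_position for the target offset.

    records stands in for the .log file: (offset, position, payload) in file order.
    """
    for offset, position, payload in records:
        if position >= start_position and offset == target:
            return payload
    return None


segments = [0, 368769, 700000]  # 700000 is a hypothetical third segment
sparse_index = [(368770, 0), (368772, 497), (368775, 1200)]
log_records = [
    (368775, 1200, b"m6"),
    (368776, 1350, b"m7"),
    (368777, 1500, b"m8"),
]

segment = find_segment(segments, 368776)
position = find_scan_position(sparse_index, 368776)
print(segment, position, scan_log(log_records, position, 368776))
```

Both searches are logarithmic in the number of segments and index entries, so the only linear work is the short scan between two neighboring index entries.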
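Because the index is sparse (it does not hold an entry for every message), it stays small enough to be mapped into memory with mmap and accessed directly. The trade-off is that each lookup needs a short sequential scan of the data file, so it may take longer than with a dense index.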
3. Real‑World Performance
Test environment: a two‑VM Kafka cluster (4‑core CPU, 8 GB RAM, 1 Gbps NIC, JVM heap 4 GB). Detailed server configuration is referenced elsewhere.
Observations show that Kafka rarely performs heavy disk reads at runtime; most disk activity consists of batch writes, which keeps I/O efficient.
Write path: Java heap → page cache → asynchronous flush to disk.
Read path: page cache → socket. On a cache miss, data is first loaded from disk into the page cache and then sent.
4. Summary
Kafka splits large partition files into many small segment files, facilitating easy deletion of consumed data and reducing disk usage.
Index information enables fast message location and response size estimation.
Mapping index metadata into memory avoids disk I/O for segment files.
Sparse index storage significantly reduces index file size.
21CTO