How Kafka Stores and Retrieves Messages: Inside Partitions, Segments, and Index Files
Kafka persists messages on disk by organizing each topic into multiple partitions, which are further divided into segment files containing paired .index and .log files; this structure enables efficient storage, offset-based lookup, and fast retrieval of specific messages through binary search across segment indexes.
Kafka is a distributed message queue system that stores messages on the hard disks of cluster servers.
In Kafka, multiple message queues called topics can be created; producers publish messages to a topic and consumers read from it.
To handle massive message volumes and maintain read/write performance, each topic is split into multiple partitions, which are distributed across the servers in the cluster.
Thus, logically, a producer sends a message to a specific partition of a topic, and a consumer fetches messages from that partition.
In the actual storage layout, a partition is not a single physical file but a directory named with the topic and partition number, containing several segment files.
Segments break the large amount of data into smaller files to facilitate writing and retrieval.
Each segment consists of two physical files: a .index file and a .log file. The two files share the same base name, differing only by extension.
Messages are generated sequentially, each assigned an offset starting from 0, indicating its position within a partition. Each segment stores messages whose offsets fall within a specific range.
Segment files are named with the starting offset of their range, padded to 20 digits (e.g., 00000000000000000000.index and 00000000000000000000.log for offsets 0‑19).
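The naming scheme can be sketched in a few lines of Python. This is a minimal illustration, not Kafka's own code: the helper name segment_paths, the log directory /tmp/kafka-logs, and the topic name orders are hypothetical, and the partition directory is assumed to follow the usual topic-partition form.

```python
from pathlib import Path


def segment_paths(log_dir, topic, partition, base_offset):
    """Return the (.index, .log) paths for the segment that starts at base_offset."""
    # Assumption: the partition directory is named "<topic>-<partition>".
    partition_dir = Path(log_dir) / f"{topic}-{partition}"
    # The base offset is zero-padded to 20 digits and shared by both files.
    base_name = f"{base_offset:020d}"
    return partition_dir / f"{base_name}.index", partition_dir / f"{base_name}.log"


# Hypothetical directory and topic, used only for illustration.
index_path, log_path = segment_paths("/tmp/kafka-logs", "orders", 0, 300)
print(index_path.name)
print(log_path.name)
```

Because the names are zero-padded to a fixed width, sorting the file names alphabetically also sorts the segments by starting offset.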
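The .index file is conceptually simple: each entry is a key,value pair where the key is a message offset and the value is the physical byte position of that message in the corresponding .log file. The index is sparse, so not every message has an entry. (On disk, Kafka stores these entries in a compact binary form, with offsets recorded relative to the segment's starting offset; the text form below is a simplified view.)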
1,0
3,299
6,497
...
The .log file stores the actual message payload along with metadata such as the offset, size, and checksum.
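The sample index entries above skip some offsets (1, 3, 6), so a lookup cannot always expect an exact match. The sketch below, which assumes the simplified text form of the index used in this article, parses those entries and returns the nearest entry at or before a requested offset. The function names parse_index and floor_entry are illustrative only.

```python
import bisect


def parse_index(text):
    """Parse 'offset,position' lines into a list of (offset, position) tuples."""
    return [tuple(int(field) for field in line.split(","))
            for line in text.splitlines() if line.strip()]


def floor_entry(entries, target):
    """Return the last entry whose offset is <= target, or None if there is none."""
    offsets = [offset for offset, _ in entries]
    # bisect_right gives the insertion point after any equal offset,
    # so the element just before it is the floor entry.
    i = bisect.bisect_right(offsets, target) - 1
    return entries[i] if i >= 0 else None


entries = parse_index("1,0\n3,299\n6,497")
print(floor_entry(entries, 3))  # (3, 299)  exact match
print(floor_entry(entries, 5))  # (3, 299)  nearest earlier entry
print(floor_entry(entries, 0))  # None      before the first entry
```

When the result is an earlier entry rather than an exact match, the reader starts at that byte position in the .log file and scans forward to the requested offset.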
Message Retrieval Example
To read the message with offset=368:
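1. Identify the segment containing the offset: list the segment files in the partition directory and binary search their starting offsets for the last one that is less than or equal to 368 (here, 300).
2. Open the corresponding .index file (e.g., 00000000000000000300.index) and look up offset 368 to obtain the byte position (e.g., 299). Because the index does not hold an entry for every offset, use the last entry at or before 368 when there is no exact match.
3. Open the matching .log file (e.g., 00000000000000000300.log) and read from byte position 299. If that position belongs to an earlier message, scan forward until the record with offset 368 is reached, then return its content.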
This process completes the message lookup.
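The three steps can be tried end to end against toy files. The sketch below is a simplified model, not Kafka's real on-disk format: the record header (8-byte offset plus 4-byte payload size), the text-based index, the one-entry-per-ten-messages spacing, the directory name demo-0, and the helper names write_segment and fetch are all assumptions made for illustration.

```python
import bisect
import struct
import tempfile
from pathlib import Path

# Toy record header: 8-byte offset followed by 4-byte payload size (big-endian).
HEADER = struct.Struct(">qi")


def write_segment(partition_dir, base_offset, count, index_every=10):
    """Write a toy .log file plus a sparse text .index of 'offset,position' lines."""
    name = f"{base_offset:020d}"
    index_lines = []
    with open(partition_dir / f"{name}.log", "wb") as log:
        for offset in range(base_offset, base_offset + count):
            if (offset - base_offset) % index_every == 0:
                index_lines.append(f"{offset},{log.tell()}")
            payload = f"msg-{offset}".encode()
            log.write(HEADER.pack(offset, len(payload)) + payload)
    (partition_dir / f"{name}.index").write_text("\n".join(index_lines))


def fetch(partition_dir, target):
    """Return the payload stored at the given offset, following the three steps."""
    # Step 1: binary search the sorted starting offsets for the covering segment.
    bases = sorted(int(p.stem) for p in partition_dir.glob("*.log"))
    i = bisect.bisect_right(bases, target) - 1
    if i < 0:
        raise KeyError(target)
    name = f"{bases[i]:020d}"
    # Step 2: find the last index entry at or before the target offset.
    lines = (partition_dir / f"{name}.index").read_text().splitlines()
    entries = [tuple(map(int, line.split(","))) for line in lines]
    j = bisect.bisect_right([offset for offset, _ in entries], target) - 1
    position = entries[j][1] if j >= 0 else 0
    # Step 3: seek to that byte position and scan forward to the exact offset.
    with open(partition_dir / f"{name}.log", "rb") as log:
        log.seek(position)
        while header := log.read(HEADER.size):
            offset, size = HEADER.unpack(header)
            payload = log.read(size)
            if offset == target:
                return payload
    raise KeyError(target)


with tempfile.TemporaryDirectory() as tmp:
    partition_dir = Path(tmp) / "demo-0"
    partition_dir.mkdir()
    for base in (0, 100, 200, 300):
        write_segment(partition_dir, base, 100)
    result = fetch(partition_dir, 368)
    print(result)  # b'msg-368'
```

In this toy run the segment search lands on the files starting at offset 300, the index lookup lands on the entry for offset 360, and the scan reads eight more records before returning the message at offset 368.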
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.