How Kafka Stores and Retrieves Messages: Inside Partitions, Segments, and Index Files

Kafka persists messages on disk by organizing each topic into multiple partitions, which are further divided into segment files containing paired .index and .log files; this structure enables efficient storage, offset-based lookup, and fast retrieval of specific messages through binary search across segment indexes.

Java High-Performance Architecture

Kafka is a distributed message queue system that stores messages on the hard disks of cluster servers.

In Kafka, multiple message queues called topics can be created; producers publish messages to a topic and consumers read from it.

To handle massive message volumes and keep reads and writes fast, each topic is split into multiple partitions, which are distributed evenly across the servers in the cluster.

Thus, logically, a producer sends a message to a specific partition of a topic, and a consumer fetches messages from that partition.
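To picture how partitions end up spread across servers, here is a minimal sketch that places them round-robin. This is a deliberate simplification and not Kafka's actual assignment algorithm; the function name assign_partitions and the broker names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, brokers: List[str]) -> Dict[int, str]:
    """Place partitions on brokers round-robin (illustrative only)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


# Hypothetical cluster of three brokers hosting a six-partition topic.
assignment = assign_partitions(6, ["broker-1", "broker-2", "broker-3"])
for partition, broker in assignment.items():
    print(f"partition {partition} -> {broker}")
```

With six partitions and three brokers, each broker ends up hosting two partitions.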

Kafka topic and partition distribution

In the actual storage layout, a partition is not a single physical file but a directory named with the topic and partition number, containing several segment files.
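As a small illustration of that naming scheme, the sketch below builds the directory path for one partition, using the topic name and partition number joined by a hyphen. The log root /var/kafka-logs, the topic name orders, and the helper partition_dir are assumptions made up for this example.

```python
from pathlib import PurePosixPath


def partition_dir(log_root: str, topic: str, partition: int) -> PurePosixPath:
    """Return the directory for one partition: <log_root>/<topic>-<partition>."""
    return PurePosixPath(log_root) / f"{topic}-{partition}"


# Hypothetical log root and topic name.
print(partition_dir("/var/kafka-logs", "orders", 2))  # /var/kafka-logs/orders-2
```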

Segments break the large amount of data into smaller files to facilitate writing and retrieval.

Kafka segment files

Each segment consists of two main physical files: a .index file and a .log file. The two files share the same base name and differ only by extension. (Newer Kafka versions also write a .timeindex file alongside them; it is not covered here.)

Kafka partition directory structure

Messages are generated sequentially, each assigned an offset starting from 0, indicating its position within a partition. Each segment stores messages whose offsets fall within a specific range.

Segment files are named after the first offset in their range, zero-padded to 20 digits. For example, a segment holding offsets 0‑19 is stored as 00000000000000000000.index and 00000000000000000000.log.
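The naming rule is easy to reproduce. The following sketch formats a segment's starting offset into the two file names and parses the offset back out of a name; the helper names segment_file_names and base_offset_of are hypothetical.

```python
from typing import Tuple


def segment_file_names(base_offset: int) -> Tuple[str, str]:
    """Return the (.index, .log) file names for a segment's starting offset."""
    stem = f"{base_offset:020d}"  # zero-pad to 20 digits
    return f"{stem}.index", f"{stem}.log"


def base_offset_of(file_name: str) -> int:
    """Recover the starting offset from a segment file name."""
    return int(file_name.split(".", 1)[0])


print(segment_file_names(300))
print(base_offset_of("00000000000000000300.log"))  # 300
```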

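Conceptually, the .index file is simple: each entry is a key,value pair where the key is a message offset, stored relative to the segment's starting offset, and the value is the physical byte position of that message in the corresponding .log file. The index is sparse, so only some messages have an entry; on disk, the entries are fixed-size binary records rather than text lines.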

1,0
3,299
6,497
...
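The listing above is a logical view. As far as I understand the on-disk format, each index entry is 8 bytes: a 4-byte offset relative to the segment's starting offset, followed by a 4-byte position in the log file, both big-endian. The sketch below encodes the three sample entries that way and decodes them back into absolute offsets. The starting offset of 300 and the helper names are assumptions for illustration.

```python
import struct
from typing import Iterable, List, Tuple

# One entry: 4-byte relative offset + 4-byte file position, big-endian.
ENTRY = struct.Struct(">ii")


def encode_index(entries: Iterable[Tuple[int, int]]) -> bytes:
    """Pack (relative_offset, position) pairs into raw index bytes."""
    return b"".join(ENTRY.pack(rel, pos) for rel, pos in entries)


def decode_index(raw: bytes, base_offset: int) -> List[Tuple[int, int]]:
    """Unpack raw index bytes into (absolute_offset, position) pairs."""
    return [(base_offset + rel, pos) for rel, pos in ENTRY.iter_unpack(raw)]


raw = encode_index([(1, 0), (3, 299), (6, 497)])
print(len(raw))                            # 24 bytes: 3 entries of 8 bytes
print(decode_index(raw, base_offset=300))  # [(301, 0), (303, 299), (306, 497)]
```

Fixed-size entries are what make binary search over the index cheap: the n-th entry always starts at byte 8 * n.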

The .log file stores the actual message payload along with metadata such as offset, size, checksum, etc.

Message Retrieval Example

To read the message with offset=368:

First, identify the segment that contains the offset: list the segment files in the partition directory and binary-search their starting offsets for the largest one that does not exceed 368 (here, 300).

Next, open the corresponding .index file (e.g., 00000000000000000300.index) and look up offset 368. If the index has no entry for 368 itself, take the closest preceding entry. Either way, the entry gives a byte position in the log file (e.g., 299).

Finally, open the matching .log file (e.g., 00000000000000000300.log), seek to byte position 299, and read forward until the message with offset 368 is reached.

This process completes the message lookup.
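The whole lookup can be sketched in a few lines. This is a simplified model under stated assumptions, not Kafka's implementation: the segment list, index entries, and log records are hypothetical in-memory stand-ins for the real files, and the index stores absolute offsets to keep the example short.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical stand-ins for the files of one partition.
segment_base_offsets = [0, 300, 600]  # parsed from segment file names, sorted
# Index of segment 300: (offset, byte position) pairs, not one per message.
index_300 = [(301, 0), (366, 299), (371, 497)]
# Part of the log of segment 300: (offset, byte position, payload).
log_300 = [(366, 299, b"a"), (367, 350, b"b"), (368, 410, b"c"), (369, 455, b"d")]


def find_segment(base_offsets: List[int], target: int) -> int:
    """Binary-search for the largest starting offset <= target."""
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} precedes the first segment")
    return base_offsets[i]


def find_start_position(index: List[Tuple[int, int]], target: int) -> int:
    """Return the byte position of the closest index entry at or before target."""
    keys = [offset for offset, _ in index]
    i = bisect.bisect_right(keys, target) - 1
    return index[i][1] if i >= 0 else 0  # no earlier entry: scan from the start


def scan_log(records, start_pos: int, target: int) -> Optional[bytes]:
    """Read forward from start_pos until the target offset is found."""
    for offset, pos, payload in records:
        if pos >= start_pos and offset == target:
            return payload
    return None


segment = find_segment(segment_base_offsets, 368)
start = find_start_position(index_300, 368)
print(segment, start, scan_log(log_300, start, 368))  # 300 299 b'c'
```

Both searches are logarithmic in the number of segments and index entries; only the final scan is linear, and it covers just the short stretch of log between two index entries.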
