Kafka File Storage Mechanism and Architecture

Kafka stores each topic as partitions that are divided into sequential segment files containing paired .log data and .index files, using global offsets and sparse memory‑mapped indexes to enable fast offset‑based lookups, efficient deletions, and minimal disk I/O in real‑world deployments.

Meituan Technology Team

Kafka, originally developed by LinkedIn and later donated to the Apache Foundation, is a distributed log system that can also serve as a message queue. It is widely used for web/nginx logs, access logs, and messaging services.

1. Introduction

The design of a commercial message queue's file storage mechanism is a key indicator of its technical level. This article analyzes Kafka's file storage from the perspective of its physical structure and evaluates its efficiency in real-world deployments.

2. Kafka File Storage Mechanism

Key terminology:

Broker: a Kafka node; multiple brokers form a cluster.

Topic: a logical stream of messages (e.g., page view logs).

Partition: a physical subdivision of a topic; each partition is an ordered queue of messages.

Segment: one of the files into which a partition is physically divided.

The analysis proceeds in four steps: partition distribution, file storage method, segment file structure, and offset‑based message lookup.

2.1 Partition Distribution

Assuming a single-broker cluster with log.dirs=xxx/message-folder and two topics (report_push, launch_info), each with four partitions, the directory layout is:

|--report_push-0
|--report_push-1
|--report_push-2
|--report_push-3
|--launch_info-0
|--launch_info-1
|--launch_info-2
|--launch_info-3

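Each partition is a separate directory named after the topic and the partition index (topicName-index), with indexes starting at 0. On a multi-broker cluster, each broker applies the same naming rule to the partitions it hosts.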
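
The naming rule can be sketched in a few lines of Python. The helper names below are hypothetical and only mirror the layout listed above; they are not part of Kafka. Parsing splits on the last hyphen because a topic name may itself contain hyphens.

```python
def partition_dirs(topics: dict[str, int]) -> list[str]:
    """Build directory names of the form <topic>-<partition index>."""
    return [f"{topic}-{p}" for topic, count in topics.items() for p in range(count)]


def parse_partition_dir(name: str) -> tuple[str, int]:
    """Split a directory name back into (topic, partition index)."""
    topic, _, index = name.rpartition("-")  # split on the LAST hyphen
    return topic, int(index)


dirs = partition_dirs({"report_push": 4, "launch_info": 4})
for d in dirs:
    print("|--" + d)
```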

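2.2 Storage Method

A partition consists of multiple segment files of roughly equal size, each capped by a configured limit. Because message sizes vary, segments may hold different numbers of messages. Splitting the log this way lets old data be removed quickly by deleting whole segment files. Writes are sequential appends and reads are sequential scans; the segment lifecycle (when a segment is rolled and when it is deleted) is controlled by server configuration.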
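
As a rough illustration of this lifecycle, the following toy model rolls to a new segment when a size limit would be exceeded and deletes data only in whole segments. The class names, the `max_segment_bytes` limit, and the "keep the newest N segments" policy are assumptions made for this sketch; they are not Kafka's implementation or its configuration keys.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int        # offset of the first message stored in this segment
    size_bytes: int = 0
    message_count: int = 0


@dataclass
class PartitionLog:
    max_segment_bytes: int  # hypothetical stand-in for a server-side size limit
    segments: list = field(default_factory=lambda: [Segment(base_offset=0)])
    next_offset: int = 0

    def append(self, payload: bytes) -> int:
        active = self.segments[-1]
        # Roll to a new segment when the active one would grow past the limit.
        if active.size_bytes > 0 and active.size_bytes + len(payload) > self.max_segment_bytes:
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        active.size_bytes += len(payload)
        active.message_count += 1
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def delete_oldest(self, keep: int) -> list:
        """Drop whole segments from the head, keeping the newest `keep` segments."""
        cut = max(0, len(self.segments) - keep)
        removed, self.segments = self.segments[:cut], self.segments[cut:]
        return removed


log = PartitionLog(max_segment_bytes=100)
for size in (40, 40, 40, 90, 10, 10):
    log.append(b"x" * size)

print([(s.base_offset, s.message_count) for s in log.segments])
```

Segments end up with similar byte sizes but different message counts, and deletion never has to touch the middle of a file.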

2.3 Segment File Structure

Each segment comprises an index file (suffix .index) and a data file (suffix .log) that share the same base name. Names follow the partition-wide offset sequence: the first segment is named 0, and each subsequent segment is named after the offset of the first message it contains (its base offset), zero-padded to 20 digits.
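
A minimal sketch of the file naming, assuming the convention of a base offset zero-padded to 20 digits. The function names are hypothetical, and the offset 368769 is an illustrative value, not taken from a real cluster.

```python
def segment_file_names(base_offset: int) -> tuple[str, str]:
    """Return the (.index, .log) file names for a segment's base offset."""
    prefix = f"{base_offset:020d}"  # zero-pad to 20 digits
    return prefix + ".index", prefix + ".log"


def base_offset_of(file_name: str) -> int:
    """Recover the base offset from a segment file name."""
    return int(file_name.split(".", 1)[0])


index_name, log_name = segment_file_names(368769)
print(index_name, log_name)
```

Because the padded names sort lexicographically in the same order as the offsets sort numerically, a plain directory listing already yields the segments in offset order.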

Message layout inside a data file (simplified):

8‑byte offset

4‑byte message size

4‑byte CRC32

1‑byte magic (protocol version)

1‑byte attributes (compression/type)

4‑byte key length (‑1 if no key)

K‑byte key (optional)

N-byte value (message payload)
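
The layout above can be exercised with Python's struct module. This sketch follows the simplified field list exactly as given, so it makes several assumptions: fields are big-endian, the message size counts every byte after the size field, the CRC32 covers the bytes that follow the CRC field, and the value is whatever remains after the key. The real on-disk format carries additional fields and has changed across Kafka versions, so treat this as an illustration of the idea rather than a parser for real log files.

```python
import struct
import zlib

HEADER = struct.Struct(">qi")    # 8-byte offset, 4-byte message size
PREFIX = struct.Struct(">Ibbi")  # 4-byte CRC32, 1-byte magic, 1-byte attributes, 4-byte key length


def encode_message(offset, key, value, magic=0, attributes=0) -> bytes:
    key_len = -1 if key is None else len(key)
    tail = struct.pack(">bbi", magic, attributes, key_len) + (key or b"") + value
    crc = zlib.crc32(tail) & 0xFFFFFFFF  # assumption: CRC covers magic..value
    body = struct.pack(">I", crc) + tail
    return HEADER.pack(offset, len(body)) + body


def decode_message(buf: bytes, pos: int = 0):
    """Decode one message starting at pos; return (fields, position of next message)."""
    offset, size = HEADER.unpack_from(buf, pos)
    start = pos + HEADER.size
    body = buf[start:start + size]
    crc, magic, attributes, key_len = PREFIX.unpack_from(body, 0)
    if zlib.crc32(body[4:]) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    key_end = PREFIX.size + max(key_len, 0)
    key = None if key_len < 0 else body[PREFIX.size:key_end]
    fields = {"offset": offset, "magic": magic, "attributes": attributes,
              "key": key, "value": body[key_end:]}
    return fields, start + size


# Two messages written back to back, then read sequentially.
data = encode_message(368776, b"k", b"hello") + encode_message(368777, None, b"world")
first, pos = decode_message(data)
second, end = decode_message(data, pos)
print(first, second)
```

The size field is what makes sequential scanning possible: a reader can skip from one message to the next without interpreting the payload.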

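2.4 Offset Lookup

To retrieve the message with offset 368776:

Binary-search the segment files by their start offset to find the segment with the largest start offset that does not exceed 368776; this identifies the .index / .log pair to read.

In that segment's .index file, find the last entry whose offset does not exceed the target to obtain a physical position in the data file, then scan the .log sequentially from that position until the desired offset is reached.

Kafka's index is sparse: it holds an entry for only some of the messages rather than for every one. This keeps the index small enough to be memory-mapped, at the cost of a short sequential scan on each lookup.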
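
The two-step lookup can be sketched as follows. All data here is made up for illustration: the segment base offsets, the sparse index entries (stored as relative offset and file position), and the in-memory list that stands in for the .log file. The function names are hypothetical as well.

```python
from bisect import bisect_right


def find_segment(base_offsets, target):
    """Binary-search sorted base offsets for the segment that holds target."""
    i = bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is below the first segment")
    return base_offsets[i]


def find_index_entry(index, relative):
    """Return the last (relative_offset, position) entry at or before relative."""
    rels = [r for r, _ in index]
    i = bisect_right(rels, relative) - 1
    return index[i] if i >= 0 else (0, 0)


def scan_log(log_records, start_position, target):
    """Scan (offset, position) records forward from start_position."""
    for offset, position in log_records:
        if position < start_position:
            continue
        if offset == target:
            return position
        if offset > target:
            break
    return None


# Hypothetical partition state.
BASE_OFFSETS = [0, 368769, 737337, 1105814]
INDEX = {368769: [(0, 0), (3, 497), (6, 1407), (9, 1686)]}
LOG = {368769: [(368775, 1407), (368776, 1500), (368777, 1593), (368778, 1686)]}


def lookup(target):
    base = find_segment(BASE_OFFSETS, target)
    _, start = find_index_entry(INDEX[base], target - base)
    return base, scan_log(LOG[base], start, target)


print(lookup(368776))  # (368769, 1500)
```

With this data, offset 368776 falls in the segment with base offset 368769, the index entry for relative offset 6 gives position 1407, and the scan reads one extra message before reaching the target.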

3. Practical Performance

Experiment setup: 2-node Kafka cluster, 4 CPUs, 8 GB RAM, 1 Gbps NIC, 4 GB JVM heap.

Observations show minimal disk reads during normal operation because writes are batched and flushed asynchronously, while reads are served from the page cache whenever possible.

4. Summary

Partitions are split into small segment files, so old data can be removed by deleting whole files.

Index information enables fast message location and response size estimation.

Memory‑mapped sparse indexes avoid frequent disk I/O.

Sparse indexing dramatically reduces index file footprint.
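
To give a feel for the footprint difference, here is a back-of-the-envelope calculation. Every number is an assumption chosen for illustration: a 1 GiB segment, one index entry per 4 KiB of log data, 8 bytes per index entry, and an average message size of 256 bytes for the dense comparison.

```python
def sparse_index_bytes(segment_bytes: int, interval_bytes: int, entry_bytes: int = 8) -> int:
    """Index size when one entry is written per interval_bytes of log data."""
    return (segment_bytes // interval_bytes) * entry_bytes


def dense_index_bytes(segment_bytes: int, avg_message_bytes: int, entry_bytes: int = 8) -> int:
    """Index size when every message gets its own entry."""
    return (segment_bytes // avg_message_bytes) * entry_bytes


SEGMENT = 1 << 30  # 1 GiB segment (assumed)
sparse = sparse_index_bytes(SEGMENT, interval_bytes=4096)
dense = dense_index_bytes(SEGMENT, avg_message_bytes=256)

print(f"sparse: {sparse / 2**20:.0f} MiB, dense: {dense / 2**20:.0f} MiB")
```

Under these assumptions the sparse index is 2 MiB against 32 MiB for a dense one, which is small enough to memory-map comfortably for many segments at once.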
