Understanding Kafka's Segment Storage and Index Design
This article explains how Kafka partitions data into segments, stores each segment as paired index and log files, and uses sparse indexing to enable efficient queries, illustrating the process with examples and diagrams of segment layout and offset lookup.
This article introduces Kafka's underlying data storage format, its efficient index design, and the actual query process.
1. Segment
Kafka divides each partition into multiple segments, which are the smallest storage units. The broker appends incoming data to the newest segment of the partition; when that segment reaches its size limit (1 GB by default) or its age limit (one week by default), the broker closes it and opens a new one. The segment currently being written is called the active segment, and it is never deleted while it is active. This design splits one large partition log into many smaller files, which makes lookups faster and lets expired data be removed by deleting whole files.
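The rolling rule above can be sketched in a few lines of Python. This is a minimal illustration, not Kafka's implementation: the names ActiveSegment and should_roll are hypothetical, the two constants mirror the defaults of segment.bytes and segment.ms, and segment age is measured from a creation timestamp as a simplification.

```python
from dataclasses import dataclass

SEGMENT_BYTES = 1024 ** 3              # 1 GiB, mirrors the segment.bytes default
SEGMENT_MS = 7 * 24 * 60 * 60 * 1000   # 7 days, mirrors the segment.ms default


@dataclass
class ActiveSegment:
    """Hypothetical stand-in for the segment currently being written."""
    base_offset: int   # offset of the first message in the segment
    size_bytes: int    # bytes written to the .log file so far
    created_ms: int    # creation time in epoch milliseconds (simplification)


def should_roll(seg: ActiveSegment, incoming_bytes: int, now_ms: int) -> bool:
    """Return True if the segment should be closed before the next append."""
    too_big = seg.size_bytes + incoming_bytes > SEGMENT_BYTES
    too_old = now_ms - seg.created_ms >= SEGMENT_MS
    return too_big or too_old
```

Whichever limit is hit first triggers the roll, so a low-traffic partition still produces a new segment about once a week.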
2. Storage and Query
Each segment consists of two files that appear as a pair: an .index file and a .log file. The log file stores the actual messages, while the index file maps message offsets to byte positions in the log file. Both files are named after the offset of the first message in the segment (the starting offset), zero-padded to 20 digits.
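As a small illustration of the naming scheme, the following Python sketch derives the pair of file names from a starting offset and recovers the offset from a file name. The helper names are hypothetical; the 20-digit zero padding follows Kafka's on-disk naming.

```python
from typing import Tuple


def segment_file_names(base_offset: int) -> Tuple[str, str]:
    """Return the (.index, .log) file names for a segment's starting offset."""
    stem = f"{base_offset:020d}"   # zero-pad to 20 digits
    return f"{stem}.index", f"{stem}.log"


def base_offset_of(file_name: str) -> int:
    """Recover the starting offset from a segment file name."""
    return int(file_name.split(".")[0])
```

Because the names are fixed-width, sorting them lexicographically also sorts the segments by starting offset.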
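For example, to query offset = 368775, Kafka first locates the segment whose starting offset is the largest one not exceeding 368775, here the segment with index file 00000000000000368769.index. Index entries are stored relative to the starting offset of the segment, so the lookup key is 368775 - 368769 = 6. That entry sits at the third position in the index and maps to physical position 1407 in the log file, where the read begins. If the index does not contain the requested offset, the closest preceding entry is used instead, and the log file is scanned sequentially from that entry's position until the offset is reached.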
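The two-step lookup can be sketched with Python's bisect module. This is a simplified model rather than Kafka's code: find_segment and lookup_position are hypothetical names, the list of starting offsets is invented, and only the index entry (6, 1407) comes from the example above; the other index entries are made up to fill out the table.

```python
from bisect import bisect_right
from typing import List, Tuple

# Invented starting offsets of three segments in one partition.
BASE_OFFSETS = [0, 368769, 737337]

# Sparse index of the segment starting at 368769:
# (relative offset, byte position in the .log file).
# Only (6, 1407) is taken from the article; the rest are illustrative.
INDEX: List[Tuple[int, int]] = [(1, 0), (3, 497), (6, 1407), (8, 1686)]


def find_segment(base_offsets: List[int], target: int) -> int:
    """Return the largest starting offset that is <= target."""
    i = bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} precedes the first segment")
    return base_offsets[i]


def lookup_position(index: List[Tuple[int, int]], base_offset: int, target: int) -> int:
    """Return the byte position from which to start reading the .log file."""
    relative = target - base_offset
    keys = [rel for rel, _ in index]
    i = bisect_right(keys, relative) - 1
    if i < 0:
        return 0          # no earlier entry: scan from the start of the file
    return index[i][1]    # exact hit or closest preceding entry


base = find_segment(BASE_OFFSETS, 368775)
start = lookup_position(INDEX, base, 368775)
print(base, start)
```

A query for offset 368776 (relative offset 7) has no entry of its own, so it falls back to the same position as relative offset 6 and the reader scans forward from there.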
3. Index Design
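Kafka uses a sparse index: instead of recording every offset, it adds an index entry only after a certain amount of data has been written to the log file (4 KB by default, set by index.interval.bytes). This keeps index files small enough to be held in memory cheaply. The trade-off is that a lookup which does not land exactly on an indexed offset needs a short sequential scan of the log file, but that scan is bounded by roughly the spacing between index entries.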
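To make the idea concrete, here is a hedged sketch of how a sparse index could be built while appending. The function build_sparse_index is hypothetical and works per message for simplicity, whereas Kafka appends in batches; the 4096-byte interval mirrors the default of index.interval.bytes.

```python
from typing import List, Tuple

INDEX_INTERVAL_BYTES = 4096  # mirrors the index.interval.bytes default


def build_sparse_index(
    message_sizes: List[int], interval: int = INDEX_INTERVAL_BYTES
) -> List[Tuple[int, int]]:
    """Return (relative offset, byte position) entries for a run of appends.

    An entry is added only when more than `interval` bytes have been written
    since the previous entry, so most offsets never appear in the index.
    """
    index: List[Tuple[int, int]] = []
    position = 0            # byte position of the next message in the .log file
    bytes_since_entry = 0   # bytes appended since the last index entry
    for relative_offset, size in enumerate(message_sizes):
        if bytes_since_entry > interval:
            index.append((relative_offset, position))
            bytes_since_entry = 0
        position += size
        bytes_since_entry += size
    return index
```

With twelve messages of 1000 bytes each, this sketch yields only two index entries, so a lookup for any other offset scans at most a few kilobytes of the log file.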
Copyright statement: This article is compiled by the Big Data Technology and Architecture team with exclusive authorization from the original author. Unauthorized reproduction will be pursued for infringement.
Editor: 冷眼丶
WeChat public account: import_bigdata
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
