
How RocketMQ Implements Random Indexing for Cloud‑Native Storage

This article explains RocketMQ's random indexing mechanism, detailing its on‑disk three‑segment hash table structure, the compact format conversion process, multi‑threaded write and query workflows, layered system design, crash‑recovery strategy, and comparisons with RocksDB and InnoDB storage engines.

Alibaba Cloud Native

Characteristics of Random Indexing in Message Systems

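RocketMQ indexes messages by message ID or by business key (for example, an order number) so that individual messages can be looked up at random. Keeping such an index in a traditional database or in local files does not scale to massive write workloads, because local disk capacity limits how much index data can be retained.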

Disk Index Structure

Each index file consists of three segments arranged as a head‑insertion hash table:

IndexHeader: metadata such as the magic code, start and end timestamps, the number of used slots (hashSlotCount), and the total number of index entries (indexCount).

Slots: a fixed-size array; each slot holds the file offset of the head node of a singly linked list of the entries that hash to that slot.

IndexItems: records containing topicId, queueId, offset, size, and the other fields needed to locate the original message in the CommitLog.
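
The three segments can be illustrated with a minimal in-memory sketch. The byte layout below (a 28-byte header, 4-byte slots, 32-byte items), the magic value, the CRC32 hash, and the class and method names are all assumptions made for this example, not the format RocketMQ ships. What it is meant to show is the head-insertion step: a new item stores the slot's previous head as its next pointer, and the slot is then repointed at the new item.

```python
import struct
import zlib

# Assumed layout for illustration only; field order and sizes are hypothetical.
HEADER = struct.Struct(">iqqii")   # magic, begin ts, end ts, hashSlotCount, indexCount
SLOT = struct.Struct(">i")         # byte offset of the head item, 0 means empty
ITEM = struct.Struct(">iiiqiii")   # hash, topicId, queueId, offset, size, timeDelta, next
MAGIC = 0x0BADC0DE                 # hypothetical magic code


class IndexFile:
    def __init__(self, slot_num, max_items):
        self.slot_num = slot_num
        self.max_items = max_items
        self.items_start = HEADER.size + slot_num * SLOT.size
        self.buf = bytearray(self.items_start + max_items * ITEM.size)
        self.used_slots = self.count = self.begin_ts = self.end_ts = 0
        self._write_header()

    def _write_header(self):
        HEADER.pack_into(self.buf, 0, MAGIC, self.begin_ts, self.end_ts,
                         self.used_slots, self.count)

    @staticmethod
    def _hash(key):
        return zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF

    def put(self, key, topic_id, queue_id, offset, size, ts_ms):
        """Append one item and make it the new head of its slot's list."""
        if self.count >= self.max_items:
            return False                      # file is full
        h = self._hash(key)
        slot_pos = HEADER.size + (h % self.slot_num) * SLOT.size
        (old_head,) = SLOT.unpack_from(self.buf, slot_pos)
        if self.count == 0:
            self.begin_ts = ts_ms
        item_pos = self.items_start + self.count * ITEM.size
        ITEM.pack_into(self.buf, item_pos, h, topic_id, queue_id, offset, size,
                       ts_ms - self.begin_ts, old_head)
        SLOT.pack_into(self.buf, slot_pos, item_pos)   # head insertion
        if old_head == 0:
            self.used_slots += 1
        self.count += 1
        self.end_ts = max(self.end_ts, ts_ms)
        self._write_header()
        return True

    def get(self, key, max_count=10):
        """Walk the slot's linked list, newest entry first."""
        h = self._hash(key)
        slot_pos = HEADER.size + (h % self.slot_num) * SLOT.size
        (pos,) = SLOT.unpack_from(self.buf, slot_pos)
        found = []
        while pos != 0 and len(found) < max_count:
            item_hash, topic_id, queue_id, offset, size, _delta, nxt = \
                ITEM.unpack_from(self.buf, pos)
            if item_hash == h:
                found.append((topic_id, queue_id, offset, size))
            pos = nxt
        return found
```

Because only the hash is stored, a lookup can return entries for a different key with the same hash; in this sketch the caller would be expected to confirm the key against the message it reads back from the CommitLog.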

Figure: Index file structure

Compact Format Conversion

The index module is write-heavy and read-light, so a small amount of read amplification is acceptable. Let t1 be the time to write an entry, t2 the average interval between that write and the first query for it, t_compact the time to convert a file to the compact format, and t_before and t_after the query latency before and after conversion. Because t_compact << t2, conversion can run asynchronously and normally finishes before a query arrives, so it adds nothing to query latency. And because t_after < t_before, the total time is lower with conversion than without it:

t1 + t2 + t_before > t1 + t2 + t_after
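
The text does not spell out the compact layout, so the following is only a sketch under one assumption: after conversion, all items of a slot are stored contiguously and the slot records a (start, length) pair, which lets a lookup read one range instead of chasing next pointers. The function names and the tuple shapes are hypothetical.

```python
def compact(slots, items):
    """Convert a linked-list index into a contiguous per-slot layout.

    slots: head item index per slot, -1 for an empty slot.
    items: list of (key_hash, payload, next_index) in append order.
    Returns (compact_slots, compact_items), where compact_slots[i] is the
    (start, length) range of slot i inside compact_items.
    """
    compact_slots, compact_items = [], []
    for head in slots:
        start = len(compact_items)
        idx = head
        while idx != -1:                      # follow the chain once, at conversion time
            key_hash, payload, nxt = items[idx]
            compact_items.append((key_hash, payload))
            idx = nxt
        compact_slots.append((start, len(compact_items) - start))
    return compact_slots, compact_items


def lookup(compact_slots, compact_items, key_hash):
    """Read one contiguous range and filter it by hash."""
    start, length = compact_slots[key_hash % len(compact_slots)]
    return [p for h, p in compact_items[start:start + length] if h == key_hash]
```

In this model the pointer chasing is paid once during the asynchronous conversion, which is the cost that t_compact stands for; each later query then touches a single range, which is why t_after can be smaller than t_before.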

Figure: Time comparison

Lifecycle of a Single Index File

An index file moves through three states:

unsealed: actively being written.

compacted: closed to writes and ready for upload.

uploaded: stored in object storage.

When a file reaches its capacity it is sealed and marked compacted, then uploaded to object storage, and finally removed once it expires.
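
The lifecycle reads naturally as a one-way state machine. The sketch below is an illustration only: the class name, the capacity-based sealing trigger, and the retention check are assumptions, and the real service presumably also moves files between directories and performs the upload itself.

```python
from enum import Enum


class FileState(Enum):
    UNSEALED = "unsealed"
    COMPACTED = "compacted"
    UPLOADED = "uploaded"


# Allowed forward transitions; a file never returns to an earlier state.
NEXT_STATE = {
    FileState.UNSEALED: FileState.COMPACTED,
    FileState.COMPACTED: FileState.UPLOADED,
}


class IndexFileHandle:
    """Hypothetical bookkeeping record for one index file."""

    def __init__(self, begin_ts_ms, capacity):
        self.begin_ts_ms = begin_ts_ms
        self.capacity = capacity
        self.count = 0
        self.state = FileState.UNSEALED

    def append(self):
        """Count one written entry; return True when the file just became full."""
        if self.state is not FileState.UNSEALED:
            raise RuntimeError("file is closed to writes")
        if self.count >= self.capacity:
            raise RuntimeError("file is full")
        self.count += 1
        return self.count >= self.capacity

    def advance(self):
        """Move to the next lifecycle state."""
        nxt = NEXT_STATE.get(self.state)
        if nxt is None:
            raise RuntimeError("uploaded is the final state")
        self.state = nxt
        return nxt

    def is_expired(self, now_ms, retention_ms):
        """Only uploaded files are considered for expiry in this sketch."""
        return (self.state is FileState.UPLOADED
                and now_ms - self.begin_ts_ms > retention_ms)
```

A writer would call append() until it returns True, then hand the file to background tasks that call advance() after the format conversion and again after the upload.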

Figure: Single index file lifecycle

Storage Model for Multiple Index Files

Multiple index files are managed as a set, each with an independent lifecycle. New files are created when the current file is full; each file can be in any of the three states described above.

Figure: Multiple index files model

System Layered Design

Index Service Layer: provides the indexing APIs, manages file lifecycles, and coordinates write, query, and background tasks.

Index File Parsing Layer: parses individual index files and exposes KV-style queries and format conversion.

Data Storage Layer: handles binary I/O to local disks, object storage, or databases.

High‑Availability Crash Recovery

On restart the system scans directories named after file states (e.g., writing, compact, upload), loads each index file into memory, and rebuilds an in‑memory skip‑list that tracks file locations and statuses.
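
A recovery scan along these lines might look like the sketch below. It assumes, purely for illustration, that each file is named after its begin timestamp in milliseconds and sits in a directory named after its state. Python has no built-in skip list, so a sorted list searched with bisect stands in for the ordered in-memory structure; the function names are hypothetical.

```python
import bisect
import os

# Directory names taken from the text; the file naming scheme is an assumption.
STATE_DIRS = ("writing", "compact", "upload")


def recover(root):
    """Scan the state directories and return (begin_ts, state, path) sorted by time."""
    entries = []
    for state in STATE_DIRS:
        state_dir = os.path.join(root, state)
        if not os.path.isdir(state_dir):
            continue
        for name in os.listdir(state_dir):
            if not name.isdigit():            # skip temp files and other debris
                continue
            entries.append((int(name), state, os.path.join(state_dir, name)))
    entries.sort()
    return entries


def files_overlapping(entries, begin_ts, end_ts):
    """Return the files that may hold entries in [begin_ts, end_ts].

    A file covers the period from its own begin timestamp up to the next
    file's begin timestamp, so the search starts one file before the first
    begin timestamp that is greater than begin_ts.
    """
    keys = [e[0] for e in entries]
    lo = max(bisect.bisect_right(keys, begin_ts) - 1, 0)
    hi = bisect.bisect_right(keys, end_ts)
    return entries[lo:hi]
```

Because the state is derived from the directory a file is found in, no separate metadata log is needed in this sketch; a real implementation would still have to decide what to do with a file that a crash left half-moved between two directories.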

Comparison with Other Storage Engines

RocksDB uses Log‑Structured Merge (LSM) trees with asynchronous compaction, which improves read performance but incurs significant write amplification.

MySQL InnoDB relies on B+-tree structures and redo logs; the redo log absorbs writes sequentially, but random page updates and page splits in the B+-tree limit index scalability for very large tables.

RocketMQ’s append‑only, time‑ordered design enables simple hot‑cold separation and asynchronous format conversion, reducing overall latency and avoiding the heavy write amplification of LSM‑tree compaction.

Remaining Issues and Future Improvements

The current design lacks an efficient global maxCount limit. Queries may need to scan all index files before determining that the required number of results has been found, causing unnecessary I/O. Introducing a thread‑safe global counter would allow early termination once maxCount is reached.
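
One way to picture the proposed counter is a shared, lock-protected budget that every per-file query draws from. The sketch below is an assumption about how such a counter could work, not a description of existing RocketMQ code: each index file is modeled as a plain dict from key to a list of hits, and ResultBudget and query_all are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ResultBudget:
    """Thread-safe count of how many more results the query may return."""

    def __init__(self, max_count):
        self._remaining = max_count
        self._lock = threading.Lock()

    def acquire(self, wanted):
        """Reserve up to `wanted` results and return how many were granted."""
        with self._lock:
            granted = min(wanted, self._remaining)
            self._remaining -= granted
            return granted

    def exhausted(self):
        with self._lock:
            return self._remaining <= 0


def query_all(files, key, max_count, workers=4):
    """Query every file concurrently, stopping early once max_count is reached."""
    budget = ResultBudget(max_count)

    def query_one(index_file):
        if budget.exhausted():
            return []                          # early termination: skip the I/O
        hits = index_file.get(key, [])         # stands in for a real file lookup
        return hits[:budget.acquire(len(hits))]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(query_one, files))
    return [hit for part in parts for hit in part]
```

The trade-off in this sketch is that whichever files answer first consume the budget, so the result is capped at maxCount but is not guaranteed to contain the newest entries; querying files in time order would restore that guarantee at the cost of parallelism.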

Extending IndexItem via inheritance would enable the service to support additional systems without rewriting the core indexing logic.

Reference Documents

1. Lin, W., Yang, M., Zhang, L., & Zhou, L. (2008). PacificA: Replication in Log-Based Distributed Storage Systems. Microsoft Research Technical Report MSR-TR-2008-25.

https://www.microsoft.com/en-us/research/wp-content/uploads/2008/02/tr-2008-25.pdf

2. RocksDB Compactions. https://github.com/facebook/rocksdb/wiki/Compaction

3. Inside InnoDB: The InnoDB Storage Engine.

https://dev.mysql.com/doc/refman/8.0/en/innodb-internals.html