How RocketMQ’s Storage Architecture Powers High Throughput and Low Latency

This article explains RocketMQ’s disk‑based storage design—including CommitLog, ConsumeQueue, and Index files—detailing sequential writes, memory‑mapped I/O, flexible flush strategies, and memory‑level read/write separation that together enable unlimited backlog, high throughput, and low latency messaging.

Su San Talks Tech

RocketMQ, a disk‑based middleware, offers virtually unlimited backlog capacity along with high throughput and low latency, and its core strength lies in its elegant storage design.

Note: This article is excerpted from the newly released "RocketMQ Technical Insider" second edition, which first distills RocketMQ’s core mechanisms with illustrations before diving into source code, reducing reading difficulty and prompting deeper thought.

1. Storage Overview

RocketMQ’s storage files mainly include CommitLog files, ConsumeQueue files, and Index files.

Messages from all topics are appended to one shared CommitLog, which keeps disk writes sequential and is the basis of RocketMQ’s high write throughput.

Because message consumption follows a topic‑based publish/subscribe model, scanning the CommitLog by topic would be inefficient; therefore RocketMQ introduces the ConsumeQueue file as a topic‑based index.

To support attribute‑based message retrieval, RocketMQ builds a hash index stored in the Index file.

After sequentially writing to the CommitLog, the ConsumeQueue and Index files are constructed asynchronously. The data flow is illustrated below:

2. Storage File Organization

RocketMQ pursues extreme sequential disk writes. All topics’ messages are appended to a single CommitLog file in arrival order; once written, messages cannot be modified. The layout of the CommitLog file is shown below:

In file‑based programming, each message is identified by a physical offset—the position where the message starts in the file.

The CommitLog file name encodes the offset of its first message as a 20‑digit, zero‑padded number, e.g., the first file is 00000000000000000000, the second is 00000000001073741824, and so on.

Given a message’s physical offset (e.g., 73741824), a binary search over the file names locates the file that contains it, and subtracting the offset encoded in that file’s name from the message’s offset yields the message’s position within that file.
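The lookup described above can be sketched in a few lines of Java. This is an illustration only: the class and method names are hypothetical and are not RocketMQ's own, and the 1 GB file size is inferred from the second file name quoted above.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not a RocketMQ class.
class CommitLogLocator {
    // File size implied by the second file name in the text (1 GiB). Assumption.
    static final long FILE_SIZE = 1_073_741_824L;

    // A file name is the 20-digit, zero-padded offset of the file's first byte.
    static String fileName(long startOffset) {
        return String.format("%020d", startOffset);
    }

    // Binary search over the sorted start offsets; returns the index of the
    // file whose range contains physicalOffset (assumes physicalOffset >= 0).
    static int findFileIndex(long[] startOffsets, long physicalOffset) {
        int pos = Arrays.binarySearch(startOffsets, physicalOffset);
        // On a miss, binarySearch returns -(insertionPoint) - 1, and the
        // owning file is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Position inside the owning file = physical offset - file start offset.
    static long positionInFile(long[] startOffsets, long physicalOffset) {
        return physicalOffset - startOffsets[findFileIndex(startOffsets, physicalOffset)];
    }

    public static void main(String[] args) {
        long[] starts = {0L, FILE_SIZE, 2 * FILE_SIZE};
        long offset = FILE_SIZE + 100;
        int idx = findFileIndex(starts, offset);
        System.out.println(fileName(starts[idx]) + " @ " + positionInFile(starts, offset));
    }
}
```

Because every file has the same size, a plain division (offset / FILE_SIZE) would also find the file; the binary search is shown because that is what the text describes.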

ConsumeQueue is a topic‑based index over the CommitLog. Its structure is shown below:

Each ConsumeQueue entry has a fixed length (8‑byte physical offset, 4‑byte message size, 8‑byte tag hashcode). Storing the tag’s hashcode ensures constant entry size, allowing array‑like random access and greatly improving read performance.

Consumers locate a message by calculating logicOffset * 20 to obtain the entry’s start offset, then reading the subsequent 20 bytes.
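To make the fixed-length layout concrete, here is a minimal sketch that packs and reads 20-byte entries with a plain ByteBuffer. The class and method names are made up for this example, and an in-memory buffer stands in for the real ConsumeQueue file.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the 20-byte entry layout; not RocketMQ source code.
class ConsumeQueueSketch {
    static final int ENTRY_SIZE = 20; // 8 (offset) + 4 (size) + 8 (tag hashcode)

    // Append one fixed-length entry at the buffer's current position.
    static void putEntry(ByteBuffer queue, long commitLogOffset, int size, long tagHash) {
        queue.putLong(commitLogOffset).putInt(size).putLong(tagHash);
    }

    // Random access by logical offset: the entry starts at logicOffset * 20.
    // Returns {commitLogOffset, size, tagHash}.
    static long[] readEntry(ByteBuffer queue, long logicOffset) {
        int start = Math.toIntExact(logicOffset * ENTRY_SIZE);
        return new long[] {
            queue.getLong(start),      // bytes 0..7: physical offset
            queue.getInt(start + 8),   // bytes 8..11: message size
            queue.getLong(start + 12)  // bytes 12..19: tag hashcode
        };
    }

    public static void main(String[] args) {
        ByteBuffer queue = ByteBuffer.allocate(3 * ENTRY_SIZE);
        putEntry(queue, 0L, 120, "TagA".hashCode());
        putEntry(queue, 120L, 256, "TagB".hashCode());
        long[] e = readEntry(queue, 1);
        System.out.println("offset=" + e[0] + " size=" + e[1] + " tagHash=" + e[2]);
    }
}
```

Since every entry has the same size, reading entry n needs no scan of the entries before it, which is the array-like access the text refers to.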

ConsumeQueue indexes messages only by topic, so it cannot answer attribute‑based queries on its own; RocketMQ serves those queries from the Index file.

The Index file implements a file‑based hash index. Its structure consists of a 40‑byte header, 5 million hash slots (4 bytes each), and 20 million index entries (20 bytes each) containing the key’s hashcode, message physical offset, timestamp, and a pointer to the previous entry for collision chaining.

This creates a mapping from a key’s hashcode to the message’s physical offset, so the message can be located quickly in the CommitLog.
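The header, slot, and entry layout can be sketched as a scaled-down, in-memory hash index. Everything here is an assumption made for illustration: the class name, the tiny slot and entry counts, the use of entry number 0 as the "empty slot" marker, and the way the key is hashed. It is not the RocketMQ implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, scaled-down model of the Index file layout.
class IndexFileSketch {
    static final int HEADER_SIZE = 40, SLOT_SIZE = 4, ENTRY_SIZE = 20;
    private final int slotNum;
    private final int maxEntries;
    private final ByteBuffer buf;
    private int entryCount = 0; // entries are numbered from 1; 0 in a slot means empty

    IndexFileSketch(int slotNum, int maxEntries) {
        this.slotNum = slotNum;
        this.maxEntries = maxEntries;
        this.buf = ByteBuffer.allocate(HEADER_SIZE + slotNum * SLOT_SIZE + maxEntries * ENTRY_SIZE);
    }

    static int hash(String key) { return key.hashCode() & 0x7fffffff; } // non-negative

    int slotPos(int keyHash) { return HEADER_SIZE + (keyHash % slotNum) * SLOT_SIZE; }

    int entryPos(int entryNo) { return HEADER_SIZE + slotNum * SLOT_SIZE + (entryNo - 1) * ENTRY_SIZE; }

    void put(String key, long phyOffset, int timestampSeconds) {
        if (entryCount >= maxEntries) throw new IllegalStateException("index file is full");
        int h = hash(key);
        int prev = buf.getInt(slotPos(h));    // current head of this slot's chain
        int entryNo = ++entryCount;
        int p = entryPos(entryNo);
        buf.putInt(p, h);                     // key hashcode
        buf.putLong(p + 4, phyOffset);        // CommitLog physical offset
        buf.putInt(p + 12, timestampSeconds); // timestamp
        buf.putInt(p + 16, prev);             // pointer to the previous entry
        buf.putInt(slotPos(h), entryNo);      // slot now points at the newest entry
    }

    // Walk the collision chain from newest to oldest, keeping matching hashes.
    List<Long> lookup(String key) {
        int h = hash(key);
        List<Long> offsets = new ArrayList<>();
        for (int no = buf.getInt(slotPos(h)); no != 0; no = buf.getInt(entryPos(no) + 16)) {
            if (buf.getInt(entryPos(no)) == h) offsets.add(buf.getLong(entryPos(no) + 4));
        }
        return offsets;
    }

    public static void main(String[] args) {
        IndexFileSketch index = new IndexFileSketch(4, 8);
        index.put("order-1", 100L, 1);
        index.put("order-1", 300L, 3);
        System.out.println(index.lookup("order-1"));
    }
}
```

With the sizes quoted above, a full file would occupy 40 + 5,000,000 × 4 + 20,000,000 × 20 = 420,000,040 bytes. Note that only the hashcode is stored, so two different keys can share a hash; a lookup returns candidate offsets, and in this sketch nothing verifies them against the actual message key.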

3. Sequential Write

Disk‑sequential writes are a key design principle for improving write performance.

Similar to MySQL InnoDB’s redo log, RocketMQ writes messages to the page cache first and then flushes them to disk in order, avoiding random writes that would degrade performance.

4. Memory‑Mapped Mechanism

To further boost I/O, RocketMQ maps disk files into memory using Java’s FileChannel.map, allowing memory‑style operations on disk data.
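As a minimal sketch of what mapping a file looks like with the standard Java NIO API: FileChannel.map returns a MappedByteBuffer, writes to it are ordinary memory writes, and force() asks the OS to write the dirty pages to the device. The class name, file name, and sizes are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; RocketMQ's own mapped-file handling is more involved.
class MappedFileSketch {
    // Maps fileSize bytes of the file, writes the payload at position 0,
    // and forces the change to disk.
    static void writeMapped(Path file, int fileSize, byte[] payload) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current length grows the file to fileSize.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
            mapped.put(payload); // a memory write that lands in the page cache
            mapped.force();      // request write-back of the dirty pages
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("commitlog-sketch", ".bin");
        file.toFile().deleteOnExit();
        writeMapped(file, 1024, "hello".getBytes(StandardCharsets.UTF_8));
        byte[] onDisk = Files.readAllBytes(file);
        System.out.println(new String(onDisk, 0, 5, StandardCharsets.UTF_8));
    }
}
```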

On Linux, the mapped files reside in the OS page cache. The system evicts pages using algorithms like LRU when memory pressure arises.

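If the broker process crashes, data already in the page cache is not lost, because the OS still writes it back to disk. If the machine itself goes down, for example on power failure, data that has reached only the page cache and has not yet been flushed may be lost.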

5. Flexible Flush Strategies

With sequential writes and memory mapping, RocketMQ must decide when to acknowledge a client: after writing to page cache or after persisting to disk.

This trade‑off is handled by two strategies: synchronous flush and asynchronous flush.

5.1 Synchronous Flush

Synchronous flush works as a group commit: after a thread writes to memory, it submits a flush request and blocks. The flush thread batches pending messages and flushes them together, then wakes the waiting threads.
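The group-commit idea can be sketched with a queue of pending requests and a flush step that drains the queue, flushes once, and then completes every waiter. This is a simplified model under assumed names: a counter stands in for the real disk flush, and CompletableFuture is used here for the block-and-wake handshake.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of group commit; not RocketMQ's flush service.
class GroupCommitSketch {
    private final LinkedBlockingQueue<CompletableFuture<Boolean>> pending = new LinkedBlockingQueue<>();
    private int flushCount = 0; // only touched by the flushing side

    // Called by a writer after its message is in memory; the writer then
    // blocks on the returned future until the flush has happened.
    CompletableFuture<Boolean> submitFlushRequest() {
        CompletableFuture<Boolean> done = new CompletableFuture<>();
        pending.add(done);
        return done;
    }

    // One pass of the flush thread: take everything queued so far, perform a
    // single flush for the whole batch, then wake all waiting writers.
    int flushOnce() {
        List<CompletableFuture<Boolean>> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (batch.isEmpty()) return 0;
        flushCount++; // stand-in for one force() covering the whole batch
        batch.forEach(f -> f.complete(true));
        return batch.size();
    }

    int flushCount() { return flushCount; }

    public static void main(String[] args) throws Exception {
        GroupCommitSketch gc = new GroupCommitSketch();
        Thread flusher = new Thread(() -> {
            while (true) {
                gc.flushOnce();
                try { Thread.sleep(10); } catch (InterruptedException e) { return; }
            }
        });
        flusher.setDaemon(true);
        flusher.start();
        System.out.println("flushed: " + gc.submitFlushRequest().get()); // blocks until flushed
        flusher.interrupt();
    }
}
```

The benefit is that many concurrent writers share the cost of one flush instead of each paying for its own.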

5.2 Asynchronous Flush

Asynchronous flush writes to page cache and immediately returns success to the client, while a background thread periodically forces the file channel to flush (default interval 500 ms).
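By contrast, an asynchronous flush decouples the producer's acknowledgement from the disk write. The sketch below models that with two counters and a scheduled task; the names are hypothetical, the counters stand in for real file positions, and the 500 ms interval is the default mentioned above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of asynchronous flushing; not RocketMQ source code.
class AsyncFlushSketch {
    private final AtomicLong wrotePosition = new AtomicLong();
    private final AtomicLong flushedPosition = new AtomicLong();

    // Producer path: record the write and return at once, without waiting for disk.
    long append(int bytes) { return wrotePosition.addAndGet(bytes); }

    // Flush path: stand-in for forcing the file; all bytes written so far become durable.
    void flush() { flushedPosition.set(wrotePosition.get()); }

    // Bytes that would be at risk if the machine lost power right now.
    long unflushedBytes() { return wrotePosition.get() - flushedPosition.get(); }

    // Runs flush() repeatedly in the background with the given delay.
    ScheduledExecutorService startBackgroundFlush(long intervalMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(this::flush, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncFlushSketch log = new AsyncFlushSketch();
        ScheduledExecutorService timer = log.startBackgroundFlush(500);
        log.append(128);
        System.out.println("unflushed right after append: " + log.unflushedBytes());
        Thread.sleep(1200);
        System.out.println("unflushed after waiting: " + log.unflushedBytes());
        timer.shutdown();
    }
}
```

The window between append and the next background flush is the data that an asynchronous strategy can lose on a machine failure, which is the price paid for the lower latency.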

6. Memory‑Level Read/Write Separation

To reduce page‑cache pressure, RocketMQ can enable transientStorePool, which writes messages first to off‑heap memory, then asynchronously commits them to the page cache in batches, and finally flushes them to disk.

Consumers read from page cache, achieving read/write separation at the memory level.

The advantage is batch‑writing to page cache; the downside is potential loss of off‑heap data if the broker crashes abruptly.
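The three stages (off-heap write, commit to page cache, flush to disk) can be sketched with a direct ByteBuffer and a FileChannel. This is an assumption-laden illustration: the class and method names are invented, a single buffer replaces the pool, and commit and flush are called by hand rather than by background threads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of off-heap staging; not RocketMQ's TransientStorePool.
class TransientStoreSketch implements AutoCloseable {
    private final ByteBuffer writeBuffer; // off-heap staging area
    private final FileChannel channel;
    private int committedPosition = 0;    // bytes already handed to the channel

    TransientStoreSketch(Path file, int capacity) throws IOException {
        this.writeBuffer = ByteBuffer.allocateDirect(capacity);
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // Stage 1: the producer's write touches off-heap memory only.
    void append(byte[] message) { writeBuffer.put(message); }

    // Stage 2: copy the not-yet-committed bytes to the file channel in one
    // batch; after this they sit in the page cache. Returns bytes committed.
    int commit() throws IOException {
        ByteBuffer slice = writeBuffer.duplicate();
        slice.limit(writeBuffer.position());
        slice.position(committedPosition);
        int n = slice.remaining();
        while (slice.hasRemaining()) channel.write(slice);
        committedPosition += n;
        return n;
    }

    // Stage 3: force the page-cache contents to the storage device.
    void flush() throws IOException { channel.force(false); }

    @Override
    public void close() throws IOException { channel.close(); }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("transient-sketch", ".bin");
        file.toFile().deleteOnExit();
        try (TransientStoreSketch store = new TransientStoreSketch(file, 64)) {
            store.append("msg-1".getBytes(StandardCharsets.UTF_8));
            store.append("msg-2".getBytes(StandardCharsets.UTF_8));
            System.out.println("committed bytes: " + store.commit());
            store.flush();
        }
    }
}
```

Anything appended but not yet committed exists only in the direct buffer, which is why an abrupt broker crash can lose it.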

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
