
How RocketMQ’s New Tiered Storage Extends Message Retention Cost‑Effectively

RocketMQ 5.1.0 introduces a tiered storage module that offloads messages from local disks to cheaper storage media, extending message retention without degrading hot-data performance. This article explains its design, implementation details, configuration steps, and current challenges.

Alibaba Cloud Native

Design Overview

RocketMQ tiered storage offloads messages to cheaper storage media without affecting hot‑data read/write. It solves two problems: (1) separating hot and cold data to avoid I/O contention when cold data is read, and (2) extending message retention time at lower cost.

Key Design Differences

Unlike Kafka and Pulsar, which offload only closed log segments, RocketMQ uploads messages in near-real time. Spreading the upload work out avoids the performance spikes of bulk segment uploads, which makes the design friendlier to small-spec instances. As messages are uploaded, the global CommitLog is split by topic and queue and indexed on the fly.

Quick Start

Set messageStorePlugIn to org.apache.rocketmq.tieredstore.TieredMessageStore in the broker configuration.

Configure the storage backend, e.g. set tieredBackendServiceProvider to org.apache.rocketmq.tieredstore.provider.posix.PosixFileSegment and set tieredStoreFilepath to the target directory.

Optional: set tieredMetadataServiceProvider to use a different metadata store (the default stores metadata in a JSON file).
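As a rough illustration, the settings above take the form of key=value lines in the broker configuration file. The Java sketch below holds such a fragment and parses it with java.util.Properties to show the expected keys. The directory /data/rocketmq/tiered is a placeholder assumption, and tieredStorageLevel (described under Message Retrieval) is set to NOT_IN_DISK purely as an example value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class TieredStoreConfigExample {
    // Broker configuration fragment enabling tiered storage with the POSIX backend.
    // The directory path is a placeholder; tieredStorageLevel is only an example value.
    static final String BROKER_CONF = String.join("\n",
        "messageStorePlugIn=org.apache.rocketmq.tieredstore.TieredMessageStore",
        "tieredBackendServiceProvider=org.apache.rocketmq.tieredstore.provider.posix.PosixFileSegment",
        "tieredStoreFilepath=/data/rocketmq/tiered",
        "tieredStorageLevel=NOT_IN_DISK");

    // Parses key=value lines using the standard .properties format rules.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(BROKER_CONF);
        // Print keys in sorted order for a stable listing.
        props.stringPropertyNames().stream().sorted()
            .forEach(key -> System.out.println(key + "=" + props.getProperty(key)));
    }
}
```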

Technical Architecture

Access Layer: TieredMessageStore, TieredDispatcher, and TieredMessageFetcher implement the asynchronous read/write interfaces, using dedicated thread pools and a pre-read cache for performance.

Container Layer: TieredCommitLog, TieredConsumeQueue, TieredIndexFile, and TieredFileQueue mirror the corresponding DefaultMessageStore structures, except that the CommitLog is stored per queue rather than as a single global file.

Driver Layer: TieredFileSegment maps logical files to physical files through implementations of the TieredStoreProvider interface, which can target POSIX file systems, S3, OSS, MinIO, and other backends.
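To make the logical-to-physical mapping more concrete, here is a minimal sketch of how a file queue could locate the segment holding a given logical offset. The class, field, and method names (SegmentedFileQueue, Segment, locate) are hypothetical and not RocketMQ's API; the sketch simply assumes each segment covers a contiguous range starting at its base offset.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model; not the TieredFileQueue/TieredFileSegment API.
class SegmentedFileQueue {
    static final class Segment {
        final long baseOffset;      // first logical offset stored in this segment
        final long length;          // bytes written to this segment so far
        final String physicalPath;  // e.g. a POSIX path or an object-storage key

        Segment(long baseOffset, long length, String physicalPath) {
            this.baseOffset = baseOffset;
            this.length = length;
            this.physicalPath = physicalPath;
        }
    }

    static final class Position {
        final Segment segment;
        final long offsetInSegment;

        Position(Segment segment, long offsetInSegment) {
            this.segment = segment;
            this.offsetInSegment = offsetInSegment;
        }
    }

    // Segments keyed by base offset, so floorEntry finds the candidate in O(log n).
    private final TreeMap<Long, Segment> segments = new TreeMap<>();

    void addSegment(Segment segment) {
        segments.put(segment.baseOffset, segment);
    }

    // Returns the segment whose range [baseOffset, baseOffset + length) contains
    // logicalOffset, plus the relative position inside it, or null if none does.
    Position locate(long logicalOffset) {
        Map.Entry<Long, Segment> entry = segments.floorEntry(logicalOffset);
        if (entry == null) {
            return null;
        }
        Segment segment = entry.getValue();
        long relative = logicalOffset - segment.baseOffset;
        return relative < segment.length ? new Position(segment, relative) : null;
    }
}
```

A sorted map keyed by base offset is one common way to do this lookup; a real implementation might instead keep segments in an ordered list and binary-search it.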

Message Upload Process

TieredDispatcher registers as a CommitLog dispatcher. When a message is written to the CommitLog, the dispatcher places a reference to it in an upload buffer and returns immediately, so building the local ConsumeQueue is never blocked by uploads. The buffer holds only message references; message bodies are not loaded into memory. Offsets are written into the original message during upload.
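The reference-only buffer can be sketched as follows. This is a hedged illustration, not TieredDispatcher's code: the names UploadBuffer, MessageRef, dispatch, and drain are hypothetical, and a single global queue is a simplification of however the real module organizes its buffers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a reference-only upload buffer.
class UploadBuffer {
    // A lightweight pointer into the local CommitLog; the message body is not copied.
    static final class MessageRef {
        final String topic;
        final int queueId;
        final long commitLogOffset;
        final int size;

        MessageRef(String topic, int queueId, long commitLogOffset, int size) {
            this.topic = topic;
            this.queueId = queueId;
            this.commitLogOffset = commitLogOffset;
            this.size = size;
        }
    }

    private final ConcurrentLinkedQueue<MessageRef> pending = new ConcurrentLinkedQueue<>();

    // Dispatch path: enqueue a reference and return at once, never waiting for an upload.
    void dispatch(String topic, int queueId, long commitLogOffset, int size) {
        pending.offer(new MessageRef(topic, queueId, commitLogOffset, size));
    }

    // Uploader path (e.g. a background thread): take up to maxBatch references in FIFO
    // order. The uploader would then read each body from the local CommitLog by offset/size.
    List<MessageRef> drain(int maxBatch) {
        List<MessageRef> batch = new ArrayList<>();
        MessageRef ref;
        while (batch.size() < maxBatch && (ref = pending.poll()) != null) {
            batch.add(ref);
        }
        return batch;
    }
}
```

Because only offsets and sizes are queued, memory use stays small regardless of message size.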

Upload Progress Control

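Two offsets per queue track progress:

dispatch offset – the position up to which messages have been written to the upload buffer.

commit offset – the position up to which messages have been uploaded.

Messages between the commit offset and the dispatch offset are buffered but not yet uploaded. This mirrors a consumer's pull offset and committed offset, where the gap between the two is the window of messages fetched but not yet consumed.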
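A minimal sketch of per-queue progress tracking, assuming offsets are queue offsets and that a single uploader thread advances the commit offset for each queue. QueueUploadProgress and its methods are hypothetical names, not RocketMQ's API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-queue progress tracker; assumes one uploader thread per queue.
class QueueUploadProgress {
    // Next queue offset to be placed in the upload buffer.
    private final AtomicLong dispatched;
    // Next queue offset not yet confirmed as uploaded; never exceeds dispatched.
    private final AtomicLong committed;

    QueueUploadProgress(long startOffset) {
        this.dispatched = new AtomicLong(startOffset);
        this.committed = new AtomicLong(startOffset);
    }

    // Called after messages below nextOffset have been buffered.
    void advanceDispatch(long nextOffset) {
        dispatched.accumulateAndGet(nextOffset, Math::max);
    }

    // Called after the uploader confirms everything below uploadedUpTo is stored remotely.
    void advanceCommit(long uploadedUpTo) {
        if (uploadedUpTo > dispatched.get()) {
            throw new IllegalStateException("commit offset cannot pass dispatch offset");
        }
        committed.accumulateAndGet(uploadedUpTo, Math::max);
    }

    long dispatchOffset() { return dispatched.get(); }

    long commitOffset() { return committed.get(); }

    // Messages buffered but not yet uploaded: the window [commitOffset, dispatchOffset).
    long inFlight() { return dispatched.get() - committed.get(); }
}
```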

Message Retrieval

TieredMessageStore decides whether to read from tiered storage based on the logical queue offset and the configured tieredStorageLevel. Four strategies are supported:

DISABLE – never read from tiered storage.

NOT_IN_DISK – read messages not present in the local CommitLog.

NOT_IN_MEM – read cold data not in page cache.

FORCE – force all reads from tiered storage (test only).
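The four strategies can be expressed as a small routing decision. This sketch only mirrors the documented level names; the inputs minLocalOffset (smallest queue offset still in the local CommitLog) and inPageCache (whether the requested message is resident in memory) are hypothetical simplifications of how the store would actually determine locality.

```java
// Hypothetical routing sketch; level names follow the documented tieredStorageLevel values.
class TieredReadRouting {
    enum TieredStorageLevel { DISABLE, NOT_IN_DISK, NOT_IN_MEM, FORCE }

    // Returns true if the read should be served from tiered storage.
    static boolean readFromTiered(TieredStorageLevel level, long queueOffset,
                                  long minLocalOffset, boolean inPageCache) {
        boolean onLocalDisk = queueOffset >= minLocalOffset;
        switch (level) {
            case DISABLE:
                return false;
            case NOT_IN_DISK:
                return !onLocalDisk;
            case NOT_IN_MEM:
                // Anything no longer on local disk is also not in memory.
                return !onLocalDisk || !inPageCache;
            case FORCE:
                return true;
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }
}
```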

/**
 * Asynchronous get message
 * @see #getMessage(String, String, int, long, int, MessageFilter) getMessage
 */
CompletableFuture<GetMessageResult> getMessageAsync(final String group, final String topic, final int queueId,
    final long offset, final int maxMsgNums, final MessageFilter messageFilter);

When a read request reaches tiered storage, TieredMessageFetcher validates parameters, translates the logical offset to a physical file position via TieredConsumeQueue / TieredCommitLog, and reads through TieredFileSegment:

// TieredMessageFetcher (signature only; implementation omitted)
public CompletableFuture<GetMessageResult> getMessageAsync(String group, String topic, int queueId,
        long queueOffset, int maxMsgNums, final MessageFilter messageFilter);
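The first translation step, from logical queue offset to CommitLog position, can be sketched as indexing into fixed-size consume queue units. This assumes the tiered consume queue keeps the local RocketMQ unit layout (8-byte CommitLog offset, 4-byte message size, 8-byte tag hash code, 20 bytes in total); ConsumeQueueLookup, Location, and putUnit are hypothetical names. The second step, reading the message bytes from TieredCommitLog through TieredFileSegment, is not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of queue-offset translation, assuming 20-byte consume queue units.
class ConsumeQueueLookup {
    // 8-byte CommitLog offset + 4-byte size + 8-byte tag hash code.
    static final int UNIT_SIZE = 20;

    static final class Location {
        final long commitLogOffset;
        final int size;

        Location(long commitLogOffset, int size) {
            this.commitLogOffset = commitLogOffset;
            this.size = size;
        }
    }

    // Maps a logical queue offset to (CommitLog offset, size) within a consume queue
    // buffer whose first unit holds queue offset minQueueOffset.
    static Location translate(ByteBuffer consumeQueue, long minQueueOffset, long queueOffset) {
        long index = queueOffset - minQueueOffset;
        if (index < 0 || (index + 1) * UNIT_SIZE > consumeQueue.limit()) {
            throw new IndexOutOfBoundsException("queue offset not in this file: " + queueOffset);
        }
        int pos = (int) (index * UNIT_SIZE);
        // Absolute reads leave the buffer's position unchanged.
        return new Location(consumeQueue.getLong(pos), consumeQueue.getInt(pos + 8));
    }

    // Test helper: appends one unit at the buffer's current position.
    static void putUnit(ByteBuffer buf, long commitLogOffset, int size, long tagsCode) {
        buf.putLong(commitLogOffset).putInt(size).putLong(tagsCode);
    }
}
```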

Pre‑Read Cache

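When serving a read, the fetcher reads ahead a batch of messages and keeps them in a pre-read cache for subsequent requests. The read-ahead size follows an additive-increase, multiplicative-decrease (AIMD) scheme inspired by TCP Tahoe congestion control: it grows by a fixed step and shrinks by a constant factor.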
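An AIMD controller for the read-ahead size might look like this. It is a sketch under stated assumptions: the factor is bounded by hypothetical minFactor/maxFactor limits, a "hit" (prefetched data was used) adds one, and a "miss" (prefetched data went unused) halves it. The exact triggers and bounds in RocketMQ may differ.

```java
// Hypothetical AIMD read-ahead controller; hit/miss triggers and bounds are assumptions.
class ReadAheadController {
    private final int minFactor;
    private final int maxFactor;
    private int factor;

    ReadAheadController(int minFactor, int maxFactor) {
        this.minFactor = minFactor;
        this.maxFactor = maxFactor;
        this.factor = minFactor;
    }

    // Additive increase: the previous prefetch was used, so read a little further next time.
    void onCacheHit() {
        factor = Math.min(maxFactor, factor + 1);
    }

    // Multiplicative decrease: prefetched data went unused, so back off quickly.
    void onCacheMiss() {
        factor = Math.max(minFactor, factor / 2);
    }

    int factor() {
        return factor;
    }

    // Number of messages to fetch for a request of maxMsgNums messages.
    int batchSize(int maxMsgNums) {
        return maxMsgNums * factor;
    }
}
```

The asymmetry is the point of AIMD: capacity is probed slowly and released quickly, which keeps cache usage from overshooting when consumers fall behind.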

protected final Cache<MessageCacheKey, SelectMappedBufferResultWrapper> readAheadCache;

A cached entry is evicted once every consumer group subscribed to the topic has read it, or when it expires. If groups consume at very different speeds, entries already read by fast groups stay cached until the slowest group reads them or they expire, so the cache can fill with data that most groups no longer need.
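The eviction rule can be sketched with a plain map that records which groups have read each entry. This is an illustration only: GroupAwareCache and CachedBatch are hypothetical, the caller passes the current time so the behavior is deterministic, and expiry is checked lazily on access rather than by a background sweep.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: evict when all subscribed groups have read an entry, or on expiry.
class GroupAwareCache<K, V> {
    private static final class CachedBatch<T> {
        final T value;
        final long expireAtMillis;
        final Set<String> readers = new HashSet<>();

        CachedBatch(T value, long expireAtMillis) {
            this.value = value;
            this.expireAtMillis = expireAtMillis;
        }
    }

    private final Map<K, CachedBatch<V>> entries = new HashMap<>();
    private final Set<String> subscribedGroups;
    private final long ttlMillis;

    GroupAwareCache(Set<String> subscribedGroups, long ttlMillis) {
        this.subscribedGroups = new HashSet<>(subscribedGroups);
        this.ttlMillis = ttlMillis;
    }

    void put(K key, V value, long nowMillis) {
        entries.put(key, new CachedBatch<>(value, nowMillis + ttlMillis));
    }

    // Returns the cached value for this group, or null on a miss. The entry is removed once
    // every subscribed group has read it, or on the first access after it expires.
    V get(K key, String group, long nowMillis) {
        CachedBatch<V> batch = entries.get(key);
        if (batch == null) {
            return null;
        }
        if (nowMillis >= batch.expireAtMillis) {
            entries.remove(key);
            return null;
        }
        batch.readers.add(group);
        if (batch.readers.containsAll(subscribedGroups)) {
            entries.remove(key);
        }
        return batch.value;
    }

    int size() {
        return entries.size();
    }
}
```

With one slow group, an entry stays in the map after all other groups have read it, which is the accumulation problem described above.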

Failure Recovery

Metadata for each topic, queue, and file segment records both the dispatch and commit offsets. After a broker restart, these offsets are restored from metadata and uploading resumes from the last commit offset. Messages that were buffered but not yet uploaded are dispatched again from the local CommitLog, so no messages are lost.
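A minimal sketch of the recovery step, assuming the persisted metadata can be loaded into a map keyed by queue. UploadRecovery and QueueMetadata are hypothetical names. Rewinding the dispatch offset to the commit offset reflects the idea that the in-memory buffer is lost on restart, and it assumes the local CommitLog still holds the messages in the rewound range.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical recovery sketch; not RocketMQ's metadata classes.
class UploadRecovery {
    // Persisted upload progress for one queue.
    static final class QueueMetadata {
        final long dispatchOffset;
        final long commitOffset;

        QueueMetadata(long dispatchOffset, long commitOffset) {
            this.dispatchOffset = dispatchOffset;
            this.commitOffset = commitOffset;
        }
    }

    // On restart the upload buffer is empty, so each queue's dispatch offset is rewound to
    // its commit offset. Messages in [commitOffset, dispatchOffset) are then dispatched again
    // from the local CommitLog, assuming it has not deleted them yet.
    static Map<String, QueueMetadata> recover(Map<String, QueueMetadata> persisted) {
        Map<String, QueueMetadata> recovered = new HashMap<>();
        persisted.forEach((queue, meta) ->
            recovered.put(queue, new QueueMetadata(meta.commitOffset, meta.commitOffset)));
        return recovered;
    }
}
```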

Development Plans & Cloud‑Native Vision

The design leverages cheap object storage to extend message lifespan and improve reliability in multi‑replica architectures, aligning with serverless and cloud‑native trends.

Future work includes:

Server‑side tag filtering to reduce network overhead.

Improved cache eviction for heterogeneous consumer groups.

More robust metadata synchronization across nodes, especially during slave promotion.

Open Challenges

Reliable metadata synchronization among brokers and handling missing metadata during slave promotion.

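Preventing messages beyond the confirm offset from being uploaded: such messages may still be rolled back after a failover, which would leave tiered storage inconsistent with the CommitLog.

Starting tiered storage quickly when a slave is promoted, since only the master has write permission to tiered storage.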


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
