How RocketMQ Achieves Million‑TPS with Sequential Writes and Multi‑Level Indexes
RocketMQ tackles the high‑performance, high‑reliability challenges of distributed messaging by combining sequential disk writes, memory caching, and multi‑level indexing. This article walks through its storage logic, core structures, zero‑copy techniques, replication modes, static topic scaling, and practical tuning guidelines for optimal throughput.
1 Background
In distributed systems, the core challenge of message queues is achieving high performance while guaranteeing high reliability. On traditional disks, random writes run at roughly 100 KB/s while sequential writes can reach about 600 MB/s, a gap that shaped RocketMQ's storage design. As an Apache top‑level project, RocketMQ solves massive‑message persistence by combining sequential disk writes, memory caching, and multi‑level indexes.
2 Message Storage Logic
RocketMQ uses storage duration as the basis for message retention. Each node promises a storage time:
Messages are retained for the configured time regardless of consumption.
Messages exceeding the storage time are cleaned up.
The storage mechanism defines key points:
Granularity: managed per storage node.
Judgment basis: storage time.
Storage is independent of consumption state.
Message storage in the queue is illustrated (image from RocketMQ official site).
3 Implementation Principles
3.1 Core Storage Structures
1. CommitLog (data file)
All topics' messages are appended to the same CommitLog; each file defaults to 1 GB and is named by the starting offset of its first message, zero‑padded to 20 digits (e.g., 00000000000000000000).
Message format: total length (4 bytes) + body, supporting fast offset lookup.
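The offset‑based naming scheme can be sketched with a toy helper (the class and method names here are illustrative, not RocketMQ's internal utility):

```java
// Toy sketch of offset-based CommitLog file naming.
public class CommitLogFileName {
    // Files are named by the offset of their first message, zero-padded to 20 digits.
    public static String fileName(long startOffset) {
        return String.format("%020d", startOffset);
    }

    public static void main(String[] args) {
        System.out.println(fileName(0L));          // first 1 GB segment
        System.out.println(fileName(1073741824L)); // second segment starts at 1 GB
    }
}
```

Given a global offset, the broker can locate the containing file by name alone, which is what makes the offset lookup fast.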
Flush policy:
Asynchronous flush (default): write to PageCache and return ACK; background thread flushes, increasing throughput by ~30%.
Synchronous flush: wait for data to be persisted before ACK, ensuring zero data loss.
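The flush mode is a broker‑level setting. A minimal broker.conf fragment might look like the following (flushDiskType takes the standard FlushDiskType names; the segment‑size property is shown under the assumption that your version exposes it under this historical spelling):

```properties
# Flush policy: ASYNC_FLUSH (default, higher throughput) or SYNC_FLUSH (zero loss)
flushDiskType=ASYNC_FLUSH
# CommitLog segment size, 1 GB by default
mapedFileSizeCommitLog=1073741824
```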
2. ConsumeQueue (index file)
Indexes are stored per Topic/Queue; each file holds 300,000 records.
Each record is 20 bytes: CommitLog offset (8 bytes) + message length (4 bytes) + Tag hash (8 bytes).
Consumers locate messages quickly via the index, avoiding full file scans.
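Because every record has the same fixed 20‑byte layout, the Nth index entry can be located by pure arithmetic rather than a scan. A minimal sketch of the layout (class and method names are illustrative):

```java
import java.nio.ByteBuffer;

// Sketch of the fixed-size ConsumeQueue record layout described above.
public class ConsumeQueueRecord {
    // CommitLog offset (8) + message length (4) + tag hash (8) = 20 bytes
    public static final int RECORD_SIZE = 8 + 4 + 8;

    // Byte position of the Nth record inside a ConsumeQueue file: pure arithmetic.
    public static long recordPosition(long queueIndex) {
        return queueIndex * RECORD_SIZE;
    }

    // Encode one index record.
    public static byte[] encode(long commitLogOffset, int msgSize, long tagHash) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        buf.putLong(commitLogOffset);
        buf.putInt(msgSize);
        buf.putLong(tagHash);
        return buf.array();
    }

    // Decode the CommitLog offset back out of a record.
    public static long commitLogOffset(byte[] record) {
        return ByteBuffer.wrap(record).getLong();
    }

    public static void main(String[] args) {
        byte[] rec = encode(1_073_741_824L, 256, "TagA".hashCode());
        System.out.println(recordPosition(5));
        System.out.println(commitLogOffset(rec));
    }
}
```

This is why a consumer can jump straight to its consume position: multiply the queue offset by 20, read one record, then fetch the message body from the CommitLog at the decoded offset.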
3. IndexFile (query acceleration)
Hash index based on message key, enabling millisecond‑level queries.
Each file stores 20 million keys, mapping Key→CommitLog offset for fast retrieval.
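The key lookup rests on a hash‑slot scheme: the key is hashed into a slot, and the slot chains to index entries pointing at CommitLog offsets. A simplified sketch of the slot computation (the slot count here is a hypothetical value, not necessarily RocketMQ's default):

```java
// Simplified sketch of the hash-slot idea behind IndexFile.
public class IndexSlot {
    static final int SLOT_NUM = 5_000_000; // hypothetical slot count

    // Map a message key to a hash slot via hashCode-modulo.
    public static int slotFor(String key) {
        int h = key.hashCode() & 0x7fffffff; // mask the sign bit so the slot index is non-negative
        return h % SLOT_NUM;
    }

    public static void main(String[] args) {
        System.out.println(slotFor("ORDER_20240601_001"));
    }
}
```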
3.2 Key Technical Implementations
1. Zero‑Copy
Uses MappedByteBuffer memory‑mapped files to reduce user‑kernel data copies.
Combines Linux sendfile syscall so network transmission sends directly from PageCache, cutting CPU usage by ~50%.
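The memory‑mapping half of this can be demonstrated with the JDK alone. The following self‑contained sketch appends a length‑prefixed record through a MappedByteBuffer and reads it back (file names and sizes are illustrative, not RocketMQ's actual MappedFile implementation):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of the memory-mapped append path.
public class MmapDemo {
    public static String writeAndRead(String msg) throws IOException {
        Path file = Files.createTempFile("commitlog-demo", ".bin");
        byte[] body = msg.getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the file into memory: writes land in PageCache with no per-message syscall.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4 + body.length);
            mapped.putInt(body.length);  // length prefix, like the CommitLog "total length" field
            mapped.put(body);
            mapped.force();              // explicit flush, analogous to a synchronous-flush policy

            // Read the record back through the same mapping.
            mapped.position(0);
            int len = mapped.getInt();
            byte[] out = new byte[len];
            mapped.get(out);
            return new String(out, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndRead("hello rocketmq"));
    }
}
```

In the asynchronous‑flush path the broker skips the force() call and lets a background thread persist the PageCache, which is where the throughput gain comes from.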
2. Master‑Slave Replication
Synchronous replication: Master waits for the slave to persist before ACK, ensuring strong consistency.
Asynchronous replication: Master returns ACK after its own write; the slave syncs later, roughly doubling throughput.
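The replication mode is likewise a per‑broker setting via brokerRole; a sketch of the two master configurations (the values are the standard BrokerRole names):

```properties
# Strong consistency: ACK only after the slave has persisted the data
brokerRole=SYNC_MASTER
# Higher throughput: ACK immediately, replicate in the background
# brokerRole=ASYNC_MASTER
```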
3. Static Topic Expansion
RocketMQ 5.0 introduces logical queues; during expansion, old physical queues become read‑only while new physical queues take writes.
Consumers switch transparently via logical queues, achieving seconds‑level scaling with zero data migration.
4 Best Practices
4.1 Flush Policy Selection
Financial transactions – synchronous flush (flushDiskType=SYNC_FLUSH).
Log analysis – asynchronous flush (flushDiskType=ASYNC_FLUSH).
Mixed load – dynamic switching based on business tags.
4.2 Index Optimization Tips
High‑frequency query keys: add business ID as index key in message header.
Cold data cleanup: let expired messages be removed automatically at the configured time, e.g.:
<messageStoreConfig>
<deleteWhen>04:00</deleteWhen>
<diskMaxUsedSpaceRatio>85</diskMaxUsedSpaceRatio>
</messageStoreConfig>
4.3 Consumer Optimization
Parallel consumption – use PushConsumer with message‑level load balancing.
Ordered consumption – consume with an orderly listener and route related messages to the same MessageGroup.
Flow control – adjust pullThresholdForQueue to limit per‑queue fetch volume.
4.4 Monitoring & Alert Configuration
CommitLog accumulation > 100 GB → expand or clean historical data.
Consumer delay > 5 min → check consumer health.
Flush latency > 3 s → inspect disk I/O performance.
5 Summary
RocketMQ’s combination of sequential writes, multi‑level indexes, and flexible flushing delivers million‑TPS throughput while maintaining data reliability. Its static topic expansion and message‑granularity load balancing serve enterprise scenarios such as finance and IoT. In practice, choose flush policies, index settings, and consumption modes that match business characteristics to achieve optimal performance.
