
How RocketMQ Uses Queues, Page Cache, and mmap to Achieve High Performance and Scalability

This article explains how RocketMQ tackles registration latency, synchronous coupling, and traffic spikes by introducing an intermediate queue, designing a persistent high‑availability broker, leveraging Linux page cache and memory‑mapped files, and employing topics, tags, and sharding to enable efficient asynchronous processing and scalable consumption.

macrozheng

The Trouble of Happiness

Zhang Dapeng faces both joy and worry as business volume surges; registration now requires multiple service calls (SMS, push, coupons), turning a simple step into a 200 ms operation and causing scalability headaches.

The CTO identifies three core problems: synchronous blocking, tight coupling, and traffic surge risk.

Synchronous blocking: the registration flow waits for the other modules to finish, so end-to-end latency piles up.

Coupling: the registration code is tightly integrated with the other modules, so any failure in them propagates to the user.

Traffic surge risk: promotional events can overload the registration process, leading to system collapse.

Bill suggests adding a middle‑layer queue to decouple services, turning the process into a classic producer‑consumer model.

With the queue in place, registration becomes asynchronous: total latency drops from 200 ms to about 55 ms, and the queue absorbs traffic bursts (peak shaving), improving throughput nearly fourfold.
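The handoff can be sketched as a producer that returns as soon as the event is enqueued; the names (AsyncRegistration, RegistrationEvent payload) and the in-memory queue are illustrative, not RocketMQ's API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: registration persists the user, then hands the
// side effects (SMS, push, coupon) to a queue and returns immediately.
public class AsyncRegistration {
    static final BlockingQueue<String> EVENTS = new LinkedBlockingQueue<>();

    // Returns the user-facing latency in ms: only the core write plus the
    // enqueue, not the downstream SMS/push/coupon work.
    public static long register(String userId) throws InterruptedException {
        long start = System.nanoTime();
        // ... core user insert would happen here (assumed ~50 ms) ...
        EVENTS.put("UserRegistered:" + userId); // near-instant handoff
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long latency = register("u1001");
        System.out.println("user-facing latency ~" + latency + " ms; queued: " + EVENTS.size());
    }
}
```

The downstream consumers (SMS, push, coupon) would drain the queue on their own threads, so their latency no longer shows up in the registration response.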

Bill then asks which queue implementation to use, noting that a simple JDK Queue suffers from producer-consumer coupling, message loss on crash, and a single-consumer limitation.

The queue resides in the producer’s memory, tightly coupling producer and consumer.

In‑memory queues lose messages if the machine crashes.

Only one consumer can read a message; scaling requires duplicate queues, which is inefficient.
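The last limitation is easy to demonstrate with a plain JDK BlockingQueue: when two consumers poll the same queue, each message is delivered to exactly one of them, so there is no broadcast, and nothing survives a process crash. A minimal sketch (class and method names are illustrative):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Demonstrates the single-consumer limitation of an in-memory JDK queue:
// two competing consumers split the messages; neither sees the full stream.
public class SingleConsumerDemo {
    // Drains `total` messages with two consumers; returns each one's count.
    public static int[] run(int total) throws Exception {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < total; i++) queue.put(i);

        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Integer> consumer = () -> {
            int seen = 0;
            // Stop once the queue stays empty for 100 ms.
            while (queue.poll(100, TimeUnit.MILLISECONDS) != null) seen++;
            return seen;
        };
        Future<Integer> a = pool.submit(consumer);
        Future<Integer> b = pool.submit(consumer);
        int[] counts = {a.get(), b.get()};
        pool.shutdown();
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = run(1000);
        // counts[0] + counts[1] == 1000: each message reached exactly one consumer
        System.out.println("consumer A: " + counts[0] + ", consumer B: " + counts[1]);
    }
}
```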

Broker

To solve these issues, a dedicated broker component is introduced, handling persistence, high availability, and high performance.

Broker requirements:

Message persistence: store messages on disk (e.g., in files) so they survive crashes.

High availability: ensure service continuity if a broker fails.

High performance: reach 100k TPS through fast producer writes, fast disk persistence, and fast consumer reads.

Page Cache

Linux loads file blocks into page cache (4 KB pages) in kernel space, allowing the CPU to read/write data directly from memory.

Read file: the CPU checks the page cache first; a miss triggers a page fault and loads the block from disk.

Write file: the CPU writes to the page cache, which is flushed to disk later.

mmap

Memory‑mapped files map disk files into a process’s virtual address space, eliminating the copy between kernel and user space.

Note: the page cache resides in kernel space; mmap lets processes access it directly.

A conventional file read goes through three steps:

File data is loaded into the kernel page cache.

The CPU copies it into a user-space buffer.

The process accesses the data through its virtual address.

With mmap, the copy in the second step disappears: the page-cache pages are mapped directly into the process's virtual address space.
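In Java this is exposed through NIO's memory-mapped buffers, which is how RocketMQ accesses its commitlog. A minimal sketch (file name and payload are illustrative):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of memory-mapped file I/O with Java NIO: the file is mapped into
// the process's address space, so reads and writes go through the page
// cache directly instead of an extra kernel-to-user copy via read()/write().
public class MmapDemo {
    public static String roundTrip(String text) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".dat");
        file.toFile().deleteOnExit();
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map a region of the file; writes to the buffer hit the page cache.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            buf.put(payload); // write through the mapping, no explicit write() call
            buf.force();      // ask the kernel to flush the dirty pages to disk
        }
        // Read back through the ordinary file API to confirm the data landed.
        byte[] onDisk = Files.readAllBytes(file);
        return new String(onDisk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("commitlog entry"));
    }
}
```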

ConsumeQueue Improvements – Data Sharding

In broadcast mode every consumer reads all messages; in cluster mode consumers share the load by consuming only a subset of messages.

Multiple consumeQueues are created, and producers distribute messages across them, enabling parallel consumption.
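One simple way a producer can distribute messages is to hash a sharding key modulo the queue count, which keeps messages for the same key on the same consumeQueue while spreading different keys across queues. A sketch (RocketMQ's actual queue-selection strategies are pluggable; this is one illustrative policy):

```java
// Sketch of distributing messages across multiple consumeQueues: hashing a
// sharding key (e.g., an order ID) modulo the queue count keeps messages for
// the same key on the same queue, so per-key ordering is preserved, while
// different keys spread across queues for parallel consumption.
public class QueueSelector {
    public static int selectQueue(String shardingKey, int queueCount) {
        // Mask off the sign bit instead of Math.abs: hashCode() can be
        // Integer.MIN_VALUE, whose absolute value overflows.
        return (shardingKey.hashCode() & Integer.MAX_VALUE) % queueCount;
    }

    public static void main(String[] args) {
        int queues = 4;
        for (String orderId : new String[]{"order-1", "order-2", "order-3"}) {
            System.out.println(orderId + " -> consumeQueue " + selectQueue(orderId, queues));
        }
    }
}
```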

Topic

Messages of the same business type are grouped into a Topic; producers specify the Topic, and the broker routes messages to the corresponding consumeQueue.

Tag

Within a Topic, messages can be further classified by Tag (e.g., order created, order closed). The broker stores the tag’s hashcode in the consumeQueue for fast integer comparison.
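The broker-side filter described above can be sketched as follows; storing only the tag's hashcode keeps consumeQueue entries fixed-size and makes filtering a cheap integer comparison, but since two tags could share a hashcode, the consumer still re-checks the full tag string after fetching the message (the IndexEntry layout here is a simplification of the real consumeQueue format):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of tag filtering as the article describes it: the consumeQueue entry
// stores only the tag's hashcode, so the broker can filter with an integer
// comparison instead of a string comparison.
public class TagFilterDemo {
    // Simplified consumeQueue entry: where the message lives, plus the tag hash.
    record IndexEntry(long commitLogOffset, int tagHash) {}

    // Broker side: keep only entries whose stored hashcode matches.
    public static List<IndexEntry> brokerFilter(List<IndexEntry> queue, String tag) {
        int wanted = tag.hashCode();
        List<IndexEntry> hits = new ArrayList<>();
        for (IndexEntry e : queue) {
            if (e.tagHash() == wanted) hits.add(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<IndexEntry> queue = List.of(
                new IndexEntry(0, "order_created".hashCode()),
                new IndexEntry(128, "order_closed".hashCode()),
                new IndexEntry(256, "order_created".hashCode()));
        System.out.println(brokerFilter(queue, "order_created").size() + " entries match order_created");
    }
}
```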

Summary

RocketMQ’s design achieves three goals: persistent storage via sequential commitlog writes, high performance through page cache and mmap to avoid disk I/O, and high availability with master‑slave brokers and a nameserver for service discovery. Data sharding, topics, and tags enable scalable, efficient consumption, while the nameserver decouples producers/consumers from broker addresses, simplifying configuration and fault tolerance.

Tags: distributed systems, message queue, RocketMQ, asynchronous processing, high performance, broker design
Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
