Designing a High‑Availability, High‑Concurrency, Scalable RocketMQ Cluster Using Dledger
This article explains how to build a highly available, high‑throughput, and horizontally scalable RocketMQ cluster by deploying multiple NameServers and using the Dledger mode to achieve fault‑tolerance, load distribution, and massive message handling for demanding business scenarios.
Background
The author’s business line originally consisted of three independent services, which sufficed when the system was simple. As product iterations added more features, the team faced high concurrency, service decoupling, and distributed transaction challenges, prompting the adoption of RocketMQ for better message handling.
Because the internal business lines are deployed independently, there was an urgent need to build a self‑managed, highly available RocketMQ cluster with the following requirements: high availability, high concurrency, scalability, and support for massive messages.
NameServer Service
To ensure NameServer high availability, three machines are deployed; the cluster remains operational as long as at least one NameServer is alive, since each NameServer holds complete routing information and operates independently without communicating with others.
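The "any one NameServer is enough" property can be illustrated with a small sketch. This is a toy model rather than the RocketMQ client: the host names, the `alive` set, and the `pick_nameserver` helper are hypothetical and exist only for illustration (9876 is the default NameServer port).

```python
from typing import Iterable, Optional, Set

# Hypothetical addresses for the three NameServer machines (illustration only).
NAMESERVERS = ["ns1:9876", "ns2:9876", "ns3:9876"]


def pick_nameserver(candidates: Iterable[str], alive: Set[str]) -> Optional[str]:
    """Return the first reachable NameServer, or None if every one is down.

    Each NameServer holds the full routing table, so any single
    reachable instance is enough to serve a route lookup.
    """
    for addr in candidates:
        if addr in alive:
            return addr
    return None
```

With two of the three machines down, `pick_nameserver(NAMESERVERS, {"ns3:9876"})` still returns a usable address; only when the `alive` set is empty does the lookup fail.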
Broker Cluster Deployment Options
RocketMQ supports four main cluster architectures:
Multiple‑Master (no slaves)
Multiple‑Master with asynchronous slave replication
Multiple‑Master with synchronous double‑write replication
Dledger deployment (master‑slave group with automatic leader election)
Multiple‑Master Mode
All nodes are masters; there are no slaves.
Advantages: simple configuration, no impact on applications when a master restarts, and reliable storage (RAID10) prevents message loss. Disadvantages: messages on a downed machine cannot be consumed until it recovers, affecting real‑time delivery.
Multiple‑Master + Asynchronous Slave Replication
Each master has a slave; HA uses asynchronous replication, causing millisecond‑level delay.
Advantages: minimal message loss even if disks fail, transparent failover for consumers, performance similar to multiple‑master mode. Disadvantages: a master crash combined with disk failure may lose a small number of messages.
Multiple‑Master + Synchronous Double‑Write
Each master has a slave; HA uses synchronous writes, requiring both master and slave to succeed before acknowledging.
Advantages: no single point of failure, no message delay when a master fails, high availability of data and service. Disadvantages: roughly 10% lower performance than asynchronous replication, and without Dledger a slave is not promoted automatically after the master is lost.
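The difference between the two replication modes comes down to when the producer is acknowledged. The sketch below is a simplified model under stated assumptions, not RocketMQ's implementation: `Node` and `send` are hypothetical names, and the later catch-up of an asynchronous slave is deliberately not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy broker: either up or down, with an in-memory message log."""
    up: bool = True
    log: List[str] = field(default_factory=list)

    def append(self, msg: str) -> bool:
        if not self.up:
            return False
        self.log.append(msg)
        return True


def send(master: Node, slave: Node, msg: str, sync: bool) -> bool:
    """Return True if the producer would receive a success acknowledgement."""
    if not master.append(msg):
        return False
    if sync:
        # Synchronous double-write: acknowledge only once the slave has it too.
        return slave.append(msg)
    # Asynchronous replication: acknowledge at once; the slave catches up later
    # (that catch-up is not modelled in this sketch).
    return True
```

With the slave down, `send(..., sync=True)` reports failure while `send(..., sync=False)` reports success even though only the master holds the message, which is the window in which asynchronous replication can lose data.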
Dledger Mode
Older RocketMQ versions used a plain master‑slave architecture, which required manual promotion of a slave after a master failure. Dledger, which is based on the Raft consensus protocol, requires at least three brokers (one master, two slaves) forming a group; when the master fails, the remaining brokers elect a new master automatically, provided a majority of the group is still alive, eliminating manual intervention.
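Why three brokers is the minimum follows from majority voting. The arithmetic below is a minimal sketch assuming the standard Raft-style majority rule that Dledger builds on; the function names are illustrative and not part of any RocketMQ API.

```python
def majority(group_size: int) -> int:
    """Smallest number of brokers that forms a strict majority."""
    return group_size // 2 + 1


def can_elect_leader(group_size: int, failed: int) -> bool:
    """A new master can be elected only while a majority is still alive."""
    return group_size - failed >= majority(group_size)


def tolerated_failures(group_size: int) -> int:
    """Maximum number of simultaneous broker failures the group survives."""
    return (group_size - 1) // 2
```

A group of three survives one failure (`can_elect_leader(3, 1)` is true) but not two. A group of two could not survive any failure, which is why a two-broker group gains nothing from automatic election.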
Overall Architecture: High Availability, High Concurrency, Scalability, Massive Messaging
After evaluating the four options, the Dledger mode was chosen. The final logical deployment includes three NameServers and multiple Dledger groups, each with one master and two slaves.
High Availability
With three NameServers, the cluster tolerates the failure of any two of them. Each Dledger group has one master and two slaves; if the master fails, the two remaining brokers still form a majority and elect a new master, so service continues without manual intervention.
High Concurrency
For a topic receiving 100,000 messages per second, adding more master brokers distributes the load: with the topic's queues spread evenly across five masters, each master handles about 20,000 messages per second.
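The 100,000 / 5 figure can be checked with a small simulation. This is a sketch under stated assumptions: producers pick queues in strict round-robin order, each master hosts the same number of queues for the topic (four here, chosen for illustration), and the broker names are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List


def distribute(total_msgs: int, masters: List[str],
               queues_per_master: int = 4) -> Dict[str, int]:
    """Count messages per master when queues are chosen round-robin."""
    # One (master, queue_id) pair per message queue of the topic.
    queues = [(m, q) for m in masters for q in range(queues_per_master)]
    per_master: Counter = Counter()
    for _, (master, _queue_id) in zip(range(total_msgs), cycle(queues)):
        per_master[master] += 1
    return dict(per_master)


# One second of traffic for the topic, spread over five hypothetical masters.
load = distribute(100_000, [f"broker-{c}" for c in "abcde"])
```

Here `load` maps each of the five masters to 20,000 messages. Real traffic is less even than this, since producers may skip slow or unavailable brokers.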
Scalability
Growth in message volume or concurrency is handled by adding more broker groups; because each group serves its own share of a topic's queues, capacity grows roughly linearly with the number of groups.
Massive Messaging
Data is distributed across brokers; to store more data, simply add more master brokers.
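A rough sizing calculation makes the storage trade-off concrete. The sketch assumes every Dledger group keeps three full copies of its data (one master plus two slaves, as in the deployment above); the capacities in the usage note are made-up inputs, not measurements.

```python
import math

REPLICAS_PER_GROUP = 3  # one master + two slaves per Dledger group


def groups_needed(data_tb: float, usable_tb_per_broker: float) -> int:
    """Number of Dledger groups required to hold data_tb of messages."""
    return math.ceil(data_tb / usable_tb_per_broker)


def brokers_needed(data_tb: float, usable_tb_per_broker: float) -> int:
    """Total broker machines, counting the slaves in every group."""
    return groups_needed(data_tb, usable_tb_per_broker) * REPLICAS_PER_GROUP
```

For example, retaining 10 TB of messages with 4 TB usable per broker needs `groups_needed(10, 4)` = 3 groups, which means nine broker machines in total.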