
Designing a High‑Availability, High‑Concurrency, Scalable RocketMQ Cluster Using Dledger

This article explains how to build a highly available, high‑throughput, and horizontally scalable RocketMQ cluster by deploying multiple NameServers and using the Dledger mode to achieve fault‑tolerance, load distribution, and massive message handling for demanding business scenarios.

Architecture Digest

Background

The author’s business line originally consisted of three independent services, which was enough while the system was simple. As product iterations added features, the team ran into high-concurrency load and needed service decoupling and distributed transactions, which led it to adopt RocketMQ for message handling.

Because the internal business lines are deployed independently, there was an urgent need to build a self‑managed, highly available RocketMQ cluster with the following requirements: high availability, high concurrency, scalability, and support for massive messages.

NameServer Service

To make the NameServer layer highly available, three machines are deployed. The cluster stays operational as long as at least one NameServer is alive, because each NameServer holds the complete routing information and runs independently, without communicating with the others.
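
The following is a minimal sketch of why one surviving NameServer is enough, assuming a client that holds the full address list and tries each entry in turn. The host names and the `is_alive` probe are hypothetical and stand in for a real connection attempt; the semicolon-separated list mirrors the format RocketMQ uses for `namesrvAddr`.

```python
from typing import Callable, Optional

# Hypothetical addresses; 9876 is the NameServer's default listen port.
NAMESRV_ADDR = "ns1.example:9876;ns2.example:9876;ns3.example:9876"


def pick_nameserver(addr_list: str, is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first NameServer that answers, or None if all are down."""
    for addr in addr_list.split(";"):
        if is_alive(addr):
            return addr
    return None


# Simulate two of the three NameServers being down.
down = {"ns1.example:9876", "ns2.example:9876"}
survivor = pick_nameserver(NAMESRV_ADDR, lambda addr: addr not in down)
print(survivor)
```

Because every NameServer carries the full routing table, whichever address answers can serve the lookup on its own.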

Broker Cluster Deployment Options

RocketMQ supports four main cluster architectures:

Multiple‑Master (no slaves)

Multiple‑Master with asynchronous slave replication

Multiple‑Master with synchronous double‑write replication

Dledger deployment (master‑slave group with automatic leader election)

Multiple‑Master Mode

All nodes are masters; there are no slaves.

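Advantages: simple configuration; a single master going down or restarting does not affect applications; with RAID10 disks, messages survive even an unrecoverable machine failure (asynchronous flush may lose a few, synchronous flush loses none). Disadvantages: while a machine is down, its unconsumed messages cannot be consumed until it recovers, which hurts real-time delivery.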

Multiple‑Master + Asynchronous Slave Replication

Each master has a slave. HA uses asynchronous replication, so the slave lags the master by a few milliseconds.

Advantages: very few messages are lost even if a disk fails; when a master goes down, consumers can keep consuming from its slave, transparently to the application; performance is close to multiple-master mode. Disadvantages: a master crash combined with disk failure may lose a small number of messages that had not yet been replicated.

Multiple‑Master + Synchronous Double‑Write

Each master has a slave; HA uses synchronous writes, requiring both master and slave to succeed before acknowledging.

Advantages: no single point of failure for data or service, and no message delay when a master fails. Disadvantages: throughput is about 10% lower than with asynchronous replication, and a slave is not promoted to master automatically when the master is lost.
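
The difference between the two replication modes can be illustrated with a toy model. This is a sketch under stated assumptions, not RocketMQ's HA code: `ReplicaPair`, its offsets, and the `slave_lag` parameter are hypothetical, and the model only counts acknowledged messages that exist on the master alone.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPair:
    """Toy model of one master/slave pair; offsets count stored messages."""
    master_offset: int = 0
    slave_offset: int = 0
    acked: int = 0

    def write_async(self, slave_lag: int = 0) -> None:
        # The master acknowledges immediately; the slave may trail behind.
        self.master_offset += 1
        self.acked += 1
        self.slave_offset = max(self.slave_offset, self.master_offset - slave_lag)

    def write_sync(self) -> None:
        # Acknowledge only after both copies hold the message.
        self.master_offset += 1
        self.slave_offset = self.master_offset
        self.acked += 1

    def lost_if_master_disk_dies(self) -> int:
        # Acknowledged messages that exist only on the master.
        return self.acked - self.slave_offset


async_pair, sync_pair = ReplicaPair(), ReplicaPair()
for _ in range(10):
    async_pair.write_async(slave_lag=3)
    sync_pair.write_sync()
print(async_pair.lost_if_master_disk_dies(), sync_pair.lost_if_master_disk_dies())
```

With a lag of three messages, the asynchronous pair has three acknowledged messages at risk, while the synchronous pair has none; the price is that every synchronous write waits for the slave.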

Dledger Mode

Before Dledger, RocketMQ relied on a plain master-slave architecture in which a slave had to be promoted manually after the master failed. In Dledger mode, at least three brokers (one master and two slaves) form a group; when the master fails, the remaining brokers automatically elect a new master among themselves, so no manual intervention is needed.
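
As an illustration, the per-node settings of one Dledger group can be rendered with a small helper. This is a sketch, not an official tool: the helper name, group name, IP addresses, and ports are hypothetical, and the property keys (including the `DLeger` spelling) are the ones used in RocketMQ's Dledger quick-start configuration as best recalled, so verify them against the documentation for your version.

```python
from typing import Dict, List


def dledger_broker_conf(group: str, peers: Dict[str, str], self_id: str,
                        namesrv_addr: str) -> str:
    """Render broker.conf text for one node of a Dledger group."""
    if self_id not in peers:
        raise ValueError(f"{self_id} is not one of the peers")
    if len(peers) < 3:
        # Mirrors the article's requirement of one master and two slaves.
        raise ValueError("a Dledger group needs at least three nodes")
    peer_list = ";".join(f"{pid}-{addr}" for pid, addr in peers.items())
    lines: List[str] = [
        f"brokerName={group}",
        f"namesrvAddr={namesrv_addr}",
        "enableDLegerCommitLog=true",  # key spelling is RocketMQ's own
        f"dLegerGroup={group}",
        f"dLegerPeers={peer_list}",
        f"dLegerSelfId={self_id}",     # must be unique within the group
    ]
    return "\n".join(lines)


# Hypothetical three-node group; each node gets the same peers, a different self id.
peers = {"n0": "10.0.0.1:40911", "n1": "10.0.0.2:40911", "n2": "10.0.0.3:40911"}
conf = dledger_broker_conf("RaftNode00", peers, "n0",
                           "ns1.example:9876;ns2.example:9876")
print(conf)
```

Only `dLegerSelfId` differs between the three nodes of a group, which keeps the files easy to template.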

Overall Architecture: High Availability, High Concurrency, Scalability, Massive Messaging

After evaluating the four options, the Dledger mode was chosen. The final logical deployment includes three NameServers and multiple Dledger groups, each with one master and two slaves.

High Availability

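With three NameServers, the NameServer layer tolerates the loss of any two. Each master broker has two slaves; if the master of a Dledger group fails, the remaining two brokers elect a new master, so service continues. Because the election needs a majority of the group, a three-broker group tolerates one broker failure at a time.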
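
The two layers tolerate different numbers of failures, which a short calculation makes explicit. This sketch assumes majority-based election in the Dledger group (Dledger is built on Raft) and fully independent NameServers; the function names are illustrative.

```python
def quorum(nodes: int) -> int:
    """Smallest majority of a group with `nodes` members."""
    return nodes // 2 + 1


def dledger_failures_tolerated(nodes: int) -> int:
    """Brokers a Dledger group can lose and still elect a master."""
    return nodes - quorum(nodes)


def nameserver_failures_tolerated(nodes: int) -> int:
    """NameServers are independent, so one survivor is enough."""
    return nodes - 1


print(dledger_failures_tolerated(3), nameserver_failures_tolerated(3))
```

For three nodes each, the Dledger group survives one failure and the NameServer layer survives two.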

High Concurrency

A topic's queues can be spread across several master brokers, so adding masters spreads the write load. For a topic receiving 100,000 messages per second, five masters each handle about 20,000 messages per second.
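
The arithmetic can be checked with a small simulation, assuming the producer picks queues round-robin and every master hosts the same number of queues for the topic. The broker names and the figure of four queues per master are assumptions for illustration.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, List


def distribute(messages: int, masters: List[str],
               queues_per_master: int = 4) -> Dict[str, int]:
    """Round-robin `messages` over every queue and count them per master."""
    queues = [(m, q) for m in masters for q in range(queues_per_master)]
    per_master: Counter = Counter()
    for master, _queue in islice(cycle(queues), messages):
        per_master[master] += 1
    return dict(per_master)


# One second of traffic for the example topic, spread over five masters.
load = distribute(100_000, [f"broker-{i}" for i in range(5)])
print(load)
```

Each of the five masters ends up with 20,000 of the 100,000 messages, matching the example above.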

Scalability

Higher message volume or concurrency is handled by adding more brokers, so cluster capacity grows roughly linearly with the number of masters.

Massive Messaging

Data is distributed across brokers; to store more data, simply add more master brokers.
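
A rough sizing sketch ties throughput and storage together: the cluster needs enough masters to satisfy whichever constraint is tighter. All figures below (per-master throughput, daily volume, retention, disk size) are hypothetical placeholders, not measurements.

```python
import math


def masters_needed(peak_msgs_per_sec: int, msgs_per_sec_per_master: int,
                   daily_gb: int, retention_days: int, disk_gb_per_master: int) -> int:
    """Masters required to satisfy both the throughput and the storage target."""
    by_throughput = math.ceil(peak_msgs_per_sec / msgs_per_sec_per_master)
    by_storage = math.ceil(daily_gb * retention_days / disk_gb_per_master)
    return max(by_throughput, by_storage)


# Hypothetical workload: throughput alone needs 5 masters, storage needs 6.
masters = masters_needed(peak_msgs_per_sec=100_000, msgs_per_sec_per_master=20_000,
                         daily_gb=2_000, retention_days=3, disk_gb_per_master=1_000)
brokers_total = masters * 3  # each Dledger group is one master plus two slaves
print(masters, brokers_total)
```

In Dledger mode every additional master brings two slaves with it, so the broker count grows three times as fast as the master count.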
