Design and Migration of a High‑Performance Message Middleware Platform from RabbitMQ to RocketMQ

To address RabbitMQ’s scalability, reliability, and feature limitations, Vivo’s middleware team evaluated RocketMQ and Pulsar, selected RocketMQ, and built a next‑generation message middleware platform with an AMQP‑proxy gateway, metadata services, and high‑availability mechanisms, enabling seamless, high‑throughput migration and richer messaging capabilities.

High Availability Architecture

Vivo’s Internet Middleware team has run a highly available RabbitMQ‑based message middleware platform since 2016. As the business grew, rising message volumes exposed limitations in RabbitMQ’s availability, performance, and feature set.

Key issues identified include split‑brain risk and lack of automatic recovery in RabbitMQ’s HA design, performance bottlenecks where a single node handles a queue’s traffic (limiting TPS to tens of thousands), and missing capabilities such as transactional messages, ordered delivery, and efficient message tracing.

The next‑generation platform’s objectives were defined as supporting extreme TPS, horizontal scalability, >99.99% platform availability, >99.99999999% data reliability, and rich features (clustered and broadcast consumption, transaction, ordering, delayed and dead‑letter messages, and full‑traceability), while also being operable, observable, extensible, and cloud‑native.
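To make the availability and reliability targets concrete, the arithmetic can be sketched as follows. The function names are hypothetical, and reading "data reliability" as a per-message survival probability is an assumption made for illustration, not a definition taken from the article.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Upper bound on yearly downtime implied by an availability floor."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def expected_losses(messages: int, reliability_percent: float) -> float:
    """Expected lost messages, assuming reliability is a per-message probability."""
    return messages * (1 - reliability_percent / 100)


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(99.99):.2f} minutes of downtime per year at most")
    print(f"{expected_losses(10**12, 99.99999999):.0f} messages lost per trillion, roughly")
```

Under these assumptions, a 99.99% floor leaves a budget of about 52.6 minutes of downtime per year, and ten nines of reliability corresponds to roughly 100 lost messages per trillion stored.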

A comparative study of RocketMQ and Pulsar examined HA architecture, load‑balancing, scaling, fault recovery, and performance. RocketMQ was chosen for its superior support of transaction messages, message tracing, and consumption models, despite Pulsar’s stronger HA and load‑balancing design.

The migration strategy introduced an AMQP‑proxy gateway that translates RabbitMQ’s AMQP protocol into RocketMQ operations, a metadata service that stores RabbitMQ semantics and maps them onto RocketMQ, and a controller for master‑slave switching. To push messages at high throughput without one queue interfering with another, the gateway uses semaphores, a shared thread pool, and per‑queue flow control; consumption pause and rate‑limiting were also added.
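One way to picture the push mechanism is a single worker pool shared by all queues, with a semaphore per queue capping how many of that queue's messages may be in flight. The sketch below is a minimal illustration under that assumption, not the gateway's actual code; `QueuePusher`, `try_push`, and the `deliver` callback are hypothetical names, and the permit is released when delivery returns, whereas a real AMQP gateway would more likely release it on consumer acknowledgement.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class QueuePusher:
    """Push messages for many queues through one shared pool,
    capping the number of in-flight messages per queue."""

    def __init__(self, pool: ThreadPoolExecutor, max_in_flight: int):
        self._pool = pool
        self._max = max_in_flight
        self._permits = {}            # queue name -> Semaphore
        self._lock = threading.Lock()

    def _permit_for(self, queue: str) -> threading.Semaphore:
        with self._lock:
            if queue not in self._permits:
                self._permits[queue] = threading.Semaphore(self._max)
            return self._permits[queue]

    def try_push(self, queue: str, message, deliver) -> bool:
        """Submit one delivery; return False if the queue is at its limit."""
        permit = self._permit_for(queue)
        if not permit.acquire(blocking=False):
            return False              # back-pressure: caller retries later
        try:
            self._pool.submit(self._run, permit, deliver, queue, message)
        except RuntimeError:          # pool already shut down
            permit.release()
            raise
        return True

    @staticmethod
    def _run(permit, deliver, queue, message):
        try:
            deliver(queue, message)
        finally:
            permit.release()          # free the slot for this queue
```

Because a slow consumer can hold at most `max_in_flight` workers, it cannot exhaust the shared pool as long as the pool is sized above any single queue's cap, which is the non-interference property the paragraph describes.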

The final architecture consists of the AMQP‑proxy, mq‑meta service, mq‑controller, and a RocketMQ cluster, all deployed in a cloud‑native manner.
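The role of the metadata service can be illustrated with a small in-memory sketch that records AMQP bindings and resolves a publish to a RocketMQ topic plus the consumer groups that should receive it. The mapping used here (one topic per exchange, one consumer group per queue, exact-match routing keys only) and all names such as `MetaStore` are assumptions for illustration; the article does not spell out the real mapping rules or the name-sanitizing rule.

```python
import re
from dataclasses import dataclass, field


def rocketmq_name(amqp_name: str) -> str:
    """Assumed sanitizing rule: keep letters, digits, '_' and '-',
    replace everything else (for example '.') with '_'."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", amqp_name)


@dataclass(frozen=True)
class Binding:
    exchange: str
    routing_key: str
    queue: str


@dataclass
class MetaStore:
    """Hypothetical stand-in for the metadata service."""
    bindings: set = field(default_factory=set)

    def bind(self, exchange: str, routing_key: str, queue: str) -> None:
        self.bindings.add(Binding(exchange, routing_key, queue))

    def route(self, exchange: str, routing_key: str):
        """Return (topic, consumer groups) for a publish, direct-exchange style."""
        groups = sorted(
            rocketmq_name(b.queue)
            for b in self.bindings
            if b.exchange == exchange and b.routing_key == routing_key
        )
        return rocketmq_name(exchange), groups


if __name__ == "__main__":
    store = MetaStore()
    store.bind("orders.ex", "created", "billing.q")
    store.bind("orders.ex", "created", "audit.q")
    print(store.route("orders.ex", "created"))
```

Keeping this mapping in a separate service lets the proxy stay stateless, which fits the cloud-native deployment the article mentions; fanout and topic-pattern exchanges would need additional matching logic not shown here.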

Performance tests showed throughput rising from ~10‑20 kTPS with RabbitMQ to >100 kTPS after migration, with resource consumption reduced by more than 50%. New features include unified message expiration, gradient delayed retry, broadcast consumption, full‑environment tracing, and consumption reset, while operational complexity and cost were significantly lowered.
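Gradient delayed retry means that each failed delivery waits longer than the previous one before being retried. The sketch below shows the general idea with an illustrative schedule; the delay values and the function name are hypothetical, not the platform's actual configuration, and handing exhausted messages to a dead-letter queue is an assumption based on the dead-letter feature listed among the objectives.

```python
from typing import Optional

# Hypothetical stepped schedule, in seconds; real values would be configured.
RETRY_DELAYS_SECONDS = [10, 30, 60, 120, 300, 600]


def next_retry_delay(failed_attempts: int) -> Optional[int]:
    """Delay before the next retry, given how many deliveries have failed.

    Returns None once the schedule is exhausted, signalling that the
    message should go to the dead-letter queue instead of being retried.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts > len(RETRY_DELAYS_SECONDS):
        return None
    return RETRY_DELAYS_SECONDS[failed_attempts - 1]


if __name__ == "__main__":
    for attempt in range(1, 8):
        print(attempt, next_retry_delay(attempt))
```

A stepped schedule like this gives transient failures a quick second chance while keeping persistently failing messages from hammering the consumer.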

Future work will enrich the gateway with additional governance capabilities, expose the messaging engine via gRPC to decouple applications from specific middleware implementations, and explore RocketMQ 5.0’s compute‑storage separation for the next architectural upgrade.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

High Availability Architecture
