Design and Smooth Migration of a High‑Availability Message Middleware Platform from RabbitMQ to RocketMQ
This article details the challenges of scaling RabbitMQ, the evaluation of RocketMQ versus Pulsar, the architectural design of a new high‑availability message middleware platform, and the step‑by‑step smooth migration strategy that enables higher throughput, richer features, and lower operational costs.
Vivo's Internet Middleware team built a high‑availability RabbitMQ‑based messaging platform in 2016, but rapid business growth exposed limitations in high availability, performance, and feature set.
Key problems identified include split‑brain risks without automatic recovery, bottlenecks when a single node hosts a high‑traffic queue, inability to quickly rebalance queues, and lack of transactional, ordered, and delayed messaging support.
To meet new business and platform requirements (extremely high TPS, availability above 99.99%, data reliability above 99.99999999%, and richer features such as transactional, ordered, delayed, and dead‑letter messages plus message tracing), the team evaluated RocketMQ and Pulsar.
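To make those targets concrete, the short sketch below converts an availability figure into a yearly downtime budget. The function names are illustrative, a 365‑day year is assumed, and treating "data reliability" as an independent per‑message survival probability is an assumption of this sketch rather than a definition taken from the article.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum minutes of unavailability per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def expected_lost_messages(total_messages: float, reliability: float) -> float:
    """Expected number of lost messages, ASSUMING reliability is an
    independent per-message survival probability (an interpretation, not
    something the article states)."""
    return total_messages * (1.0 - reliability)


if __name__ == "__main__":
    print(f"99.99% availability allows about "
          f"{downtime_minutes_per_year(0.9999):.2f} minutes of downtime per year")
    print(f"ten nines over 1e12 messages: about "
          f"{expected_lost_messages(1e12, 0.9999999999):.0f} expected losses")
```

Under these assumptions, 99.99% availability leaves a little under an hour of total downtime per year, which is why a split‑brain without automatic recovery is hard to reconcile with the target.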
The comparison highlighted Pulsar’s compute‑storage separation and BookKeeper‑based high availability, whereas RocketMQ offered a master‑slave model that was simpler to integrate with existing RabbitMQ semantics. Performance tests indicated that RocketMQ could sustain more than 100k TPS for 1 KB messages, which was sufficient for the target workload.
Consequently, RocketMQ was selected as the foundation for the next‑generation platform. The migration plan includes deploying an AMQP‑proxy gateway that translates the AMQP protocol spoken by existing RabbitMQ clients into RocketMQ operations, defining a metadata mapping between the two systems, and implementing high‑performance, non‑interfering push consumption using semaphore‑controlled threads.
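The semaphore‑controlled push consumption mentioned above can be illustrated with a minimal sketch. This is not the gateway's actual code: the class `QueuePusher`, its `try_push` method, and the `max_in_flight` parameter are hypothetical names, and the sketch assumes that "non‑interfering" means one slow queue must not be able to occupy every worker thread in a shared pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class QueuePusher:
    """Pushes messages for one queue through a shared thread pool.

    A per-queue semaphore caps how many messages may be in flight at once,
    so a slow consumer can hold at most `max_in_flight` workers.
    """

    def __init__(self, name, handler, pool, max_in_flight):
        self.name = name
        self.handler = handler          # callable that delivers one message
        self.pool = pool                # shared ThreadPoolExecutor
        self.permits = threading.Semaphore(max_in_flight)

    def try_push(self, message):
        """Submit a message if a permit is free; return None otherwise."""
        if not self.permits.acquire(blocking=False):
            return None                 # queue is saturated, caller retries later
        try:
            return self.pool.submit(self._deliver, message)
        except BaseException:
            self.permits.release()      # submission failed, give the permit back
            raise

    def _deliver(self, message):
        try:
            return self.handler(message)
        finally:
            self.permits.release()      # free the permit whether or not delivery failed


if __name__ == "__main__":
    pool = ThreadPoolExecutor(max_workers=4)
    pusher = QueuePusher("orders", lambda m: m.upper(), pool, max_in_flight=2)
    future = pusher.try_push("hello")
    print(future.result(timeout=5))
    pool.shutdown()
```

Because the permit is released in a `finally` block, a handler that raises does not leak capacity. A real gateway would also need acknowledgement handling and retry scheduling, which this sketch omits.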
The platform also adds consumption pause/resume, global rate limiting, unified message expiration, broadcast consumption, and message tracing, while preserving compatibility with existing RabbitMQ clients.
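The article does not say how rate limiting is implemented. One common technique is a token bucket, sketched below as a single‑process illustration. The class `TokenBucket` and its parameters are hypothetical, and a truly global limit across several gateway nodes would additionally need shared state or per‑node quotas, which this sketch does not cover.

```python
import threading
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock              # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()
        self.lock = threading.Lock()

    def try_acquire(self, n=1):
        """Take `n` tokens if available; return False without blocking otherwise."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never beyond capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=100, capacity=10)
    accepted = sum(bucket.try_acquire() for _ in range(50))
    print(f"accepted {accepted} of 50 immediate requests")
```

Returning `False` instead of blocking lets the caller decide whether to delay the message, reject the publish, or pause consumption for that queue.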
After migration, business throughput increased from tens of thousands of TPS to over one hundred thousand TPS, resource usage dropped by more than 50%, and operational complexity was greatly reduced.
Future work will focus on extending gateway‑based features, abstracting the underlying queue engine via gRPC services, and exploring RocketMQ 5.0’s compute‑storage separation architecture.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
