Building and Migrating to RocketMQ-based Message Middleware Platform at vivo
vivo’s Internet Middleware Team replaced its RabbitMQ service with a RocketMQ‑based platform, building an AMQP‑Proxy gateway and metadata layer to enable seamless, zero‑downtime migration while achieving over 100,000 TPS, billion‑message capacity, 50% resource savings, and advanced features such as transactions, ordered and delayed messaging, and tracing.
This article introduces how vivo's Internet Middleware Team built a next-generation message middleware platform based on RocketMQ and achieved smooth migration from RabbitMQ with zero business downtime.
Background: Since 2016, vivo had provided high-availability message middleware services based on open-source RabbitMQ. With rapid business growth, RabbitMQ revealed several limitations: insufficient high-availability guarantees with split-brain risks; performance bottlenecks, since a single queue cannot be quickly migrated across nodes and supports only tens of thousands of TPS; and missing support for transaction messages, ordered messages, and message tracing.
Project Goals: The team defined requirements for the new platform including high performance supporting horizontal scaling, high availability (>99.99% platform availability, >99.99% data reliability), and rich features including cluster/broadcast consumption, transaction/ordered/delay/dead-letter messages, and message tracing.
Technology Selection: After comparing RocketMQ and Pulsar, RocketMQ was chosen for its better support for transaction messages, message tracing, and consumption patterns critical for online business, despite Pulsar's superior architecture with compute-storage separation.
Migration Implementation: The team built an AMQP-Proxy message gateway to convert the AMQP protocol to RocketMQ's, implemented metadata mapping between RabbitMQ semantics (exchanges, queues, bindings) and RocketMQ semantics (topics, consumer groups), and developed a high-performance non-blocking message push mechanism using semaphores and blocking queues. The final architecture includes AMQP-Proxy for protocol conversion, mq-meta for metadata management, and mq-controller for master-slave switchover and cluster monitoring.
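The article names semaphores and blocking queues as the building blocks of the push mechanism but does not show code. The following is a minimal, hypothetical sketch of that pattern: a bounded `BlockingQueue` absorbs incoming messages, and a `Semaphore` caps the number of pushes in flight so a slow consumer cannot exhaust the proxy's threads. All class and method names here (`BoundedPusher`, `dispatchOnce`, etc.) are illustrative, not vivo's actual implementation.

```java
import java.util.concurrent.*;

// Hypothetical sketch: bounded, non-blocking message push using a
// semaphore (limits in-flight pushes) and a blocking queue (buffers
// messages and provides back-pressure when full).
public class BoundedPusher {
    private final Semaphore permits;              // caps concurrent pushes
    private final BlockingQueue<String> queue;    // buffers pending messages
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    public BoundedPusher(int maxInFlight, int queueCapacity) {
        this.permits = new Semaphore(maxInFlight);
        this.queue = new LinkedBlockingQueue<>(queueCapacity);
    }

    // Producer side: offer() never blocks; false signals back-pressure.
    public boolean submit(String msg) {
        return queue.offer(msg);
    }

    // Dispatch loop body: a permit is taken before each async push, so at
    // most maxInFlight messages are outstanding to a consumer at any time.
    public void dispatchOnce(java.util.function.Consumer<String> pushFn)
            throws InterruptedException {
        String msg = queue.poll(100, TimeUnit.MILLISECONDS);
        if (msg == null) return;                  // nothing to push right now
        permits.acquire();
        workers.submit(() -> {
            try {
                pushFn.accept(msg);               // e.g. write to the AMQP channel
            } finally {
                permits.release();                // free the slot once done
            }
        });
    }

    public void shutdownAndAwait() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Because `submit` returns `false` on a full queue rather than blocking, the proxy can translate overload into AMQP flow control instead of stalling its I/O threads, which matches the non-blocking goal described above.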
Migration Results: Supported business traffic increased from tens of thousands of TPS to over 100,000 TPS, capacity expanded from hundreds of millions to billions of messages, machine resource usage fell by over 50%, and richer features were enabled, including gradient delay redelivery for failed messages, broadcast consumption, message tracing, and consumption throttling.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.