Migrating to RocketMQ: Building a High‑Performance Cloud‑Native Messaging Platform

Facing scaling, high-availability, and feature limitations in RabbitMQ, the vivo middleware team evaluated RocketMQ and Pulsar and chose RocketMQ. This article describes their smooth migration strategy (a message gateway, metadata mapping, a high-performance push model, and consumption controls) and the resulting gains: higher TPS, lower resource usage, and simpler operations.

ITPUB

Background

Since 2016, the vivo Internet middleware team operated a high-availability messaging platform based on RabbitMQ. Rapid growth in business traffic exposed three critical limitations: (1) insufficient HA: split-brain scenarios required manual intervention and could lead to data loss; (2) performance bottlenecks: each queue was bound to a single node, limiting TPS to the tens of thousands and causing severe degradation once the message backlog reached the millions; (3) missing features: no native transactional, ordered, delayed, dead-letter, or tracing capabilities.

Project Goals

Business requirements: support extremely high TPS, enable horizontal scaling, and keep the messaging layer from becoming a bottleneck.

Platform requirements: achieve >99.99% availability and >99.99999999% data reliability, and provide clustered/broadcast consumption as well as transactional, ordered, delayed, dead-letter, and trace messages.
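To make these targets concrete, the short calculation below converts them into a yearly downtime budget and an expected message loss. It is only a sketch: the function names are illustrative, and it assumes that "data reliability" means the fraction of messages that are not lost, which the article does not define.

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def yearly_downtime_minutes(availability_pct: str) -> Decimal:
    """Downtime per year, in minutes, allowed by an availability percentage."""
    unavailable = (Decimal(100) - Decimal(availability_pct)) / Decimal(100)
    return unavailable * MINUTES_PER_YEAR


def expected_lost_messages(reliability_pct: str, messages: int) -> Decimal:
    """Expected number of lost messages (assumes reliability is per message)."""
    loss_ratio = (Decimal(100) - Decimal(reliability_pct)) / Decimal(100)
    return loss_ratio * messages


if __name__ == "__main__":
    # 99.99% availability leaves about 52.56 minutes of downtime per year.
    print(yearly_downtime_minutes("99.99"))
    # Ten nines of reliability means about one lost message per 10 billion.
    print(expected_lost_messages("99.99999999", 10**10))
```

Because both targets are stated as lower bounds, these figures are upper limits on what the platform may lose.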

Component Selection

High‑Availability Comparison

Pulsar uses a compute‑storage separation architecture with BookKeeper for fast failover. RocketMQ follows a master‑slave replication model that requires custom development for failover.

Scaling and Fault Recovery

Pulsar: brokers and BookKeeper storage nodes scale independently; brokers are stateless, load is balanced automatically, and fault recovery completes within seconds.

RocketMQ: broker scaling needs manual load-balancing; failover relies on a master-slave switch with a 30-60 s recovery window.

Performance

Pulsar: can host millions of topics (limited by ZooKeeper); internal tests show several hundred thousand TPS for 1 KB messages.

RocketMQ: supports millions of topics in theory, but the practical limit is about 50 k topics per cluster; benchmarks show >100 k TPS for 1 KB messages.

Feature Comparison

Pulsar provides extensive expiration policies and message deduplication. RocketMQ offers stronger transactional support, message tracing, and diverse consumption modes, which better match online business needs.

Smooth Migration Architecture

A dedicated AMQP-proxy gateway translates the AMQP protocol spoken by existing RabbitMQ clients into RocketMQ operations, so those clients can keep speaking AMQP during the migration. A separate metadata service stores the mapping between RabbitMQ semantics (exchange, queue, binding) and RocketMQ constructs (topic, consumer group).
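As an illustration of what such a mapping could look like, the sketch below keeps bindings in memory and derives RocketMQ names from RabbitMQ ones. Everything in it is an assumption made for the example, not the team's actual schema: the `MetaStore` class, the `TOPIC_` and `GID_` prefixes, the one-exchange-to-one-topic and one-queue-to-one-consumer-group rules, and the use of the routing key as a tag filter. Wildcard routing keys of topic exchanges are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def _safe(name: str) -> str:
    """Replace characters outside a conservative set (e.g. dots) with '_'."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)


@dataclass(frozen=True)
class Subscription:
    topic: str           # RocketMQ topic derived from the exchange
    consumer_group: str  # RocketMQ consumer group derived from the queue
    tag: str             # tag filter derived from the routing key


@dataclass
class MetaStore:
    """In-memory stand-in for a metadata service (hypothetical schema)."""
    # queue name -> list of (exchange, routing_key) bindings
    _bindings: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def bind(self, exchange: str, queue: str, routing_key: str = "") -> None:
        self._bindings.setdefault(queue, []).append((exchange, routing_key))

    def topic_for(self, exchange: str) -> str:
        return "TOPIC_" + _safe(exchange)

    def group_for(self, queue: str) -> str:
        return "GID_" + _safe(queue)

    def subscriptions_for(self, queue: str) -> List[Subscription]:
        """Translate every binding of a queue into a topic subscription."""
        return [
            Subscription(self.topic_for(ex), self.group_for(queue), rk or "*")
            for ex, rk in self._bindings.get(queue, [])
        ]


if __name__ == "__main__":
    meta = MetaStore()
    meta.bind("order.events", "billing-queue", "order.paid")
    print(meta.subscriptions_for("billing-queue"))
```

With a table like this, a gateway can resolve a publish to an exchange into a send to one topic, and a consume from a queue into a subscription under one consumer group.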

Key Migration Tasks

Deploy the message gateway, either as a standalone service or as an embedded component.

Define and maintain metadata mappings in the mq-meta service.

Implement a high‑performance, non‑blocking push model: a semaphore‑driven, on‑demand thread pool replaces the per‑queue thread model, allowing thousands of queues to share a limited number of threads.

Add consumption start/stop and throttling logic in the gateway to support global or partial pause and rate‑limiting.
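The push model in the third task can be sketched roughly as follows. This is one possible reading of "semaphore-driven, on-demand", not the team's implementation: a drain task is scheduled for a queue only when a message arrives and no task is already draining it, and a semaphore caps how many drain tasks may be outstanding. The class name `PushDispatcher`, its parameters, and the `deliver` callback are all hypothetical.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class PushDispatcher:
    """Many queues share a small thread pool; work is scheduled on demand."""

    def __init__(self, deliver, workers=4, max_in_flight=64):
        self._deliver = deliver                        # callback(queue, message)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._permits = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self._buffers = {}                             # queue name -> deque
        self._active = set()                           # queues being drained

    def publish(self, queue, message):
        with self._lock:
            self._buffers.setdefault(queue, deque()).append(message)
            if queue in self._active:
                return                                 # running task will see it
            self._active.add(queue)
        self._permits.acquire()                        # back-pressure on producers
        self._pool.submit(self._drain, queue)

    def _drain(self, queue):
        try:
            while True:
                with self._lock:
                    buf = self._buffers[queue]
                    if not buf:
                        self._active.discard(queue)    # go idle, free the thread
                        return
                    message = buf.popleft()
                try:
                    self._deliver(queue, message)
                except Exception:
                    pass  # a real gateway would redeliver or dead-letter here
        finally:
            self._permits.release()

    def close(self):
        self._pool.shutdown(wait=True)                 # wait for pending drains


if __name__ == "__main__":
    dispatcher = PushDispatcher(lambda q, m: None, workers=4)
    for i in range(10):
        for q in range(1000):                          # 1000 queues, 4 threads
            dispatcher.publish(f"queue-{q}", i)
    dispatcher.close()
```

Because at most one drain task runs per queue, per-queue ordering is preserved, and idle queues hold no thread at all, which is what lets thousands of queues share a handful of workers.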

The platform consists of:

AMQP-proxy gateway for protocol conversion.

mq-meta service for metadata management.

mq-controller for master-slave switching, monitoring and load balancing.
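The consumption start/stop and throttling logic hosted in the gateway could take a shape like the sketch below: a gate that the push path consults before delivering each message. It combines a global or per-group pause switch with a token bucket. The class `ConsumptionGate`, its method names, and the choice of a single shared token bucket are assumptions for illustration; the article does not describe the actual algorithm.

```python
import threading
import time


class ConsumptionGate:
    """Pause switch plus token-bucket rate limit for message delivery."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate = float(rate_per_sec)   # tokens added per second
        self._burst = float(burst)         # bucket capacity
        self._tokens = float(burst)
        self._clock = clock                # injectable for deterministic tests
        self._last = clock()
        self._paused_all = False
        self._paused_groups = set()
        self._lock = threading.Lock()

    def pause(self, group=None):
        """Pause one consumer group, or everything when group is None."""
        with self._lock:
            if group is None:
                self._paused_all = True
            else:
                self._paused_groups.add(group)

    def resume(self, group=None):
        with self._lock:
            if group is None:
                self._paused_all = False
            else:
                self._paused_groups.discard(group)

    def try_acquire(self, group):
        """Return True if one message may be delivered to this group now."""
        with self._lock:
            if self._paused_all or group in self._paused_groups:
                return False
            now = self._clock()
            refill = (now - self._last) * self._rate
            self._tokens = min(self._burst, self._tokens + refill)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False


if __name__ == "__main__":
    gate = ConsumptionGate(rate_per_sec=1000, burst=100)
    gate.pause("GID_reporting")                # partial pause
    print(gate.try_acquire("GID_reporting"))   # paused group is refused
    print(gate.try_acquire("GID_billing"))     # other groups keep flowing
```

Keeping this check in the gateway rather than in client code means a pause or rate change takes effect without redeploying consumers.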

[Figure: Platform architecture diagram]

Progress and Benefits

Performance Gains

After migration the platform sustains 80‑100 k TPS for 1 KB messages, compared with only a few thousand TPS on RabbitMQ. Resource consumption dropped by >50% and operational complexity was reduced.

[Figure: RabbitMQ benchmark performance]
[Figure: Post-migration performance]

Feature Enhancements

Unified message expiration (default 3‑7 days).

Gradual delayed redelivery for exception messages, with eventual dead‑letter handling.

Native broadcast consumption, where every consumer node receives its own copy of each message.

Full‑environment message tracing.

Consumer offsets can be reset to an earlier position to replay messages.
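The gradual redelivery with eventual dead-lettering listed above can be illustrated with a small schedule calculator. The article does not say which schedule the platform uses, so this sketch assumes the defaults of open-source RocketMQ 4.x for concurrent consumers: the 18 built-in delay levels, retries starting at level 3, at most 16 retries, and a dead-letter topic named `%DLQ%` plus the consumer group. The function names are illustrative.

```python
# Assumed defaults from open-source RocketMQ 4.x; a deployment may override them.
DELAY_LEVELS = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h".split()
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
FIRST_RETRY_LEVEL = 3   # the first retry uses the third level (10 s)
MAX_RETRIES = 16        # after this many retries the message is dead-lettered


def to_seconds(level: str) -> int:
    """Convert a delay level such as '10s' or '2h' to seconds."""
    return int(level[:-1]) * UNIT_SECONDS[level[-1]]


def next_step(consumer_group: str, reconsume_times: int):
    """Decide what happens to a message whose consumption just failed.

    reconsume_times is the number of retries already attempted.
    Returns ("retry", delay_seconds) or ("dead_letter", dlq_topic).
    """
    if reconsume_times >= MAX_RETRIES:
        return ("dead_letter", "%DLQ%" + consumer_group)
    level = FIRST_RETRY_LEVEL + reconsume_times        # 1-based delay level
    return ("retry", to_seconds(DELAY_LEVELS[level - 1]))


def total_retry_window_seconds() -> int:
    """Sum of all retry delays before a message is dead-lettered."""
    return sum(next_step("any", n)[1] for n in range(MAX_RETRIES))


if __name__ == "__main__":
    for attempt in (0, 1, 15, 16):
        print(attempt, next_step("GID_billing", attempt))
    print(total_retry_window_seconds())
```

Under these assumed defaults, a message that keeps failing is retried with growing delays for roughly 4 hours 46 minutes in total before it lands in the dead-letter topic.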

Operational Improvements

Supported TPS increased from tens of thousands to one hundred thousand, business capacity grew from hundreds of millions to billions of messages, machine usage fell by >50%, and maintenance was simplified through centralized monitoring, automated failover and load‑balancing.

Future Outlook

Extend the message gateway with advanced governance (e.g., traffic shaping, quota enforcement).

Expose the messaging engine as a gRPC‑based service to decouple business code from the underlying middleware.

Evaluate RocketMQ 5.0’s compute‑storage separation architecture for the next‑generation upgrade.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: migration, RabbitMQ, RocketMQ, Messaging
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.