Unlocking High-Performance MQ: Lessons from Alibaba, Tencent, and Ctrip

To support Meituan’s rapid growth, this article examines the design and evolution of several industry-leading message-queue solutions (Alibaba’s Notify and RocketMQ, Tencent’s Tube and Hippo, and Ctrip’s Hermes), highlights their reliability, scalability, and decoupling features, and extracts key insights for building robust MQ systems.

ITFLY8 Architecture Home

Background

Meituan is currently designing and continuously iterating its message middleware solution. To avoid missteps, the team wants to learn from industry leaders, absorb best practices, and accelerate Meituan MQ evolution to support rapid business expansion.

Goals: reliability (no message loss), asynchronous communication, and decoupling (producers and consumers need not be online at the same time or know about each other). Each storage level trades one failure mode for another: in‑memory storage loses data on power failure, persistent disk storage is exposed to disk failure, and redundant backups introduce consistency issues.

Industry MQ Design Schemes

1. Alibaba Notify Architecture

Features

Notify nodes do not communicate with each other.

Supports horizontal scaling.

Clients obtain Notify address lists from a Config Server.

Clients automatically detect addition or removal of Notify nodes.

Publishers, consumers, and Notify servers all support clustering.

Messages are stored in different locations (e.g., File, Oracle, MySQL) based on security level, then kept in memory for performance.

Push model.
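
The address discovery described above can be sketched as follows. This is only an illustrative in-memory model, not Notify's actual API: the ConfigServer and NotifyClient classes, their method names, and the round-robin node selection are all assumptions made for the example.

```python
from typing import Callable, List, Set


class ConfigServer:
    """Hypothetical in-memory stand-in for the Config Server."""

    def __init__(self) -> None:
        self._nodes: Set[str] = set()
        self._listeners: List[Callable[[Set[str]], None]] = []

    def subscribe(self, listener: Callable[[Set[str]], None]) -> None:
        self._listeners.append(listener)
        listener(set(self._nodes))  # deliver the current list right away

    def register(self, address: str) -> None:
        self._nodes.add(address)
        self._publish()

    def unregister(self, address: str) -> None:
        self._nodes.discard(address)
        self._publish()

    def _publish(self) -> None:
        # Every subscriber receives a copy of the full address list.
        for listener in self._listeners:
            listener(set(self._nodes))


class NotifyClient:
    """Hypothetical client that tracks node additions and removals."""

    def __init__(self, config_server: ConfigServer) -> None:
        self.nodes: List[str] = []
        self._next = 0
        config_server.subscribe(self._on_change)

    def _on_change(self, nodes: Set[str]) -> None:
        self.nodes = sorted(nodes)

    def pick_node(self) -> str:
        # Round-robin selection is an assumption, not documented behavior.
        if not self.nodes:
            raise RuntimeError("no Notify node available")
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node
```

Because the nodes never talk to each other, the client only needs the current address list to keep working when a node is added or removed.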

2. Alibaba RocketMQ Architecture

Deployment mode described here: two masters, two slaves.

Features

Name Server is stateless; nodes do not synchronize information.

Broker establishes long connections with all Name Server nodes and periodically registers topic information.

Producer connects to one Name Server node to fetch routing information.

Consumer connects to one Name Server node to obtain routing information.

Pull model.
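
A minimal sketch of the registration and lookup flow listed above, assuming a simplified in-memory model. The NameServer and Broker classes and the fetch_route helper are hypothetical names for illustration and do not mirror RocketMQ's real client API; periodic re-registration is reduced to a single register_all call.

```python
import random
from typing import Dict, List, Set


class NameServer:
    """Hypothetical Name Server: holds routes, never talks to its peers."""

    def __init__(self) -> None:
        self.routes: Dict[str, Set[str]] = {}

    def register(self, broker_addr: str, topics: List[str]) -> None:
        for topic in topics:
            self.routes.setdefault(topic, set()).add(broker_addr)

    def lookup(self, topic: str) -> List[str]:
        return sorted(self.routes.get(topic, set()))


class Broker:
    """Hypothetical broker that registers with every Name Server."""

    def __init__(self, addr: str, topics: List[str],
                 name_servers: List[NameServer]) -> None:
        self.addr = addr
        self.topics = topics
        self.name_servers = name_servers

    def register_all(self) -> None:
        # In a real deployment this would run on a timer; here it runs once.
        for ns in self.name_servers:
            ns.register(self.addr, self.topics)


def fetch_route(name_servers: List[NameServer], topic: str) -> List[str]:
    """A producer or consumer asks one Name Server, chosen at random."""
    return random.choice(name_servers).lookup(topic)
```

Since each broker registers with all Name Server nodes, any single node can answer a routing query, which is why the nodes do not need to synchronize with each other.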

3. Tencent Tube Architecture

Features

Tube cluster uses Zookeeper mainly to store consumer offsets and for Master HA election (legacy; a new design could eliminate ZK dependency).

Broker reports its ID, status, and provided topics/partitions to Master.

Producers and consumers report topic information to Master, which returns broker lists for load balancing.

Broker nodes synchronize state with Master via heartbeats; Master notifies nodes of changes.

Master adopts a primary‑backup mode with Zookeeper for election.

Pull model.
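
The heartbeat-driven broker list described above could look roughly like the sketch below. It is an assumption-laden illustration rather than Tube's implementation: the Master class, its method names, the timeout value, and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, List, Set, Tuple


class Master:
    """Hypothetical Master that tracks brokers through heartbeats."""

    def __init__(self, timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        # broker_id -> (topics served, time of last heartbeat)
        self._brokers: Dict[str, Tuple[Set[str], float]] = {}

    def heartbeat(self, broker_id: str, topics: Iterable[str]) -> None:
        """Called by a broker to report its ID and the topics it serves."""
        self._brokers[broker_id] = (set(topics), self._clock())

    def brokers_for(self, topic: str) -> List[str]:
        """Called by producers/consumers; returns only brokers seen recently."""
        now = self._clock()
        return sorted(
            broker_id
            for broker_id, (topics, last_seen) in self._brokers.items()
            if topic in topics and now - last_seen <= self._timeout_s
        )
```

A broker that stops sending heartbeats simply drops out of the list returned to clients once the timeout passes.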

4. Tencent Hippo Architecture

Features

Three controllers (one primary, two backups) collect system node data; failover occurs automatically on primary failure.

Three brokers (one primary, two backups) form a group. The primary broker sends heartbeats to the controller; if the primary fails, a new primary is elected and the group is reshuffled.

Producer obtains topic‑related broker group IP/port and queue info via heartbeat with controller.

Consumer obtains broker group list and peer consumer info via heartbeat with controller.

Timed lock: if acknowledgment does not arrive within a timeout after pulling data, the lock is released automatically.

Console allows administrators to assign topics and queues to specific broker groups based on collected broker status.

Pull model.
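
The timed lock in the list above can be illustrated with a small sketch. The TimedLockQueue class, its method names, and the injectable clock are hypothetical and chosen for the example; Hippo's real data structures are not described in the source.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TimedLockQueue:
    """Hypothetical queue where a pulled message stays locked until acked
    or until the lock times out, whichever comes first."""

    def __init__(self, messages: List[str], lock_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._messages: Dict[int, str] = dict(enumerate(messages))
        self._locked_until: Dict[int, float] = {}
        self._lock_timeout_s = lock_timeout_s
        self._clock = clock

    def pull(self) -> Optional[Tuple[int, str]]:
        """Return the first message that is unlocked or whose lock expired."""
        now = self._clock()
        for msg_id, payload in self._messages.items():
            deadline = self._locked_until.get(msg_id)
            if deadline is None or deadline <= now:
                self._locked_until[msg_id] = now + self._lock_timeout_s
                return msg_id, payload
        return None

    def ack(self, msg_id: int) -> bool:
        """Remove the message if the caller still holds a valid lock."""
        deadline = self._locked_until.get(msg_id)
        if msg_id not in self._messages or deadline is None:
            return False
        if deadline <= self._clock():
            return False  # lock already released; message may be redelivered
        del self._messages[msg_id]
        del self._locked_until[msg_id]
        return True
```

With this scheme a consumer that crashes after pulling does not block the message forever: once the timeout passes, another pull can pick it up again.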

5. Ctrip Hermes Architecture

Key points:

The meta server manages broker join/leave, consumer join/leave, and partition load balancing.

Meta server discovers brokers via Zookeeper, builds routing tables, and assigns them to brokers.

Brokers periodically renew leases with meta server (coordinated through Zookeeper).

Consumers obtain leases from meta server instead of directly contacting Zookeeper, reducing coordination overhead.

Meta server can intervene directly (e.g., when a machine fails).

Long‑polling pull model; the earlier push model required complex broker code and limited advanced features.
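
The lease mechanism mentioned in the points above can be sketched as follows. This is a simplified illustration under stated assumptions: the MetaServer class, the acquire and renew method names, the lease duration, and the injectable clock are all hypothetical, and the Zookeeper coordination behind the meta server is left out.

```python
import time
from typing import Callable, Dict, Tuple


class MetaServer:
    """Hypothetical meta server that grants time-limited partition leases."""

    def __init__(self, lease_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lease_s = lease_s
        self._clock = clock
        # partition -> (holder, expiry time)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire(self, partition: str, holder: str) -> bool:
        """Grant the lease unless another holder still has a valid one."""
        now = self._clock()
        current = self._leases.get(partition)
        if current and current[0] != holder and current[1] > now:
            return False
        self._leases[partition] = (holder, now + self._lease_s)
        return True

    def renew(self, partition: str, holder: str) -> bool:
        """Extend a lease only if the caller holds it and it has not expired."""
        now = self._clock()
        current = self._leases.get(partition)
        if current is None or current[0] != holder or current[1] <= now:
            return False
        self._leases[partition] = (holder, now + self._lease_s)
        return True
```

If a holder stops renewing (for example because its machine failed), its lease expires and the next acquire call from another consumer succeeds, without the consumers ever contacting Zookeeper themselves.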

Source: compiled from Tu Yang’s wiki.

Reference: https://blog.csdn.net/lizhitao/article/details/51718156

Written by

ITFLY8 Architecture Home