
Ctrip's Hermes Asynchronous Messaging System: Architecture, Evolution, and High‑Performance Practices

The article presents a detailed overview of Ctrip's Hermes asynchronous messaging system, describing its architectural evolution from a simple Mongo‑based queue to a broker‑centric design with MySQL and Kafka back‑ends, and explains optimization techniques for single‑node performance, clustering, lease‑based management, and reliable delivery.

Architecture Digest

In this article, Gu Qing shares the design and implementation experience of Ctrip's new asynchronous messaging system, Hermes, focusing on architectural decisions and practical optimizations.

Advantages of Message Queues – Message queues decouple services, enable asynchronous processing, absorb traffic spikes, support fan‑out scenarios, and improve system resilience.

Basic MQ Model – The classic Queue model delivers a message to a single consumer, while the Topic model allows multiple consumers, and consumer groups enable load‑balanced consumption within a group.
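The difference between topic fan-out and group-level load balancing can be sketched in a few lines. This is a toy in-memory model, not Hermes code: the `Topic` class, the round-robin policy, and the group and member names are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy topic: every group sees every message; one member per group handles it."""

    def __init__(self):
        self._groups = {}                    # group name -> round-robin iterator of members
        self.delivered = defaultdict(list)   # (group, member) -> messages received

    def subscribe(self, group, members):
        # Round-robin is an assumed balancing policy, chosen for simplicity.
        self._groups[group] = cycle(members)

    def publish(self, message):
        # Fan-out across groups, load balancing within each group.
        for group, members in self._groups.items():
            member = next(members)
            self.delivered[(group, member)].append(message)


topic = Topic()
topic.subscribe("billing", ["billing-1", "billing-2"])
topic.subscribe("search", ["search-1"])
for i in range(4):
    topic.publish(f"order-{i}")
```

Here the `search` group's single consumer receives all four messages, while the two `billing` consumers split them between themselves.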

Evolution of Ctrip's MQ Architecture

Version 1.0 (three to four years ago) stored messages directly in MongoDB without a broker. It supported only simple queue and topic semantics and lacked consumer groups, which led to high client upgrade costs and heavy DB coordination.

Version 2 introduced a broker (master‑slave) that coordinated consumers via MongoDB heartbeats, allowing better scaling and reducing client‑side changes.

The current architecture (Version 4) adds a meta‑server for cluster coordination, uses both MySQL and Kafka as storage back‑ends, and separates producers, brokers, and consumers more cleanly.

Two Types of Messages

Kafka‑backed messages provide high throughput but lack features such as redelivery, priority, and filtering; therefore they are used for large‑volume, less‑critical data. Critical business data is stored in MySQL, which offers richer queue features and easier troubleshooting.

Building an Efficient MQ

Optimization starts with single‑node performance: fast writes, high throughput ("wide channels"), and low latency. The techniques are simple table schemas (only a primary‑key index), batch inserts, and in‑memory caching. The design then scales to a cluster by adding brokers and partitioning topics, and it uses long polling so that consumers do not have to poll repeatedly for new messages.
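A minimal sketch of the write path described above, assuming a message table whose only index is the auto-increment primary key. SQLite stands in for MySQL so the example runs on its own; the table name `message` and the helpers `append_batch` and `read_after` are hypothetical and not taken from Hermes.

```python
import sqlite3


def append_batch(conn, payloads):
    """Insert a batch of messages in a single transaction."""
    with conn:  # commits once for the whole batch, rolls back on error
        conn.executemany(
            "INSERT INTO message (payload) VALUES (?)",
            [(p,) for p in payloads],
        )


def read_after(conn, last_id, limit):
    """Fetch messages after a consumer's last seen id, using only the primary key."""
    cursor = conn.execute(
        "SELECT id, payload FROM message WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
# The primary key is the only index, so each insert is a cheap append.
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
)
append_batch(conn, ["a", "b", "c"])
rows = read_after(conn, last_id=1, limit=10)
```

Because the id grows monotonically, a consumer's position is just the last id it has read, and reads are range scans on the primary key.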

From Single Node to Cluster

Cluster scaling involves load‑balancing brokers, partitioning topics to preserve order within a partition, and assigning partitions to specific brokers for efficient processing.
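One common way to get per-partition ordering is to hash a message key to a partition and give each partition exactly one owning broker. The sketch below assumes that approach; the CRC32 hash, the modulo assignment, and the broker names are illustrative choices, not details confirmed by the article.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition; the same key always lands in the same partition."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % partition_count


def assign_partitions(partition_count: int, brokers: list) -> dict:
    """Spread partitions over brokers so each partition has a single owner."""
    return {p: brokers[p % len(brokers)] for p in range(partition_count)}


assignment = assign_partitions(4, ["broker-a", "broker-b"])
partition = partition_for("order-42", 4)
owner = assignment[partition]  # every message keyed "order-42" goes to this broker
```

Messages that share a key are written by one broker to one partition, so their relative order is preserved, while different keys spread the load across brokers.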

Lease‑Based Cluster Management

A meta‑server issues time‑limited leases to brokers and consumers; each lease defines which partition its holder is responsible for. Holders renew their leases periodically. When a lease expires, the meta‑server reassigns the partition to another broker or consumer, which keeps coordination simple and avoids heavy reliance on ZooKeeper.
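The lease idea can be sketched as a small table of (holder, expiry) pairs per partition. This is a simplified single-process model under stated assumptions: the `MetaServer` class, its method names, and the explicit `now` argument (used instead of a real clock so the behavior is deterministic) are hypothetical.

```python
class MetaServer:
    """Toy lease table: at most one live holder per partition."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._leases = {}  # partition -> (holder, expires_at)

    def acquire(self, partition, holder, now):
        """Grant or renew a lease; refuse if another holder's lease is still live."""
        current = self._leases.get(partition)
        if current is not None:
            current_holder, expires_at = current
            if current_holder != holder and now < expires_at:
                return False
        self._leases[partition] = (holder, now + self.lease_seconds)
        return True

    def holder(self, partition, now):
        """Return the current holder, or None if the lease is missing or expired."""
        lease = self._leases.get(partition)
        if lease is None or now >= lease[1]:
            return None
        return lease[0]


meta = MetaServer(lease_seconds=10)
granted_a = meta.acquire(0, "broker-a", now=0)     # broker-a takes partition 0
rejected_b = meta.acquire(0, "broker-b", now=5)    # lease still held by broker-a
renewed_a = meta.acquire(0, "broker-a", now=8)     # renewal extends expiry to 18
takeover_b = meta.acquire(0, "broker-b", now=18)   # broker-a stopped renewing
```

If a holder crashes, it simply stops renewing, and the partition becomes available once the lease runs out; no explicit failure detection is needed in this model.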

Summary

The Hermes system demonstrates that high‑performance messaging requires fast write paths, low‑latency delivery, partition‑aware routing, and a simple lease‑based management layer to handle broker and consumer dynamics, while balancing feature richness (MySQL) against raw throughput (Kafka).
