Design and Implementation of Delayed Message Queues in Distributed Systems
This article surveys common delayed-message implementations, including external storage, RocksDB, Redis, and open-source MQs such as RocketMQ, Pulsar, and QMQ. It analyzes their architectures, advantages, drawbacks, and the practical considerations involved in building reliable distributed asynchronous messaging systems.
Delayed messages (or scheduled messages) are used in distributed asynchronous messaging scenarios where a producer wants a message to be consumed at a specific future time rather than immediately.
The article explores several implementation schemes and discusses their pros and cons.
1. External Storage Based Schemes
These schemes move delay handling out of the MQ and into a dedicated delay module. The module stores messages in an external system (e.g., a database) until they are due, then delivers them to the MQ. A typical MySQL table definition is:
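CREATE TABLE `delay_msg` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `delivery_time` DATETIME NOT NULL COMMENT 'delivery time',
  `payloads` blob COMMENT 'message payload',
  PRIMARY KEY (`id`),
  KEY `time_index` (`delivery_time`)
);
Advantages: simple to implement. Drawbacks: B+Tree indexes are not well suited to write-heavy messaging workloads, because every message is inserted once and deleted once after delivery, and each of those writes must update both the primary-key index and the `time_index` secondary index.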
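The delivery side of this scheme is a poller that repeatedly selects due rows through `time_index`, hands them to the MQ, and deletes them. The sketch below is a minimal illustration under stated assumptions: Python's built-in `sqlite3` stands in for MySQL, `delivery_time` is stored as an integer Unix timestamp instead of `DATETIME`, a single poller is running, and `send` is a hypothetical callback standing in for the MQ producer.

```python
import sqlite3

# sqlite3 (in-memory) stands in for MySQL. delivery_time is an integer Unix
# timestamp here (an assumption), not DATETIME as in the MySQL table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delay_msg ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " delivery_time INTEGER NOT NULL,"
    " payloads BLOB)"
)
conn.execute("CREATE INDEX time_index ON delay_msg (delivery_time)")


def schedule(payload: bytes, delivery_time: int) -> None:
    """Store a message until its delivery time."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO delay_msg (delivery_time, payloads) VALUES (?, ?)",
            (delivery_time, payload),
        )


def poll_due(now: int, send, batch_size: int = 100) -> int:
    """Deliver up to batch_size due messages via send(), then delete them.

    `send` is a hypothetical callback standing in for the MQ producer.
    Assumes a single poller; several pollers would need row locking.
    Returns the number of messages delivered.
    """
    rows = conn.execute(
        "SELECT id, payloads FROM delay_msg WHERE delivery_time <= ? "
        "ORDER BY delivery_time, id LIMIT ?",
        (now, batch_size),
    ).fetchall()
    for msg_id, payload in rows:
        send(payload)  # a crash between send and delete causes a redelivery
        with conn:
            conn.execute("DELETE FROM delay_msg WHERE id = ?", (msg_id,))
    return len(rows)
```

Deleting only after `send` returns gives at-least-once delivery, so consumers should tolerate duplicates.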
2. RocksDB Based Scheme
RocksDB's LSM-tree turns writes into sequential appends, so it sustains massive write volumes better than a B+Tree. Projects such as DDMQ's Chronos store delayed messages in RocksDB and use a timer thread to scan them for delivery. The main drawback is that RocksDB is an embedded, single-node store, so data replication and fault tolerance must be built on top of it, which adds complexity.
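A timer thread can only scan efficiently if the key order matches delivery order. One way to get that, assumed here rather than taken from Chronos, is a key made of an 8-byte big-endian delivery timestamp followed by an 8-byte sequence number: RocksDB's default bytewise comparator then sorts keys by time, and "everything due" becomes a range scan from the start of the keyspace. In the sketch, a sorted Python list (the hypothetical `OrderedKV` class) stands in for RocksDB; with the real store you would use an iterator bounded by an upper key instead.

```python
import bisect
import struct


def make_key(delivery_ms: int, seq: int) -> bytes:
    # Big-endian fixed width: bytewise order == numeric (time) order.
    return struct.pack(">QQ", delivery_ms, seq)


class OrderedKV:
    """Hypothetical stand-in for an ordered key-value store like RocksDB."""

    def __init__(self):
        self._keys = []   # kept sorted, like an LSM-tree's key order
        self._vals = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def _due_end(self, now_ms: int) -> int:
        # Smallest possible key with a timestamp after now_ms is the upper bound.
        return bisect.bisect_left(self._keys, make_key(now_ms + 1, 0))

    def scan_due(self, now_ms: int):
        """Yield (key, value) for every message with delivery time <= now_ms."""
        for key in self._keys[: self._due_end(now_ms)]:
            yield key, self._vals[key]

    def delete_due(self, now_ms: int) -> None:
        """Remove everything already scanned, analogous to a range delete."""
        end = self._due_end(now_ms)
        for key in self._keys[:end]:
            del self._vals[key]
        del self._keys[:end]
```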
3. Redis Based Scheme
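Redis sorted sets (ZSETs) can back a delayed queue. A typical design has three parts: a Messages Pool (a Hash mapping message IDs to message bodies), several Delayed Queues (ZSETs whose members are message IDs and whose scores are delivery timestamps), and worker threads that scan the ZSETs for expired messages. Advantages: O(log n) insertion at in-memory speed. Drawbacks: when workers on several nodes scan the same queue, a message can be processed more than once, so a distributed lock is needed.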
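The sketch below models that layout in memory so the logic runs without a Redis server; the class `RedisStyleDelayQueue` is hypothetical, and the comments name the Redis command each step would correspond to. As an alternative to a distributed lock, it uses the fact that `ZREM` reports how many members it removed: only one worker can remove a given member, so that worker owns the message. Sharding IDs across ZSETs with CRC32 is an illustrative assumption.

```python
import zlib


class RedisStyleDelayQueue:
    """In-memory stand-in: dict = Hash pool, dict of member->score = ZSET."""

    def __init__(self, num_queues: int = 4):
        self.pool = {}                                     # HSET pool <id> <body>
        self.queues = [dict() for _ in range(num_queues)]  # ZSET member -> score

    def _queue_for(self, msg_id: str) -> dict:
        # Shard messages across ZSETs so several workers can scan in parallel.
        return self.queues[zlib.crc32(msg_id.encode()) % len(self.queues)]

    def add(self, msg_id: str, body: str, deliver_at: float) -> None:
        self.pool[msg_id] = body                      # HSET pool msg_id body
        self._queue_for(msg_id)[msg_id] = deliver_at  # ZADD queue deliver_at msg_id

    @staticmethod
    def _claim(zset: dict, msg_id: str) -> bool:
        # ZREM queue msg_id: only the worker that actually removes the member
        # sees a count of 1, so exactly one worker processes the message.
        return zset.pop(msg_id, None) is not None

    def poll(self, queue_index: int, now: float, limit: int = 10) -> list:
        zset = self.queues[queue_index]
        # ZRANGEBYSCORE queue -inf now LIMIT 0 limit (linear scan in this stand-in)
        due = sorted((s, m) for m, s in zset.items() if s <= now)[:limit]
        delivered = []
        for _, msg_id in due:
            if self._claim(zset, msg_id):
                delivered.append(self.pool.pop(msg_id))  # HGET + HDEL
        return delivered
```

Note that removing the member before delivery is at-most-once: if the worker crashes right after the claim, the message stays in the pool but is no longer scheduled, so a real system needs extra bookkeeping for that case.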
4. Open‑Source MQ Implementations
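RocketMQ supports 18 fixed delay levels (e.g., 1 s, 5 s, …, 2 h). A delayed message is first stored in a special internal topic, with one queue per level, and is re-dispatched to its real topic once its level's delay expires. Advantages: low overhead, and messages within a level are delivered in order, because they all share the same delay. Drawbacks: the set of levels is fixed in the broker configuration rather than chosen freely per message, and the commit log grows because each delayed message is written twice, once to the internal topic and again to the real topic.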
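The reason per-level queues are cheap is that a queue whose messages all have the same delay is automatically sorted by due time, so the timer only has to look at each queue's head. The sketch below shows that idea; `LevelScheduler` is a hypothetical name, and `LEVELS` is an illustrative subset in seconds, not RocketMQ's real configuration. Levels are 1-based, matching RocketMQ's delay-level numbering.

```python
from collections import deque

LEVELS = [1, 5, 10, 30]  # illustrative subset of delay levels, in seconds


class LevelScheduler:
    def __init__(self, levels=LEVELS):
        self.levels = levels
        self.queues = [deque() for _ in levels]  # one FIFO queue per level

    def put(self, msg, level: int, now: float) -> None:
        # level is 1-based; the same delay per queue keeps each queue time-ordered.
        due = now + self.levels[level - 1]
        self.queues[level - 1].append((due, msg))

    def tick(self, now: float) -> list:
        """Return messages due at `now`, to be re-dispatched to their real topic."""
        ready = []
        for q in self.queues:
            while q and q[0][0] <= now:  # only the head ever needs checking
                ready.append(q.popleft()[1])
        return ready
```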
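Pulsar supports arbitrary delay times: delayed messages are written straight to the target topic, and each subscription tracks them in a priority queue held in off-heap memory. The flexibility has costs. Memory use grows with every subscription, since each one keeps its own queue; recovery after a failure takes longer, because the queue must be rebuilt from the backlog; and long-delay messages increase storage consumption, because the storage holding them cannot be reclaimed until they are delivered.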
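Each subscription's index only has to answer "which stored positions are due now?", which a min-heap keyed on delivery time does directly. The sketch below assumes entries of the form (deliver_at, ledger_id, entry_id); the class name `SubscriptionDelayIndex` is hypothetical, and a Python heap stands in for Pulsar's off-heap queue.

```python
import heapq


class SubscriptionDelayIndex:
    """Per-subscription min-heap of (deliver_at, ledger_id, entry_id)."""

    def __init__(self):
        self._heap = []

    def add(self, ledger_id: int, entry_id: int, deliver_at: int) -> None:
        # Only the position is indexed; the message itself stays in the topic.
        heapq.heappush(self._heap, (deliver_at, ledger_id, entry_id))

    def pop_ready(self, now: int) -> list:
        """Return positions whose delivery time has passed, in time order."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, ledger_id, entry_id = heapq.heappop(self._heap)
            ready.append((ledger_id, entry_id))
        return ready

    def __len__(self) -> int:
        return len(self._heap)


# Every subscription holds its own index, so memory scales with
# pending delayed messages x subscriptions, and nothing here is persisted:
# after a restart the index has to be rebuilt by re-reading the backlog.
subscriptions = {"sub-a": SubscriptionDelayIndex(), "sub-b": SubscriptionDelayIndex()}
```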
QMQ supports arbitrary delay times using a two-level hierarchical time wheel: a disk-based wheel that stores messages in hour-granularity log files, and an in-memory wheel with 500 ms slots that holds the messages of the hour about to be delivered. This design gives O(1) scheduling, supports delays of several years, and stores delayed messages apart from normal traffic so the two do not interfere.
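A compact way to see how the two levels cooperate is the sketch below, which is written in the spirit of that design rather than from QMQ's code. Assumptions: times are integer milliseconds; a dict of hourly buckets stands in for the on-disk hour logs; a driver (not shown) calls `load_hour(h)` shortly before hour `h` begins; and the in-memory wheel has 7200 slots of 500 ms, i.e. one hour. The class name `TwoLevelWheel` is hypothetical.

```python
from collections import defaultdict

HOUR_MS = 3_600_000
SLOT_MS = 500
WHEEL_SIZE = HOUR_MS // SLOT_MS  # 7200 slots cover one hour


class TwoLevelWheel:
    def __init__(self, start_ms: int):
        self.hour_logs = defaultdict(list)    # "disk" level: hour -> [(deliver_at, msg)]
        self.wheel = [[] for _ in range(WHEEL_SIZE)]  # memory level: [(tick, msg)]
        self.loaded_hours = set()
        self.tick = start_ms // SLOT_MS       # next tick to process

    def schedule(self, msg, deliver_at: int) -> None:
        hour = deliver_at // HOUR_MS
        if hour in self.loaded_hours:
            self._to_wheel(deliver_at, msg)   # hour already in memory
        else:
            self.hour_logs[hour].append((deliver_at, msg))  # O(1) append to hour log

    def _to_wheel(self, deliver_at: int, msg) -> None:
        # Overdue messages are clamped to the current tick so they fire next.
        t = max(deliver_at // SLOT_MS, self.tick)
        self.wheel[t % WHEEL_SIZE].append((t, msg))

    def load_hour(self, hour: int) -> None:
        """Move one hour log into the in-memory wheel before that hour starts."""
        for deliver_at, msg in self.hour_logs.pop(hour, []):
            self._to_wheel(deliver_at, msg)
        self.loaded_hours.add(hour)

    def advance(self, now: int) -> list:
        """Fire every slot up to `now`; return the messages that are due."""
        fired = []
        while self.tick <= now // SLOT_MS:
            slot = self.wheel[self.tick % WHEEL_SIZE]
            fired += [m for t, m in slot if t == self.tick]
            # Entries for a later lap of the wheel stay in the slot.
            slot[:] = [(t, m) for t, m in slot if t != self.tick]
            self.tick += 1
        return fired
```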
5. Issues with Timer‑Thread Scanning
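Fixed-interval scanning has two problems. When few messages are pending, most scans find nothing and waste CPU and I/O; when many are pending, a scan may not finish within its interval, so messages are delivered late. An improvement is a wait-notify mechanism similar to JDK Timer: the delivery thread sleeps until the earliest pending message is due, and is woken early only when a new message with an earlier delivery time arrives.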
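The wait-notify loop can be sketched with a condition variable and a min-heap: the thread waits with a timeout equal to the time left until the earliest deadline, blocks indefinitely when nothing is pending, and `schedule` notifies only when the new message becomes the earliest one. The class `WaitNotifyScheduler` and the `deliver` callback are hypothetical names; the structure follows the JDK Timer idea described above but is not its code.

```python
import heapq
import itertools
import threading
import time


class WaitNotifyScheduler:
    def __init__(self, deliver):
        self._deliver = deliver          # hypothetical callback, e.g. publish to the MQ
        self._heap = []                  # (deadline, seq, msg)
        self._seq = itertools.count()    # tie-breaker for equal deadlines
        self._cond = threading.Condition()
        self._stopped = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def schedule(self, msg, delay_s: float) -> None:
        seq = next(self._seq)
        with self._cond:
            heapq.heappush(self._heap, (time.monotonic() + delay_s, seq, msg))
            if self._heap[0][1] == seq:  # new earliest deadline: wake the thread
                self._cond.notify()

    def _run(self) -> None:
        while True:
            with self._cond:
                while not self._stopped:
                    if not self._heap:
                        self._cond.wait()           # nothing pending: sleep until notified
                        continue
                    remaining = self._heap[0][0] - time.monotonic()
                    if remaining <= 0:
                        break                       # earliest message is due
                    self._cond.wait(remaining)      # sleep until due or notified
                if self._stopped:
                    return
                _, _, msg = heapq.heappop(self._heap)
            self._deliver(msg)  # outside the lock so schedule() never waits on delivery

    def stop(self) -> None:
        with self._cond:
            self._stopped = True
            self._cond.notify()
        self._thread.join()
```

Because the thread only wakes for a due message or an earlier arrival, idle periods cost nothing, and delivery time no longer depends on a scan interval.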
Conclusion
The article summarizes the common delayed‑message solutions in the industry, comparing their strengths and weaknesses, and provides insights for selecting an appropriate design based on workload, latency requirements, and operational constraints.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.