
Design and Implementation of a Redis‑Based Delayed Queue Service

This article explains the business scenarios that require delayed processing, compares several delay‑queue solutions such as Java's DelayQueue, RocketMQ and RabbitMQ, and then details a custom Redis‑backed delayed‑queue architecture (1.0 and 2.0 versions) with Zookeeper coordination, pull‑job and worker threads for high‑availability and real‑time message delivery.

Top Architect

Background – In many business processes, actions need to be performed after a certain delay, such as cancelling unpaid orders after 30 minutes, auto‑generating default comments after 48 hours, or timing out delivery orders. Simple polling of the database works for small data volumes but becomes resource‑intensive at scale, prompting the use of delayed queues.

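Types of Delay Queues – The article lists three common implementations: java.util.concurrent.DelayQueue (in‑process, with no persistence, so pending messages are lost on restart), the RocketMQ delayed queue (persistent and distributed, but limited to a fixed set of predefined delay levels), and the RabbitMQ delayed queue (built from message TTL plus a dead‑letter exchange; persistent, but messages with different delays need separate queues, because a queue only expires messages from its head).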
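
To make the in‑process option concrete, here is a minimal sketch of java.util.concurrent.DelayQueue. The DelayedOrder class, the order IDs, and the delays are hypothetical names chosen for illustration; only DelayQueue and Delayed come from the JDK.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class DelayQueueDemo {
    // Enqueues two orders and returns their IDs in the order take() releases them.
    static String drainInExpiryOrder() {
        DelayQueue<DelayedOrder> queue = new DelayQueue<>();
        queue.put(new DelayedOrder("order-slow", 200, TimeUnit.MILLISECONDS));
        queue.put(new DelayedOrder("order-fast", 50, TimeUnit.MILLISECONDS));
        StringBuilder released = new StringBuilder();
        try {
            // take() blocks until the head element's delay has expired.
            released.append(queue.take().orderId).append(',');
            released.append(queue.take().orderId);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return released.toString();
    }

    public static void main(String[] args) {
        System.out.println(drainInExpiryOrder());
    }
}

// Hypothetical task type: an order ID that becomes available after a delay.
class DelayedOrder implements Delayed {
    final String orderId;
    private final long fireAtNanos; // absolute deadline on the System.nanoTime() clock

    DelayedOrder(String orderId, long delay, TimeUnit unit) {
        this.orderId = orderId;
        this.fireAtNanos = System.nanoTime() + unit.toNanos(delay);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(fireAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}
```

Everything here lives on the heap of one JVM, which is why this option offers no persistence and no distribution.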

Key Design Considerations – When building a custom delayed‑queue service, one must address message storage, real‑time retrieval of expired messages, and high availability.

Redis‑Based Implementation (Version 1.0)

Version 1.0 targets four properties: message reliability (at‑least‑once delivery), tolerable delivery latency (bounded by the polling interval of the timed task), support for explicitly removing a message before it fires, and high availability. The overall structure consists of:

Messages Pool – a Redis HASH storing each delayed message (key: message ID, value: message data).

Delayed Queue – sixteen Redis sorted sets (ZSETs), each holding message IDs scored by their expiration timestamps.

Timed Task – a scheduled job that scans queues for expired messages.
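
The three parts above can be modelled in plain Java to show how they interact. This is only a sketch: a HashMap stands in for the Redis HASH and sixteen TreeSets stand in for the ZSETs, with the corresponding Redis commands named in comments. Picking a queue by hashing the message ID is an assumption (the article does not say how messages are spread over the sixteen queues), and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class DelayedQueueStoreSketch {
    static final int QUEUE_COUNT = 16;

    // One sorted-set entry: member (message ID) plus score (expiry time in ms).
    private static final class Entry {
        final String messageId;
        final long expireAtMillis;
        Entry(String messageId, long expireAtMillis) {
            this.messageId = messageId;
            this.expireAtMillis = expireAtMillis;
        }
    }

    // Stand-in for the Redis HASH (messages pool): message ID -> message data.
    private final Map<String, String> messagesPool = new HashMap<>();
    // Stand-in for the sixteen ZSETs, each ordered by score, then by ID.
    private final List<TreeSet<Entry>> queues = new ArrayList<>();

    DelayedQueueStoreSketch() {
        Comparator<Entry> byScoreThenId = Comparator
                .comparingLong((Entry e) -> e.expireAtMillis)
                .thenComparing(e -> e.messageId);
        for (int i = 0; i < QUEUE_COUNT; i++) {
            queues.add(new TreeSet<>(byScoreThenId));
        }
    }

    // Assumption: the queue is chosen by hashing the message ID.
    static int queueIndexFor(String messageId) {
        return Math.floorMod(messageId.hashCode(), QUEUE_COUNT);
    }

    void add(String messageId, String body, long expireAtMillis) {
        remove(messageId);                                   // keep one entry per ID, like ZADD
        messagesPool.put(messageId, body);                   // Redis: HSET
        queues.get(queueIndexFor(messageId))
              .add(new Entry(messageId, expireAtMillis));    // Redis: ZADD
    }

    // Explicit removal of a message that has not fired yet.
    boolean remove(String messageId) {
        if (messagesPool.remove(messageId) == null) {        // Redis: HDEL
            return false;
        }
        queues.get(queueIndexFor(messageId))
              .removeIf(e -> e.messageId.equals(messageId)); // Redis: ZREM
        return true;
    }

    // What the timed task does on each tick: collect every message whose score <= now.
    // For brevity this deletes before processing; a real at-least-once design would
    // delete only after the consumer has confirmed the message.
    List<String> scanAll(long nowMillis) {
        List<String> due = new ArrayList<>();
        for (TreeSet<Entry> queue : queues) {
            Iterator<Entry> it = queue.iterator();           // Redis: ZRANGEBYSCORE 0 now
            while (it.hasNext()) {
                Entry e = it.next();
                if (e.expireAtMillis > nowMillis) break;     // rest of the set is not due yet
                it.remove();
                String body = messagesPool.remove(e.messageId);
                if (body != null) due.add(body);
            }
        }
        return due;
    }
}
```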

The message schema includes tags, keys, body, delayTime (or expectDate), and other metadata.
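
A rough sketch of such a message follows, showing how a relative delayTime and an absolute expectDate could both resolve to the score stored in the sorted set. The field types and the use of milliseconds are assumptions, and the "other metadata" is left out because the article does not enumerate it.

```java
class DelayMessage {
    final String tags;
    final String keys;
    final String body;
    final Long delayTimeMillis;   // relative delay; null when expectDate is used (unit assumed)
    final Long expectDateMillis;  // absolute epoch time; null when delayTime is used (unit assumed)

    private DelayMessage(String tags, String keys, String body,
                         Long delayTimeMillis, Long expectDateMillis) {
        this.tags = tags;
        this.keys = keys;
        this.body = body;
        this.delayTimeMillis = delayTimeMillis;
        this.expectDateMillis = expectDateMillis;
    }

    // Fire after a relative delay, e.g. "30 minutes from now".
    static DelayMessage withDelay(String tags, String keys, String body, long delayTimeMillis) {
        return new DelayMessage(tags, keys, body, delayTimeMillis, null);
    }

    // Fire at an absolute point in time.
    static DelayMessage at(String tags, String keys, String body, long expectDateMillis) {
        return new DelayMessage(tags, keys, body, null, expectDateMillis);
    }

    // The value that would be used as the sorted-set score.
    long expireAt(long nowMillis) {
        return expectDateMillis != null ? expectDateMillis : nowMillis + delayTimeMillis;
    }
}
```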

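Version 2.0 Improvements – Replaces the 1‑minute polling task with an await/signal mechanism built on Java's Lock and Condition: a thread waits until the next message is due and is signalled when an earlier one arrives, so expired messages are dispatched in near real time.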
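
The await/signal idea can be sketched with ReentrantLock and Condition from java.util.concurrent.locks. The sketch assumes the producer and the waiting thread run in the same JVM, and it uses an in‑memory PriorityQueue as a stand‑in for one Redis sorted set; SignalledDelayQueue and its methods are hypothetical names, not the article's code.

```java
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class SignalledDelayQueue {
    private static final class Entry implements Comparable<Entry> {
        final String messageId;
        final long expireAtMillis;
        Entry(String messageId, long expireAtMillis) {
            this.messageId = messageId;
            this.expireAtMillis = expireAtMillis;
        }
        @Override
        public int compareTo(Entry other) {
            return Long.compare(expireAtMillis, other.expireAtMillis);
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition headChanged = lock.newCondition();
    private final PriorityQueue<Entry> queue = new PriorityQueue<>();

    void offer(String messageId, long expireAtMillis) {
        lock.lock();
        try {
            Entry entry = new Entry(messageId, expireAtMillis);
            queue.add(entry);
            // Wake the waiter only if the new message is now the earliest one.
            if (queue.peek() == entry) headChanged.signal();
        } finally {
            lock.unlock();
        }
    }

    // Blocks until the earliest message is due, then returns its ID.
    String takeExpired() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (true) {
                Entry head = queue.peek();
                if (head == null) {
                    headChanged.await();                       // nothing queued: wait for a signal
                } else {
                    long waitMillis = head.expireAtMillis - System.currentTimeMillis();
                    if (waitMillis <= 0) return queue.poll().messageId;
                    // Sleep until the head is due, or until an earlier message arrives.
                    headChanged.await(waitMillis, TimeUnit.MILLISECONDS);
                }
            }
        } finally {
            lock.unlock();
        }
    }

    // A waiter parked on a far-future message is woken by a sooner one.
    static String demo() {
        SignalledDelayQueue q = new SignalledDelayQueue();
        q.offer("late", System.currentTimeMillis() + 60_000);
        String[] result = new String[1];
        Thread puller = new Thread(() -> {
            try {
                result[0] = q.takeExpired();
            } catch (InterruptedException ignored) {
                // demo thread was asked to stop
            }
        });
        puller.start();
        q.offer("soon", System.currentTimeMillis() + 50);
        try {
            puller.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        puller.interrupt();
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Compared with a fixed polling interval, the wait time is derived from the earliest score, so an idle queue costs nothing and a due message is picked up as soon as it expires.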

Multi‑node deployment introduces:

Pull Job – a dedicated thread per queue that fetches expired messages.

Worker – processes fetched messages, improving real‑time handling.

Zookeeper Coordination – manages queue assignment and rebalancing when nodes join or leave.
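
One way the queue assignment could work is a deterministic function over the current list of live nodes, so that every node computes the same answer from the same membership view. The round‑robin strategy below is an assumption, since the article does not describe the actual rebalancing algorithm, and the node IDs are made up. Zookeeper itself is not involved in the sketch: the list of live nodes is simply passed in.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class QueueAssignmentSketch {
    static final int QUEUE_COUNT = 16;

    // Maps each live node to the queue indexes it should run pull jobs for.
    static Map<String, List<Integer>> assign(Collection<String> liveNodes) {
        // Sort (and de-duplicate) so every node derives the same ordering.
        List<String> sorted = new ArrayList<>(new TreeSet<>(liveNodes));
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String node : sorted) {
            assignment.put(node, new ArrayList<>());
        }
        if (sorted.isEmpty()) {
            return assignment;
        }
        for (int queue = 0; queue < QUEUE_COUNT; queue++) {
            String owner = sorted.get(queue % sorted.size()); // round-robin over nodes
            assignment.get(owner).add(queue);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        nodes.add("node-b");
        nodes.add("node-a");
        System.out.println(assign(nodes));
    }
}
```

When a node joins or leaves, each remaining node would recompute the map and start or stop pull jobs to match it. Round‑robin is simple but moves many queues on each change; a scheme that minimises movement would be a reasonable alternative.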

Main Process Flow – On service start, the instance registers with Zookeeper, obtains its assigned queues, and launches pull‑job threads. Each pull‑job checks its queue for expired messages; if found, they are handed to a worker for processing, otherwise the job waits based on the next message's score. Offsets are tracked to avoid message loss on failures, and pull‑jobs are recreated when the cluster topology changes.
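
The hand‑off from pull job to worker can be sketched as a single pass over one queue. To keep the example short, the article's offset tracking is swapped for a simpler rule that gives the same at‑least‑once effect: a message is removed only after the worker reports success, so a failed message is retried on the next pass. The in‑memory map stands in for a Redis sorted set, and PullJobSketch, runOnce, and the handler contract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

class PullJobSketch {
    // Stand-in for one delayed queue: message ID -> expiry time in ms.
    private final Map<String, Long> queue = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Predicate<String> handler; // returns true when processing succeeded

    PullJobSketch(Predicate<String> handler) {
        this.handler = handler;
    }

    void add(String messageId, long expireAtMillis) {
        queue.put(messageId, expireAtMillis);
    }

    // One pass of the pull job. Returns how long to wait before the next pass:
    // 0 if a due message is still pending (retry), -1 if the queue is empty.
    long runOnce(long nowMillis) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Map.Entry<String, Long> entry : queue.entrySet()) {
            if (entry.getValue() <= nowMillis) {
                String messageId = entry.getKey();
                tasks.add(() -> {
                    // Acknowledge (remove) only after the worker succeeded.
                    if (handler.test(messageId)) queue.remove(messageId);
                    return null;
                });
            }
        }
        try {
            // Blocks until all workers finish, so a message is never dispatched twice in one pass.
            // A handler that throws leaves its message in the queue for the next pass.
            workers.invokeAll(tasks);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long nextWait = -1;
        for (long expireAt : queue.values()) {
            long wait = Math.max(0, expireAt - nowMillis);
            if (nextWait < 0 || wait < nextWait) nextWait = wait;
        }
        return nextWait;
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

Because a message may be handed to a worker more than once, handlers in a design like this need to be idempotent.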

The article concludes with an invitation for readers to discuss, share opinions, and join a community of senior architects.

Tags: backend, distributed systems, Java, Redis, Zookeeper, Message Queue, Delayed Queue