How to Build a Scalable Delayed Queue with Redis and Java

This article explains why delayed queues are needed for scenarios like unpaid orders or auto‑generated comments, compares built‑in and third‑party solutions, and provides a detailed design and implementation guide for a Redis‑based delayed‑queue service, including version‑1.0 features, version‑2.0 optimizations, and multi‑node deployment.

IT Architects Alliance

Background

Business processes often require actions after a specific delay, such as cancelling an unpaid order after 30 minutes, posting a default comment if a user does not comment within 48 hours, or aborting a delivery order that has not been dispatched in time.

Existing delayed‑queue mechanisms

java.util.concurrent.DelayQueue – simple JDK implementation; messages exist only in JVM memory, so persistence and distributed operation are not supported.
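To make the in-memory limitation concrete, here is a minimal sketch of the JDK approach. The `OrderTimeout` and `DelayQueueDemo` names are hypothetical and exist only for illustration; everything queued here disappears when the JVM exits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical task type for illustration; not taken from the article.
final class OrderTimeout implements Delayed {
    final String orderId;
    private final long fireAtNanos;

    OrderTimeout(String orderId, long delay, TimeUnit unit) {
        this.orderId = orderId;
        this.fireAtNanos = System.nanoTime() + unit.toNanos(delay);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining time until the element may be taken; <= 0 means "due".
        return unit.convert(fireAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

final class DelayQueueDemo {
    // Enqueues two timeouts out of order and returns the IDs in the order they become due.
    static List<String> drainInExpiryOrder() {
        DelayQueue<OrderTimeout> queue = new DelayQueue<>();
        queue.put(new OrderTimeout("order-2", 200, TimeUnit.MILLISECONDS));
        queue.put(new OrderTimeout("order-1", 50, TimeUnit.MILLISECONDS));
        List<String> taken = new ArrayList<>();
        try {
            taken.add(queue.take().orderId); // blocks until the earliest element is due
            taken.add(queue.take().orderId);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println(drainInExpiryOrder());
    }
}
```

`take()` blocks until the head of the queue is due, so no polling loop is needed, but the queue lives entirely on the heap of a single process.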

RocketMQ delayed queue – provides persistence and distribution but only a fixed set of delay levels, limiting granularity.

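RabbitMQ delayed queue (TTL + DLX) – also persistent and distributed, yet each distinct delay duration needs its own queue: an expired message is dead‑lettered only once it reaches the head of its queue, so mixing delays in one queue lets a long‑delay message hold back shorter ones behind it.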

Design requirements for a custom delayed‑queue service

Reliable message storage with at‑least‑once consumption.

Real‑time retrieval of expired messages with minimal latency.

High availability and fault tolerance across multiple nodes.

Redis‑based implementation – version 1.0

Key features: persistent storage, near‑real‑time processing (bounded by the task interval), ability to delete specific messages, and high availability.

Architecture

Messages Pool : a Redis HASH where each field is a message ID and the value is the full payload. HSET / HGET provide O(1) access, and Redis resizes the hash incrementally (rehashing in small steps), so growth does not cause one long pause.

Delayed Queue : 16 ordered ZSET structures (horizontally scalable). Each member stores a message ID; the score is the expiration timestamp. Multiple ZSETs reduce the scan range per query.

Timed task : a background job periodically scans each ZSET for entries whose score ≤ current time, sends the message to the downstream MQ, and removes it from both the ZSET and the HASH.
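The three components above can be sketched with in‑memory stand‑ins, which keeps the example runnable without a Redis server. Everything here is an assumption made for illustration: the class name `InMemoryDelayStore`, the use of plain Java maps in place of the HASH and the ZSETs, and in particular the rule that routes a message to a queue by hashing its ID (the article does not say how messages are spread over the 16 ZSETs). The comments name the Redis command each step would correspond to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the Redis layout; NOT a Redis client.
final class InMemoryDelayStore {
    static final int QUEUE_COUNT = 16;

    // Stand-in for the Messages Pool HASH: message ID -> payload.
    private final Map<String, String> messagePool = new HashMap<>();
    // Stand-in for the 16 ZSETs: member (message ID) -> score (expire-at millis).
    private final List<Map<String, Long>> delayedQueues = new ArrayList<>();

    InMemoryDelayStore() {
        for (int i = 0; i < QUEUE_COUNT; i++) {
            delayedQueues.add(new HashMap<>());
        }
    }

    // Assumed routing rule: hash the message ID onto one of the queues.
    static int queueIndex(String messageId) {
        return Math.floorMod(messageId.hashCode(), QUEUE_COUNT);
    }

    void add(String messageId, String payload, long expireAtMillis) {
        messagePool.put(messageId, payload);                                      // HSET
        delayedQueues.get(queueIndex(messageId)).put(messageId, expireAtMillis);  // ZADD
    }

    // IDs in one queue whose score <= now, earliest first (ZRANGEBYSCORE -inf now).
    // A real ZSET answers this from its sorted index; this stand-in scans and sorts.
    List<String> expiredIds(int queue, long nowMillis) {
        List<Map.Entry<String, Long>> due = new ArrayList<>();
        for (Map.Entry<String, Long> e : delayedQueues.get(queue).entrySet()) {
            if (e.getValue() <= nowMillis) {
                due.add(e);
            }
        }
        due.sort(Map.Entry.comparingByValue());
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Long> e : due) {
            ids.add(e.getKey());
        }
        return ids;
    }

    String payload(String messageId) {
        return messagePool.get(messageId);                                        // HGET
    }

    // Removes the message from both structures (ZREM + HDEL).
    boolean remove(String messageId) {
        delayedQueues.get(queueIndex(messageId)).remove(messageId);
        return messagePool.remove(messageId) != null;
    }
}
```

Because each query touches only one of the 16 queues, a scan never has to look at members that belong to the other 15, which is the scan‑range reduction the design relies on.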

Message schema

tags – MQ tags used after expiration.

keys – MQ keys used after expiration.

body – payload delivered to the consumer.

delayTime – relative delay duration (either this or expectDate must be set).

expectDate – absolute timestamp for delivery.
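One way to picture the schema in Java is the sketch below. The class name `DelayMessage`, the field types, the millisecond units, and the rule that expectDate wins when both values are present are all assumptions; the article only states that one of the two must be set.

```java
// Hypothetical Java shape of the message schema; types and units are assumptions.
final class DelayMessage {
    final String tags;            // MQ tags used after expiration
    final String keys;            // MQ keys used after expiration
    final String body;            // payload delivered to the consumer
    final Long delayTimeMillis;   // relative delay, or null
    final Long expectDateMillis;  // absolute epoch millis, or null

    DelayMessage(String tags, String keys, String body,
                 Long delayTimeMillis, Long expectDateMillis) {
        if (delayTimeMillis == null && expectDateMillis == null) {
            throw new IllegalArgumentException("either delayTime or expectDate must be set");
        }
        this.tags = tags;
        this.keys = keys;
        this.body = body;
        this.delayTimeMillis = delayTimeMillis;
        this.expectDateMillis = expectDateMillis;
    }

    // The value that would be stored as the ZSET score.
    // Assumption: an absolute expectDate takes precedence over a relative delayTime.
    long deliverAtMillis(long nowMillis) {
        return expectDateMillis != null ? expectDateMillis : nowMillis + delayTimeMillis;
    }
}
```

Resolving both forms to a single absolute timestamp at enqueue time means the rest of the service only ever compares scores against the current time.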

Processing flow (v1.0)

A timed task runs every minute, queries each ZSET for expired messages, publishes them to the MQ, and then removes the corresponding entries from the HASH and ZSET. Offsets are recorded so that failed deliveries can be retried without loss.
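A single polling pass might look like the sketch below. The article does not describe how offsets are recorded, so this example swaps in a simpler rule that gives the same at‑least‑once effect: an entry is removed only after the publish call reports success, and anything that fails stays in place for the next pass. The names `PollingPass`, `runOnce` and `startEveryMinute` are hypothetical, the maps stand in for one ZSET and the HASH, and the `Predicate` stands in for the MQ producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

final class PollingPass {
    // zset: message ID -> expire-at millis (stand-in for one ZSET)
    // pool: message ID -> payload (stand-in for the HASH)
    // publish: returns true when the downstream MQ accepted the payload
    static int runOnce(Map<String, Long> zset, Map<String, String> pool,
                       long nowMillis, Predicate<String> publish) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> e : zset.entrySet()) {
            if (e.getValue() <= nowMillis) {
                due.add(e.getKey());
            }
        }
        int delivered = 0;
        for (String id : due) {
            String body = pool.get(id);
            // Remove only after a successful publish; failures are retried next pass.
            if (body != null && publish.test(body)) {
                zset.remove(id);
                pool.remove(id);
                delivered++;
            }
        }
        return delivered;
    }

    // Runs the given pass once a minute, matching the v1.0 interval.
    static ScheduledExecutorService startEveryMinute(Runnable pass) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(pass, 0, 1, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

With this rule a crash between publish and removal leads to a duplicate delivery rather than a lost message, so consumers would need to be idempotent.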

Version 2.0 improvements

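Version 1.0 relied on a 1‑minute polling task, so a message could be delivered up to a minute after it expired. Version 2.0 replaces the polling task with a java.util.concurrent.locks.Lock and its Condition: the pull thread calls await until the next message is due, or until it is woken early by signal, which brings delivery close to real time.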
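The await / signal idea can be sketched as follows. This is one possible reading rather than the article's actual code: the class name `NextDueWaiter` and both method names are hypothetical, and the sketch assumes the producer signals the waiting thread whenever a newly added message is due earlier than anything seen so far.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class NextDueWaiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    // Earliest known due time; Long.MAX_VALUE means "nothing scheduled".
    private long nextDueMillis = Long.MAX_VALUE;

    // Producer side: wake the waiter only if the new message is due sooner.
    void onMessageAdded(long dueMillis) {
        lock.lock();
        try {
            if (dueMillis < nextDueMillis) {
                nextDueMillis = dueMillis;
                changed.signal();
            }
        } finally {
            lock.unlock();
        }
    }

    // Consumer side: block until the earliest known due time has passed,
    // then return it and reset the state.
    long awaitNextDue() throws InterruptedException {
        lock.lock();
        try {
            while (true) {
                long waitMillis = nextDueMillis - System.currentTimeMillis();
                if (waitMillis <= 0) {
                    long due = nextDueMillis;
                    nextDueMillis = Long.MAX_VALUE;
                    return due;
                }
                // Wakes on timeout, on signal, or spuriously; the loop re-checks.
                changed.await(waitMillis, TimeUnit.MILLISECONDS);
            }
        } finally {
            lock.unlock();
        }
    }
}
```

The wait is always re‑evaluated inside a loop, so a spurious wakeup or an early signal only causes the remaining time to be recomputed, never an early delivery.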

Multi‑node deployment

Pull job : a dedicated thread per ZSET that extracts expired messages. Only one pull job handles a given queue, ensuring exclusive access.

Worker : processes messages handed over by the pull job, allowing the pull job to continue scanning without waiting for the entire batch to finish.

Zookeeper coordination : daemon threads watch ZK nodes for queue creation/deletion and re‑assign queues when nodes join or leave the cluster.
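How queues are divided among nodes is not spelled out in the article. A minimal sketch, assuming a simple round‑robin over the sorted node IDs, is shown below; the `QueueAssigner` name and the assignment rule are both hypothetical, and the Zookeeper watch that would supply the node list is left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class QueueAssigner {
    // Assumed rule: sort the node IDs, then deal queue indexes out round-robin.
    // Every node computes the same result from the same membership list.
    static Map<String, List<Integer>> assign(Collection<String> nodeIds, int queueCount) {
        List<String> nodes = new ArrayList<>(new TreeSet<>(nodeIds));
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String node : nodes) {
            assignment.put(node, new ArrayList<>());
        }
        if (nodes.isEmpty()) {
            return assignment;
        }
        for (int queue = 0; queue < queueCount; queue++) {
            assignment.get(nodes.get(queue % nodes.size())).add(queue);
        }
        return assignment;
    }
}
```

Because the function is deterministic, each queue ends up with exactly one owner, which is what guarantees that only one pull job handles a given queue.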

Main workflow

When a service instance starts, it registers itself in Zookeeper, obtains the list of queues assigned to it, and launches background listeners.

For each assigned queue, the pull job queries the ZSET for expired entries. If an expired message is found, it is handed to a worker for immediate processing. If no expired entry exists, the pull job reads the score of the earliest‑expiring member of the ZSET (the one with the lowest score, since scores are expiration timestamps), takes the difference between that score and the current system time as the wait time, and calls await with that timeout.

Offsets are persisted so that a failed delivery can be retried without losing the message.

When nodes are added or removed, Zookeeper triggers re‑balancing: pull jobs for stale queues are cancelled and new pull jobs are created for the newly assigned queues.
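The re‑balancing step at the end of the workflow can be sketched as a small manager that compares the queues it is currently running with the newly assigned set. The `PullJobManager` class, its method names, and the use of an `ExecutorService` with `Future.cancel` are assumptions made for illustration; the Zookeeper callback that would invoke it is omitted.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

final class PullJobManager {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final Map<Integer, Future<?>> running = new HashMap<>();
    private final IntFunction<Runnable> jobFactory; // builds the pull job for a queue index

    PullJobManager(IntFunction<Runnable> jobFactory) {
        this.jobFactory = jobFactory;
    }

    // Called when the set of queues assigned to this node changes.
    synchronized void onAssignmentChanged(Set<Integer> assigned) {
        // Cancel pull jobs for queues this node no longer owns.
        Iterator<Map.Entry<Integer, Future<?>>> it = running.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Future<?>> entry = it.next();
            if (!assigned.contains(entry.getKey())) {
                entry.getValue().cancel(true); // interrupts the pull thread
                it.remove();
            }
        }
        // Start pull jobs for newly assigned queues; keep the ones already running.
        for (Integer queue : assigned) {
            if (!running.containsKey(queue)) {
                running.put(queue, executor.submit(jobFactory.apply(queue)));
            }
        }
    }

    synchronized Set<Integer> runningQueues() {
        return new TreeSet<>(running.keySet());
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Cancelling with interruption only works if the pull job responds to it, for example by letting the `InterruptedException` from its await call end the loop.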
