How to Build a Scalable Delayed Queue with Redis and Java
This article explains why delayed queues are needed for scenarios like unpaid orders or auto‑generated comments, compares built‑in and third‑party solutions, and provides a detailed design and implementation guide for a Redis‑based delayed‑queue service, including version‑1.0 features, version‑2.0 optimizations, and multi‑node deployment.
Background
Business processes often require actions after a specific delay, such as cancelling an unpaid order after 30 minutes, posting a default comment if a user does not comment within 48 hours, or aborting a delivery order that has not been dispatched in time.
Existing delayed‑queue mechanisms
java.util.concurrent.DelayQueue – simple JDK implementation; messages exist only in JVM memory, so persistence and distributed operation are not supported.
RocketMQ delayed queue – provides persistence and distribution but only a fixed set of delay levels, limiting granularity.
RabbitMQ delayed queue (TTL + DLX) – also persistent and distributed, yet messages with the same delay must share a single queue.
Design requirements for a custom delayed‑queue service
Reliable message storage with at‑least‑once consumption.
Real‑time retrieval of expired messages with minimal latency.
High availability and fault tolerance across multiple nodes.
Redis‑based implementation – version 1.0
Key features: persistent storage, near‑real‑time processing (bounded by the task interval), ability to delete specific messages, and high availability.
Architecture
Messages Pool : a Redis HASH where each field is a message ID and the value is the full payload. HSET / HGET provide O(1) access and the hash grows gradually via rehashing.
Delayed Queue : 16 ordered ZSET structures (horizontally scalable). Each member stores a message ID; the score is the expiration timestamp. Multiple ZSETs reduce the scan range per query.
Timed task : a background job periodically scans each ZSET for entries whose score ≤ current time, sends the message to the downstream MQ, and removes it from both the ZSET and the HASH.
Message schema
tags– MQ tags used after expiration. keys – MQ keys used after expiration. body – payload delivered to the consumer. delayTime – relative delay duration (either this or expectDate must be set). expectDate – absolute timestamp for delivery.
Processing flow (v1.0)
A timed task runs every minute, queries each ZSET for expired messages, publishes them to the MQ, and then removes the corresponding entries from the HASH and ZSET. Offsets are recorded so that failed deliveries can be retried without loss.
Version 2.0 improvements
Version 1.0 relied on a 1‑minute polling task. Version 2.0 replaces this with a Java Lock using await / signal to achieve near‑real‑time delivery and lower latency.
Multi‑node deployment
Pull job : a dedicated thread per ZSET that extracts expired messages. Only one pull job handles a given queue, ensuring exclusive access.
Worker : processes messages handed over by the pull job, allowing the pull job to continue scanning without waiting for the entire batch to finish.
Zookeeper coordination : daemon threads watch ZK nodes for queue creation/deletion and re‑assign queues when nodes join or leave the cluster.
Main workflow
When a service instance starts, it registers itself in Zookeeper, obtains the list of queues assigned to it, and launches background listeners. For each assigned queue, the pull job queries the ZSET for expired entries. If an expired message is found, it is handed to a worker for immediate processing. If no expired entry exists, the pull job reads the score of the last member in the ZSET; the difference between that score and the current system time is used as the wait time, after which the thread calls await. Offsets are persisted so that a failed delivery can be retried without losing the message. When nodes are added or removed, Zookeeper triggers re‑balancing: pull jobs for stale queues are cancelled and new pull jobs are created for the newly assigned queues.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
