Why Scheduled Tasks Fail for Million‑Scale Order Cancellation and How Redis Solves It

The article dissects a common interview question about automatically canceling unpaid orders after 30 minutes, explains why naïve cron‑based scans are unsuitable for tens of millions of rows, and presents three progressively robust solutions using Redis expiration, Redis ZSet polling, and message‑queue or time‑wheel architectures.

dbaplus Community

Interview Scenario

A candidate was asked how to automatically cancel an order if it remains unpaid for 30 minutes, a classic distributed‑system design problem. The naive answer—using a scheduled task that scans the database every minute—was challenged with three follow‑up questions about scalability, latency, and fault tolerance.

Why Simple Cron Is Inadequate

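Timeliness: Periodic polling cannot guarantee second‑level precision. With a one‑minute scan interval, an order can be canceled up to a minute late, plus however long the scan itself takes.

Database Load: Full‑table scans replace an event‑driven (push) workflow with heavy pulling, causing CPU spikes on tables with tens of millions of rows.

Resource Waste: Most scans find no expired orders, yet the task still runs.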

The core principle is to avoid polling the database and let expired orders “come to you.”

Solution 1: Redis Expiration Listener (A Trap)

Some suggest storing the order ID as a Redis key with a 30‑minute TTL and relying on the expiration event.

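Unreliable: Expiration events are delivered over Pub/Sub, which is fire‑and‑forget. If the listener is restarting or disconnected when the event fires, the event is lost and never redelivered.

Latency: Redis does not delete a key the instant its TTL elapses. Keys are removed lazily on access or by a periodic background sampler, and the expired event fires only on actual deletion, so the notification may be delayed, potentially by minutes when many keys expire at once.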
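The reliability problem can be illustrated without a Redis server. The sketch below is a toy in‑memory stand‑in for fire‑and‑forget Pub/Sub, not a Redis client; the class `ToyPubSub` and the message strings are hypothetical. It only shows the delivery semantics: a message published while the listener is disconnected is not stored and is not replayed on reconnect.

```python
class ToyPubSub:
    """In-memory model of fire-and-forget Pub/Sub (illustrative, not Redis)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def unsubscribe(self, callback):
        self.subscribers.remove(callback)

    def publish(self, message):
        # Deliver to whoever is connected right now; nothing is stored.
        for callback in list(self.subscribers):
            callback(message)
        return len(self.subscribers)  # number of receivers


bus = ToyPubSub()
cancelled = []
listener = cancelled.append

bus.subscribe(listener)
bus.publish("order:1001")         # listener is up: delivered

bus.unsubscribe(listener)         # listener restarts
lost = bus.publish("order:1002")  # nobody is listening: the event is dropped

bus.subscribe(listener)           # listener is back, but nothing is replayed
print(cancelled, lost)
```

In this model order 1002 is never canceled, which is the failure mode that makes the expiration listener unsuitable as the only trigger.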

Solution 2: Redis ZSet + Polling (Recommended Standard)

Use a sorted set where the score holds the exact expiration timestamp and the member holds the order ID.

Production (enqueue):

ZADD delay_queue <timestamp+30min> <OrderId>

Consumption (polling): A background thread runs every second and executes

ZRANGEBYSCORE delay_queue 0 <current_timestamp> LIMIT 0 10

to fetch up to 10 overdue orders per poll.
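The two commands above can be wrapped as follows. This is a minimal sketch, assuming `r` is a redis‑py style client (for example `redis.Redis(decode_responses=True)`) that exposes `zadd`, `zrangebyscore`, and `zrem`. The function names, the batch size, and the decision to remove each entry with ZREM before handling it are assumptions for illustration, not part of the original article.

```python
import time

DELAY_QUEUE = "delay_queue"   # key name taken from the article
TIMEOUT_SECONDS = 30 * 60     # 30-minute payment window


def enqueue_order(r, order_id, now=None):
    """ZADD delay_queue <now + 30 min> <order_id>."""
    now = time.time() if now is None else now
    # redis-py's zadd takes a {member: score} mapping.
    r.zadd(DELAY_QUEUE, {order_id: now + TIMEOUT_SECONDS})


def fetch_due_orders(r, now=None, batch_size=10):
    """ZRANGEBYSCORE delay_queue 0 <now> LIMIT 0 <batch_size>."""
    now = time.time() if now is None else now
    return r.zrangebyscore(DELAY_QUEUE, 0, now, start=0, num=batch_size)


def poll_once(r, handle, now=None, batch_size=10):
    """Handle every due order in one batch; returns how many were handled."""
    handled = 0
    for order_id in fetch_due_orders(r, now, batch_size):
        # ZREM returns 1 only for the poller that actually removed the member,
        # so two nodes polling at once do not both handle the same order.
        if r.zrem(DELAY_QUEUE, order_id) == 1:
            handle(order_id)  # if this raises, the order is already gone
            handled += 1
    return handled
```

A background thread would call `poll_once` in a loop with a one‑second sleep between iterations. Note that removing the entry before `handle` succeeds is exactly what exposes this simple version to lost orders when the handler fails.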

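Advantages: In‑memory reads/writes, second‑level accuracy, high throughput.

Advanced Pitfall: If the poller removes an entry from the ZSet (for example in a Lua script that fetches and deletes atomically) and the business logic then fails or the process crashes, the order is gone from the queue and will never be canceled.

At‑Least‑Once Patch: Use an ACK mechanism. A Lua script atomically moves each due order from delay_queue to processing_queue; the worker processes it and only then deletes it from processing_queue. A watchdog thread retries entries that stay in processing_queue too long, which guarantees at‑least‑once processing and is why the cancel operation must be idempotent.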
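One possible shape for the ACK flow is sketched below. It assumes `r` is a redis‑py style client with `eval(script, numkeys, *keys_and_args)` and `zrem`. Modelling processing_queue as a second ZSet scored by a retry deadline, the 60‑second `VISIBILITY_TIMEOUT`, and all function names are assumptions made for illustration; the article does not specify them.

```python
import time

# Runs atomically inside Redis: move up to ARGV[2] members whose score is
# <= ARGV[1] from KEYS[1] to KEYS[2], giving them the new score ARGV[3].
MOVE_DUE_LUA = """
local due = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1], 'LIMIT', 0, ARGV[2])
for _, member in ipairs(due) do
  redis.call('ZREM', KEYS[1], member)
  redis.call('ZADD', KEYS[2], ARGV[3], member)
end
return due
"""

DELAY_QUEUE = "delay_queue"
PROCESSING_QUEUE = "processing_queue"
VISIBILITY_TIMEOUT = 60  # assumed: seconds before a claimed order counts as stuck


def claim_due(r, now, batch_size=10):
    """Move due orders into processing_queue, scored by their retry deadline."""
    return r.eval(MOVE_DUE_LUA, 2, DELAY_QUEUE, PROCESSING_QUEUE,
                  now, batch_size, now + VISIBILITY_TIMEOUT)


def ack(r, order_id):
    """Business logic succeeded: drop the order from processing_queue."""
    return r.zrem(PROCESSING_QUEUE, order_id)


def requeue_stuck(r, now, batch_size=10):
    """Watchdog: return unacknowledged orders to delay_queue, due immediately."""
    return r.eval(MOVE_DUE_LUA, 2, PROCESSING_QUEUE, DELAY_QUEUE,
                  now, batch_size, now)


def process_batch(r, cancel_order, now=None):
    """Claim due orders, cancel each one, and ACK only on success."""
    now = time.time() if now is None else now
    for order_id in claim_due(r, now):
        try:
            cancel_order(order_id)  # must be idempotent: retries can repeat it
        except Exception:
            continue                # no ACK; the watchdog will requeue it
        ack(r, order_id)
```

Because the same script moves entries in either direction, the watchdog is just a second caller with the keys swapped. An order whose handler failed stays in processing_queue until its deadline passes and is then retried, so it may be processed more than once but is not silently dropped.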

Solution 3: Message Queue / Time Wheel (Architect‑Level)

When the order volume reaches hundreds of millions, a single ZSet becomes a bottleneck.

Message Queue: Leverage the delayed‑message features of RocketMQ or RabbitMQ.

RocketMQ 4.x supports only a fixed set of delay levels; RocketMQ 5.0 adds arbitrary delays.

RabbitMQ's native TTL + dead‑letter queue suffers from head‑of‑line blocking: a message with a long TTL at the head of the queue holds back shorter‑TTL messages behind it. The rabbitmq_delayed_message_exchange plugin resolves this.

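Hashed Wheel Timer: A 60‑slot circular buffer where each slot represents one second. An order expiring in 30 minutes is placed in the slot (current_slot + 1800) % 60. Because 1800 is a multiple of 60, that is the current slot itself, so each entry must also carry a remaining‑rounds counter (1800 seconds is 30 full rotations of the wheel) and fire only once that counter is exhausted.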

Pros: Pure memory operation, extremely fast.

Cons: State is lost on process restart.

Production practice: Combine Redis ZSet for persistence with an in‑memory wheel for high‑frequency triggers.
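The wheel described above can be sketched in a few lines. This is a toy, single‑threaded model, assuming the caller invokes `tick()` once per second; the class and method names are hypothetical. The `rounds` counter stores how many full passes over the slot must be skipped before the task fires, which is what lets a 60‑slot wheel hold a 1800‑second delay.

```python
class HashedWheelTimer:
    """Toy 60-slot wheel; call tick() once per second."""

    def __init__(self, slots=60):
        self.slots = [[] for _ in range(slots)]
        self.current = 0

    def schedule(self, delay_seconds, task):
        if delay_seconds < 1:
            raise ValueError("delay must be at least one tick")
        n = len(self.slots)
        slot = (self.current + delay_seconds) % n
        rounds = (delay_seconds - 1) // n  # visits to skip before firing
        self.slots[slot].append([rounds, task])

    def tick(self):
        """Advance one slot and return the tasks that are due."""
        self.current = (self.current + 1) % len(self.slots)
        due, waiting = [], []
        for entry in self.slots[self.current]:
            if entry[0] == 0:
                due.append(entry[1])
            else:
                entry[0] -= 1       # one more full rotation has passed
                waiting.append(entry)
        self.slots[self.current] = waiting
        return due
```

For a 30‑minute delay the task lands in the current slot with `rounds = 29`: the slot is visited once per minute, 29 visits only decrement the counter, and the 30th visit fires the task at second 1800. Everything lives in process memory, so a restart loses all pending entries, which is why the article pairs the wheel with a persistent ZSet.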

Final Defensive Checklist

Prevent duplicate cancellation across multiple nodes by using Lua scripts for an atomic ZRANGEBYSCORE + ZREM and by making the cancel API idempotent, so that a repeated call for the same order has no further effect.

Shard the ZSet when it grows large: distribute orders across delay_queue_0 … delay_queue_9 based on a hash of the order ID, and run parallel pollers to multiply throughput.

Provide a fallback offline scan (e.g., a T+1 job on a replica) to catch any missed cancellations.
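Two of the checklist items lend themselves to a short sketch: an idempotent cancel and hash‑based shard routing. The example below uses SQLite only so that it runs standalone; the `orders` table, its `status` values, and the function names are hypothetical, and `zlib.crc32` is one arbitrary choice of stable hash.

```python
import sqlite3
import zlib

SHARDS = 10  # delay_queue_0 ... delay_queue_9, as in the checklist


def shard_key(order_id):
    """Route an order to one of the sharded ZSets with a stable hash."""
    # crc32 is stable across processes; Python's built-in hash() of str is not.
    return f"delay_queue_{zlib.crc32(order_id.encode()) % SHARDS}"


def cancel_order(conn, order_id):
    """Cancel only if still unpaid; True means this call did the cancel."""
    cur = conn.execute(
        "UPDATE orders SET status = 'CANCELLED' "
        "WHERE id = ? AND status = 'UNPAID'",
        (order_id,),
    )
    conn.commit()
    # rowcount is 0 if the order was already paid or already cancelled.
    return cur.rowcount == 1
```

The conditional `WHERE status = 'UNPAID'` is what makes the operation safe to retry: a second delivery of the same order, or a cancellation racing with a payment, updates zero rows instead of overwriting the state. Each poller would read from its own `shard_key(...)` queue so the shards can be drained in parallel.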

Interview Template (Ready to Recite)

“Database polling for order timeout is unacceptable at high concurrency. My design separates storage from computation and decouples via middleware. I would choose Redis ZSet as the lightweight delay queue, store the expiration timestamp as the score, and run a per‑second poller that atomically moves due orders to a processing queue using Lua. The cancel service is idempotent, and I add an ACK‑based processing queue plus a low‑frequency fallback scan for eventual consistency. For extreme scale, I would switch to RocketMQ 5.0’s arbitrary delayed messages or combine Redis ZSet with an in‑memory time wheel.”

Takeaway

Use Redis for fast, precise delayed tasks, augment with idempotent processing and fallback mechanisms, and scale to billions of orders with sharding, message queues, or time‑wheel algorithms.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
