Mastering Order Auto‑Cancellation: Timers, Queues, and Distributed Schedulers

This article examines practical strategies for automatically canceling unpaid orders, comparing simple timer‑based polling, cluster‑ready schedulers like Quartz, Elastic‑Job, and XXL‑JOB, as well as delayed‑message approaches using RocketMQ and Redis, and offers best‑practice guidelines for concurrency, monitoring, and fault tolerance.

Sanyou's Java Diary

Mar 14, 2024

Hello everyone, I am SanYou.

When an order is placed in the Meituan app but not paid immediately, a countdown appears on the order detail page; if the payment time expires, the order is automatically cancelled.

Many generic solutions exist, but this article dives deep into designing an order‑timeout auto‑cancellation feature that fits real‑world business scenarios.

1. Timer Task Solution

We first consider the natural timer‑task approach.

Process flow:

Every 30 seconds, query the database for the latest N unpaid orders.

Iterate over the retrieved list and check whether the current time minus the order creation time exceeds the payment timeout; if it does, cancel the order.
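The two steps above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the `TimeoutScanner` class, the `Order` model, and the in-memory list are hypothetical stand-ins for the real order table and query.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; a real system would query the orders table instead.
final class TimeoutScanner {
    enum Status { UNPAID, PAID, CANCELLED }

    static final class Order {
        final long id;
        final Instant createdAt;
        Status status = Status.UNPAID;

        Order(long id, Instant createdAt) {
            this.id = id;
            this.createdAt = createdAt;
        }
    }

    /** Cancels every unpaid order older than the timeout and returns the cancelled ids. */
    static List<Long> cancelTimedOut(List<Order> batch, Instant now, Duration timeout) {
        List<Long> cancelled = new ArrayList<>();
        for (Order order : batch) {
            boolean expired = Duration.between(order.createdAt, now).compareTo(timeout) > 0;
            if (order.status == Status.UNPAID && expired) {
                order.status = Status.CANCELLED;
                cancelled.add(order.id);
            }
        }
        return cancelled;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Order> batch = List.of(
                new Order(1, now.minus(Duration.ofMinutes(20))),
                new Order(2, now.minus(Duration.ofMinutes(5))));
        // With a 15-minute timeout, only order 1 has expired.
        System.out.println(cancelTimedOut(batch, now, Duration.ofMinutes(15)));
    }
}
```

In a single JVM, a method like this could be triggered every 30 seconds with `ScheduledExecutorService.scheduleWithFixedDelay`; the scheduling layer and the business logic stay separate either way.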

The timer‑task implementation is straightforward, but it adds periodic I/O pressure on the database, which can become a performance bottleneck under high order volume.

The timer‑task solution consists of two layers: a scheduling layer and a business logic layer.

Timer‑task implementations can be divided into single‑node and clustered versions.

2. Timer Task Solution: Single‑Node

We can easily implement timers using Timer, ScheduledExecutorService, or Quartz.

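However, the single‑node approach is not recommended, because production applications are usually deployed as a cluster. For example, if application A schedules three tasks A, B, C with Quartz and runs on several instances, every instance fires its own timer, so multiple instances may execute the same task simultaneously.

Locking can mitigate this risk: before running a task, each instance tries to acquire a shared lock, and only the instance that obtains it executes the task.

Nevertheless, this method is inelegant: the schedulers on the instances that lose the lock still fire and end up running empty tasks. Ideally, tasks A, B, C should be evenly distributed across different instances of application A.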
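To make the locking idea concrete, here is a small sketch. It assumes a shared lock store; the `ConcurrentMap` below is only a stand-in for something like a database row or a Redis key, and `LockedTaskRunner` is a hypothetical name. A real distributed lock would also need an expiry so that a crashed instance cannot hold it forever.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: the map stands in for a shared lock store visible to all instances.
final class LockedTaskRunner {
    private final ConcurrentMap<String, String> lockStore;
    private final String instanceId;

    LockedTaskRunner(ConcurrentMap<String, String> lockStore, String instanceId) {
        this.lockStore = lockStore;
        this.instanceId = instanceId;
    }

    /** Runs the task only if this instance wins the lock; returns whether it ran. */
    boolean runIfLockAcquired(String taskName, Runnable task) {
        // putIfAbsent is atomic: exactly one caller sees null and becomes the owner.
        if (lockStore.putIfAbsent(taskName, instanceId) != null) {
            return false; // another instance holds the lock, so this trigger is an empty run
        }
        try {
            task.run();
            return true;
        } finally {
            // Release only the entry this instance created.
            lockStore.remove(taskName, instanceId);
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
        LockedTaskRunner first = new LockedTaskRunner(store, "instance-1");
        LockedTaskRunner second = new LockedTaskRunner(store, "instance-2");
        // While instance-1 is inside the task, instance-2's trigger does nothing.
        first.runIfLockAcquired("cancelOrders", () -> {
            boolean ran = second.runIfLockAcquired("cancelOrders", () -> { });
            System.out.println("instance-2 ran while the lock was held: " + ran);
        });
    }
}
```

The `false` return path is exactly the "empty task" described above: the scheduler fired, but no work was done.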

Next, I will introduce three cluster‑based timer solutions I have personally used.

3. Timer Task Solution: Clustered

3.1 Quartz + JDBCJobStore

Quartz supports a cluster mode, but it requires 11 tables in the database, which makes it somewhat intrusive to the business system.

At a lottery company where I worked, the order scheduling center used Quartz’s cluster mode to handle a daily volume of millions of orders.

Key point: Quartz’s cluster mode relies on pessimistic locks in the underlying database, so performance degrades when many high‑frequency tasks run concurrently.

3.2 Elastic‑Job

Elastic‑Job is a lightweight, decentralized solution that provides distributed task coordination and is shipped as a JAR dependency.

Internally it still uses Quartz, but leverages ZooKeeper for load‑balanced task assignment to the Quartz Scheduler containers within the application.

Example: Application A has five tasks (A‑E). Task E is split into four subtasks. The application is deployed on two machines; ZooKeeper distributes the tasks, and each Quartz Scheduler executes its assigned tasks.
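The distribution in this example can be illustrated with a simple round-robin assignment. This is only a sketch of the idea: it does not use ZooKeeper, it is not Elastic‑Job's actual sharding strategy, and the `ShardAssigner` class and unit names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardAssigner {
    /** Distributes work units across instances in round-robin order. */
    static Map<String, List<String>> assign(List<String> instances, List<String> units) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String instance : instances) {
            plan.put(instance, new ArrayList<>());
        }
        for (int i = 0; i < units.size(); i++) {
            String owner = instances.get(i % instances.size());
            plan.get(owner).add(units.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Tasks A-D plus the four subtasks of task E, spread over two machines.
        List<String> units = List.of("A", "B", "C", "D", "E-0", "E-1", "E-2", "E-3");
        System.out.println(assign(List.of("machine-1", "machine-2"), units));
    }
}
```

Each machine ends up with four units, so neither scheduler sits idle while the other does all the work.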

Compared with Quartz cluster mode, Elastic‑Job offers higher scalability and excellent performance due to in‑memory job storage.

However, its console is crude, and task control relies on ZooKeeper nodes, making it less flexible. Therefore, Elastic‑Job is more of a framework than a full‑featured scheduling platform.

3.3 Task Scheduling Platform (XXL‑JOB)

I strongly endorse the task‑scheduling‑platform approach. XXL‑JOB is one of the most widely used open‑source distributed scheduling platforms.

The business system and the scheduling platform are deployed separately. The platform configures applications and their tasks; when a task triggers, the platform calls the business system, which executes the task and reports the result back.

Both sides can scale horizontally for high availability, and the platform supports flexible strategies such as retry mechanisms and broadcast mode.

XXL‑JOB still uses database pessimistic locks, though it mitigates some issues with time‑wheel optimization; performance bottlenecks may still appear.

Many companies (e.g., Shenzhou Ride‑Sharing, Meituan) have built their own scheduling platforms, which are well‑suited for large‑scale, multi‑team coordination.

4. Delayed‑Message Solution

Delayed messages provide an elegant pattern: after order creation, the order service sends a delayed message to a queue. When the delay expires, the consumer receives the message, checks the order status, and cancels it if unpaid.
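The pattern can be tried out inside a single JVM with `java.util.concurrent.DelayQueue`. This is only an in-process stand-in for a real message queue, and `DelayedCancelDemo`, `CancelMessage`, and the status map are hypothetical names used for the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

final class DelayedCancelDemo {
    /** Message carrying an order id; it becomes visible once its delay has elapsed. */
    static final class CancelMessage implements Delayed {
        final long orderId;
        final long fireAtNanos;

        CancelMessage(long orderId, long delayMillis) {
            this.orderId = orderId;
            this.fireAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(fireAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Consumer side: cancel only if the order is still unpaid when the message arrives. */
    static boolean handle(CancelMessage msg, Map<Long, String> orderStatus) {
        return orderStatus.replace(msg.orderId, "UNPAID", "CANCELLED");
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, String> orderStatus = new ConcurrentHashMap<>();
        DelayQueue<CancelMessage> queue = new DelayQueue<>();
        orderStatus.put(1L, "UNPAID");
        orderStatus.put(2L, "UNPAID");
        queue.put(new CancelMessage(1L, 100));
        queue.put(new CancelMessage(2L, 100));
        orderStatus.put(2L, "PAID"); // order 2 is paid before its delay expires
        for (int i = 0; i < 2; i++) {
            CancelMessage msg = queue.take(); // blocks until a delay has expired
            System.out.println("order " + msg.orderId + " cancelled: " + handle(msg, orderStatus));
        }
    }
}
```

The status check in `handle` is the important part: the message only says "look at this order now", and the consumer decides whether a cancellation is still needed.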

4.1 RocketMQ

RocketMQ 4.x delayed‑message code example:

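// Assumes org.apache.rocketmq.common.message.Message, java.nio.charset.StandardCharsets,
// and an already started DefaultMQProducer named producer.
Message msg = new Message();
msg.setTopic("TopicA");
msg.setTags("Tag");
msg.setBody("this is a delay message".getBytes(StandardCharsets.UTF_8));
// Delay level 5 corresponds to 1 minute in the default messageDelayLevel configuration
msg.setDelayTimeLevel(5);
producer.send(msg);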

RocketMQ 4.x supports 18 delay levels configured via messageDelayLevel on the broker.
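Because only fixed levels are available, the producer has to map a desired delay onto a level. The helper below is a sketch of that mapping using plain JDK code; it assumes the broker keeps the default `messageDelayLevel` value (listed in the constant) and that the levels are sorted in ascending order. `DelayLevels` and its methods are hypothetical names, not part of the RocketMQ client.

```java
import java.time.Duration;

final class DelayLevels {
    // Default broker value of messageDelayLevel in RocketMQ 4.x; a broker may override it.
    static final String DEFAULT_LEVELS =
            "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

    /** Parses a token such as "30s" or "2h" into a Duration. */
    static Duration parse(String token) {
        long amount = Long.parseLong(token.substring(0, token.length() - 1));
        switch (token.charAt(token.length() - 1)) {
            case 's': return Duration.ofSeconds(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 'h': return Duration.ofHours(amount);
            case 'd': return Duration.ofDays(amount);
            default: throw new IllegalArgumentException("Unknown unit in " + token);
        }
    }

    /** Returns the 1-based level of the shortest configured delay >= wanted, or -1 if none. */
    static int levelFor(Duration wanted, String levels) {
        String[] tokens = levels.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (parse(tokens[i]).compareTo(wanted) >= 0) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(levelFor(Duration.ofMinutes(1), DEFAULT_LEVELS));
        System.out.println(levelFor(Duration.ofMinutes(15), DEFAULT_LEVELS));
    }
}
```

With the default configuration there is no exact 15-minute level, so a 15-minute payment timeout gets rounded up to the 20-minute level. This is the main limitation of fixed delay levels.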

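RocketMQ 5.x allows arbitrary delay times. The Message class gains three new methods for this: setDelayTimeSec and setDelayTimeMs specify a relative delay, and setDeliverTimeMs specifies an absolute delivery timestamp.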

If the team has strong infrastructure capabilities, I highly recommend using RocketMQ 5.x delayed messages.

4.2 Self‑Developed Delay Service

RocketMQ 4.x only supports fixed delay levels. Companies like Kuaishou and Didi built separate Delay Servers to schedule delayed messages.

The structure keeps the original message interface: the order service still sends a message to RocketMQ, but a different topic is used, and the Delay Server consumes it. This approach reuses existing sending logic while adding a delay field.

A custom Delay Server can work with both RocketMQ and Kafka, offering high performance and flexibility.

4.3 Redis Delayed Queue

Redis provides a lightweight delayed‑queue solution; a mature open‑source implementation is available in Redisson.

Two collections are defined:

1. zset : producers add tasks with the task ID as the value and the execution timestamp as the score.

2. list : a guard thread moves expired tasks from the zset to the list; consumers pop tasks from the list and execute them.

Note: Redis is not a true message queue; there is a small chance of message loss.
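The interplay of the two collections can be simulated in memory. In the sketch below a `TreeSet` ordered by timestamp plays the role of the zset and an `ArrayDeque` plays the role of the list; `InMemoryDelayQueue` is a hypothetical class, not Redisson's implementation. With real Redis, the move step would also have to be atomic (for instance by running it in a Lua script), which this sketch replaces with `synchronized`.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.TreeSet;

final class InMemoryDelayQueue {
    private static final class Entry {
        final String taskId;
        final long executeAtMillis;

        Entry(String taskId, long executeAtMillis) {
            this.taskId = taskId;
            this.executeAtMillis = executeAtMillis;
        }
    }

    // Stand-in for the zset: entries ordered by score (execution timestamp), then by id.
    private final TreeSet<Entry> zset = new TreeSet<>(
            Comparator.comparingLong((Entry e) -> e.executeAtMillis).thenComparing(e -> e.taskId));
    // Stand-in for the list that consumers pop from.
    private final Deque<String> readyList = new ArrayDeque<>();

    /** Producer: register a task with its execution timestamp as the score. */
    synchronized void add(String taskId, long executeAtMillis) {
        zset.add(new Entry(taskId, executeAtMillis));
    }

    /** Guard thread step: move every task whose score is <= now into the ready list. */
    synchronized int moveExpired(long nowMillis) {
        int moved = 0;
        while (!zset.isEmpty() && zset.first().executeAtMillis <= nowMillis) {
            readyList.addLast(zset.pollFirst().taskId);
            moved++;
        }
        return moved;
    }

    /** Consumer: take the next ready task, or null if none is ready. */
    synchronized String pop() {
        return readyList.pollFirst();
    }

    public static void main(String[] args) {
        InMemoryDelayQueue queue = new InMemoryDelayQueue();
        queue.add("order-1", 1_000L);
        queue.add("order-2", 2_000L);
        queue.moveExpired(1_500L);       // only order-1 is due at t=1500
        System.out.println(queue.pop()); // order-1
        System.out.println(queue.pop()); // null, order-2 is still waiting
    }
}
```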

5. Best Practices

5.1 Concurrency Mnemonic: Lock‑Check‑Update

When using timers or delayed messages, concurrent execution (e.g., duplicate consumption, retry) is inevitable.

Apply the “first lock, then check, then update” rule:

Lock the order to be processed.

Check whether the order’s status has already been updated.

If not updated, perform the status change and related business logic; otherwise, skip.

Release the lock.
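A minimal sketch of these steps is shown below. It uses an in-process `ReentrantLock` per order purely for illustration; across several service instances the lock would have to be a distributed lock or a database row lock instead. `OrderCanceller` and the string statuses are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

final class OrderCanceller {
    private final Map<Long, String> statusByOrder = new ConcurrentHashMap<>();
    private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    void create(long orderId) {
        statusByOrder.put(orderId, "UNPAID");
    }

    String status(long orderId) {
        return statusByOrder.get(orderId);
    }

    boolean pay(long orderId) {
        return transition(orderId, "PAID");
    }

    /** Returns true only for the single caller that actually performs the cancellation. */
    boolean cancelIfUnpaid(long orderId) {
        return transition(orderId, "CANCELLED");
    }

    private boolean transition(long orderId, String target) {
        ReentrantLock lock = locks.computeIfAbsent(orderId, id -> new ReentrantLock());
        lock.lock();                                   // 1. lock the order
        try {
            if (!"UNPAID".equals(statusByOrder.get(orderId))) {
                return false;                          // 2. check: already handled, skip
            }
            statusByOrder.put(orderId, target);        // 3. update status (and business logic)
            return true;
        } finally {
            lock.unlock();                             // always release the lock
        }
    }

    public static void main(String[] args) {
        OrderCanceller canceller = new OrderCanceller();
        canceller.create(1L);
        // Simulate duplicate deliveries racing on the same order: exactly one wins.
        long winners = IntStream.range(0, 8).parallel()
                .filter(i -> canceller.cancelIfUnpaid(1L))
                .count();
        System.out.println("successful cancellations: " + winners); // 1
    }
}
```

Because payment goes through the same lock and check, a late cancellation message can never overwrite an order that has just been paid.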

5.2 Fallback Awareness + Monitoring Configuration

Even with solid designs, issues like message loss can occur.

Maintain a fallback mechanism: run a nightly batch job to cancel any orders that remain unpaid.

Monitoring is essential. Separate monitoring into system‑level (performance, method availability, call counts) and business‑level (process health, alerts when scheduled tasks miss executions).

Performance monitoring tracks metrics such as TP99, TP999, AVG, and MAX to guide optimization.
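As an illustration of how these metrics relate to raw latency samples, the sketch below computes a percentile with the nearest-rank method, plus the average and maximum. Monitoring systems may use other percentile definitions or approximations, so treat `LatencyStats` as a hypothetical helper rather than how any particular tool calculates TP99.

```java
import java.util.Arrays;

final class LatencyStats {
    /** Nearest-rank percentile: the smallest sample with at least p percent of samples <= it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMillis.clone(); // keep the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i + 1; // latencies of 1..100 ms
        }
        System.out.println("TP99 = " + percentile(samples, 99));
        System.out.println("AVG  = " + Arrays.stream(samples).average().orElse(0));
        System.out.println("MAX  = " + Arrays.stream(samples).max().orElse(0));
    }
}
```

TP99 ignores the slowest 1% of calls, which is why it is usually tracked alongside MAX: a single extreme outlier shows up in MAX but not in TP99.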

Business monitoring collects workflow data, triggers alerts when expectations are not met, and aggregates data for unified visualization.

6. Summary

This article covered two main approaches to order‑timeout auto‑cancellation:

1. Timer Tasks – can be single‑node (generally not recommended) or clustered. Clustered options include Quartz + JDBCJobStore, Elastic‑Job, and XXL‑JOB, with XXL‑JOB being the preferred choice.

2. Delayed Messages – options are RocketMQ, self‑built delay services, and Redis delayed queues. For teams with strong infrastructure, RocketMQ or a custom delay service is recommended; for lightweight needs, Redis works well.

Regardless of the approach, ensure stability by following the concurrency mnemonic (lock‑check‑update) and implementing fallback awareness with proper monitoring.
