Analyzing and Solving Quartz Duplicate Job Scheduling Issues
This article examines why Quartz's job scheduler can produce duplicate executions under high load, explains the internal trigger states, lock mechanisms, and code paths involved, and provides a configuration fix to ensure exclusive trigger acquisition and prevent repeated job runs.
1. Introduction
The company switched to Quartz for task scheduling, handling over two million executions per day, and began to see occasional duplicate job executions without a clear pattern. The article investigates the root cause by analyzing Quartz source code and offers a direct configuration solution.
2. Quartz Basics
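Quartz calls each triggering of a job a "fire". A trigger maintains several key fields, including TRIGGER_STATE, PREV_FIRE_TIME and NEXT_FIRE_TIME. A misfire is a fire that was missed, for example because the scheduler was down or no worker thread was free at the scheduled time. Two kinds of thread are involved: a single scheduler thread that acquires triggers, and a pool of worker threads that execute the job logic. The relevant database tables are triggers, locks and fired_triggers (QRTZ_TRIGGERS, QRTZ_LOCKS and QRTZ_FIRED_TRIGGERS with the default QRTZ_ table prefix).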
2.1 Trigger State Diagram
The trigger lifecycle starts at WAITING. It moves to ACQUIRED when the scheduler thread pulls it, and then to EXECUTING at the actual fire time (in the JDBC job store, EXECUTING is recorded on the trigger's row in fired_triggers). After execution it becomes COMPLETE if there are no further fires, or returns to WAITING. Errors set the state to ERROR, and a manual pause sets it to PAUSED.
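The simplified lifecycle above can be written down as a transition table, which makes it easy to check which moves are legal. This is a minimal Java sketch: TriggerLifecycle and its methods are hypothetical names (this is not Quartz's own Trigger.TriggerState enum), and the PAUSED and ERROR edges are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Simplified trigger lifecycle from the state diagram (hypothetical type,
// not org.quartz.Trigger.TriggerState). PAUSED/ERROR edges are assumptions.
enum TriggerLifecycle {
    WAITING, ACQUIRED, EXECUTING, COMPLETE, PAUSED, ERROR;

    private static final Map<TriggerLifecycle, Set<TriggerLifecycle>> NEXT =
            new EnumMap<>(TriggerLifecycle.class);

    static {
        NEXT.put(WAITING, EnumSet.of(ACQUIRED, PAUSED, ERROR));    // pulled by the scheduler thread
        NEXT.put(ACQUIRED, EnumSet.of(EXECUTING, PAUSED, ERROR));  // actual fire time reached
        NEXT.put(EXECUTING, EnumSet.of(WAITING, COMPLETE, ERROR)); // more fires pending, or done
        NEXT.put(PAUSED, EnumSet.of(WAITING));                     // resumed manually
        NEXT.put(COMPLETE, EnumSet.noneOf(TriggerLifecycle.class));
        NEXT.put(ERROR, EnumSet.noneOf(TriggerLifecycle.class));
    }

    boolean canMoveTo(TriggerLifecycle target) {
        return NEXT.get(this).contains(target);
    }

    // True if every consecutive pair in the path is an allowed transition.
    static boolean isValidPath(TriggerLifecycle... path) {
        for (int i = 1; i < path.length; i++) {
            if (!path[i - 1].canMoveTo(path[i])) {
                return false;
            }
        }
        return true;
    }
}
```

For example, WAITING → ACQUIRED → EXECUTING → WAITING is a valid path, while jumping from WAITING straight to EXECUTING is not.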
3. Investigation
3.1 Distributed Locking
Quartz stores trigger state in a database and supports clustered deployment, so multiple scheduler instances may contend for the same trigger. A plain SELECT in MySQL (InnoDB) is a non-locking read, so two instances can read the same trigger at the same moment. This raises the question of how Quartz prevents duplicate fires.
The core method is JobStoreSupport.executeInNonManagedTXLock() , which runs a callback within an optional lock and a transaction.
/**
* Execute the given callback having acquired the given lock.
* Depending on the JobStore, the surrounding transaction may be
* assumed to be already present (managed).
*
* @param lockName The name of the lock to acquire, for example
* "TRIGGER_ACCESS". If null, then no lock is acquired, but the
* lockCallback is still executed in a transaction.
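*/
When a lock name is supplied, the callback runs atomically with respect to other scheduler instances: in a clustered JDBC job store the lock is a row in the locks table taken with SELECT ... FOR UPDATE, so competing instances block until the transaction commits or rolls back.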
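To make the "lock, then run the callback in one transaction" pattern concrete, here is a minimal JDBC sketch. It assumes a clustered JDBC store whose lock is a row in the locks table, obtained with a SELECT ... FOR UPDATE of the same shape as the one Quartz's StdRowLockSemaphore issues. RowLockSketch, TxCallback and the error handling are hypothetical simplifications, not Quartz's API; real Quartz handles more edge cases than this.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of a row-lock-guarded transaction, not Quartz's code.
final class RowLockSketch {

    interface TxCallback<T> {
        T execute(Connection conn) throws SQLException;
    }

    // Same shape as the row lock query on the locks table.
    static String lockSql(String tablePrefix) {
        return "SELECT * FROM " + tablePrefix + "LOCKS"
                + " WHERE SCHED_NAME = ? AND LOCK_NAME = ? FOR UPDATE";
    }

    static <T> T executeInLock(Connection conn, String tablePrefix, String schedName,
                               String lockName, TxCallback<T> callback) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // the row lock lives as long as this transaction
        try {
            if (lockName != null) {
                try (PreparedStatement ps = conn.prepareStatement(lockSql(tablePrefix))) {
                    ps.setString(1, schedName);
                    ps.setString(2, lockName);
                    try (ResultSet rs = ps.executeQuery()) { // blocks while another tx holds the row
                        if (!rs.next()) {
                            throw new SQLException("Lock row missing: " + lockName);
                        }
                    }
                }
            }
            T result = callback.execute(conn); // same connection, so same transaction
            conn.commit();                      // releases the row lock
            return result;
        } catch (SQLException | RuntimeException e) {
            conn.rollback();                    // also releases the row lock
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Because the callback uses the connection that holds the row lock, any other instance issuing the same SELECT ... FOR UPDATE waits until commit() or rollback() ends the transaction.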
3.2 Quartz Scheduling Process
The scheduler thread performs three main steps: pulling pending triggers, firing them, and handing them to the worker pool.
3.2.1 Pulling Triggers
Parameters such as idleWaitTime, availThreadCount, maxBatchSize, batchTimeWindow and misfireThreshold control how many triggers are fetched and within what time window.
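How these parameters combine can be sketched as plain arithmetic. The sketch assumes the Quartz 2.x behaviour in which one pass fetches at most min(availThreadCount, maxBatchSize) WAITING triggers whose NEXT_FIRE_TIME is no later than now + idleWaitTime + batchTimeWindow, skips triggers already more than misfireThreshold late (those are left to misfire handling), and orders by next fire time, then priority descending. AcquireWindow and Candidate are hypothetical names, and special cases such as triggers that ignore misfires are omitted. In configuration, idleWaitTime is org.quartz.scheduler.idleWaitTime, maxBatchSize is org.quartz.scheduler.batchTriggerAcquisitionMaxCount, batchTimeWindow is org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow, and misfireThreshold is org.quartz.jobStore.misfireThreshold; availThreadCount is the number of free threads in the pool at that moment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of one trigger-acquisition pass (hypothetical names).
final class AcquireWindow {

    static final class Candidate {
        final String name;
        final long nextFireTime; // epoch millis
        final int priority;

        Candidate(String name, long nextFireTime, int priority) {
            this.name = name;
            this.nextFireTime = nextFireTime;
            this.priority = priority;
        }
    }

    // At most this many triggers are acquired in one pass.
    static int maxCount(int availThreadCount, int maxBatchSize) {
        return Math.min(availThreadCount, maxBatchSize);
    }

    static List<Candidate> select(List<Candidate> waiting, long now, long idleWaitTime,
                                  long batchTimeWindow, long misfireThreshold, int maxCount) {
        long latest = now + idleWaitTime + batchTimeWindow; // due soon enough to fetch now
        long earliest = now - misfireThreshold;             // anything older counts as misfired
        List<Candidate> due = new ArrayList<>();
        for (Candidate c : waiting) {
            if (c.nextFireTime <= latest && c.nextFireTime >= earliest) {
                due.add(c);
            }
        }
        // Earliest fire time first; higher priority wins ties.
        due.sort(Comparator.comparingLong((Candidate c) -> c.nextFireTime)
                .thenComparing((Candidate x, Candidate y) -> Integer.compare(y.priority, x.priority)));
        return due.size() > maxCount ? new ArrayList<>(due.subList(0, maxCount)) : due;
    }
}
```

With the default batch size of 1, each pass acquires only the single earliest due trigger, no matter how many worker threads are free.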
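When a trigger is fetched, its state changes from WAITING to ACQUIRED and a row is inserted into fired_triggers. The state change is a conditional update that only succeeds if the row is still WAITING, so it acts as an optimistic lock.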
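The optimistic lock can be illustrated with an in-memory stand-in for the conditional update (UPDATE ... SET TRIGGER_STATE = 'ACQUIRED' WHERE ... AND TRIGGER_STATE = 'WAITING'). ConcurrentHashMap.replace(key, oldValue, newValue) is an atomic compare-and-set, so exactly one of several racing callers sees "one row updated". OptimisticAcquire is a hypothetical class; in a real deployment the database provides this atomicity, not a JVM map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for the conditional UPDATE on the triggers table (hypothetical class).
final class OptimisticAcquire {
    private final ConcurrentMap<String, String> stateByTrigger = new ConcurrentHashMap<>();

    void put(String trigger, String state) {
        stateByTrigger.put(trigger, state);
    }

    String state(String trigger) {
        return stateByTrigger.get(trigger);
    }

    // Atomic compare-and-set: succeeds only if the trigger is still WAITING,
    // like an UPDATE ... WHERE TRIGGER_STATE = 'WAITING' that reports one row updated.
    boolean tryAcquire(String trigger) {
        return stateByTrigger.replace(trigger, "WAITING", "ACQUIRED");
    }

    // Lets `nodes` threads race for the same trigger; returns how many won.
    static int race(OptimisticAcquire store, String trigger, int nodes) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        for (int i = 0; i < nodes; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads at once
                    if (store.tryAcquire(trigger)) {
                        winners.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return winners.get();
    }
}
```

Note that the comparison looks only at the state value: if the state later returns to WAITING, a new caller can win again.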
3.2.2 Firing Triggers
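When the fire time arrives, the scheduler thread calls triggersFired(), which holds the TRIGGER_ACCESS lock and checks, for each trigger, that its state is still ACQUIRED. If it is not, the trigger is skipped.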
protected TriggerFiredBundle triggerFired(Connection conn, OperableTrigger trigger) throws JobPersistenceException {
JobDetail job;
Calendar cal = null;
// Make sure trigger wasn't deleted, paused, or completed...
try {
String state = getDelegate().selectTriggerState(conn, trigger.getKey());
if (!state.equals(STATE_ACQUIRED)) {
return null;
}
} catch (SQLException e) {
throw new JobPersistenceException("Couldn't select trigger state: " + e.getMessage(), e);
}
// ... further processing ...
}
If the state check fails, triggerFired() returns null and the trigger is ignored, which prevents duplicate execution under normal circumstances.
3.2.3 Handing to Worker Pool
For each successfully fired trigger, Quartz creates a JobRunShell (a Runnable, not a Thread subclass) and hands it to the worker thread pool. A worker thread then invokes the job's execute() method, notifying listeners before and after and handling any exceptions the job throws.
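The shell's role can be sketched with hypothetical, simplified interfaces: SimpleJob, SimpleJobListener and RunShellSketch are not Quartz types (Quartz's own JobListener works with a JobExecutionContext). The sketch shows the two points from the paragraph above: the shell is a Runnable handed to a pool, and a failing job is reported to listeners rather than escaping and killing the worker thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-ins for a job and a job listener.
interface SimpleJob {
    void execute() throws Exception;
}

interface SimpleJobListener {
    void jobToBeExecuted(String jobName);
    void jobWasExecuted(String jobName, Exception error); // error is null on success
}

// Runnable wrapper in the spirit of JobRunShell: notify, run, catch, notify.
final class RunShellSketch implements Runnable {
    private final String jobName;
    private final SimpleJob job;
    private final SimpleJobListener listener;

    RunShellSketch(String jobName, SimpleJob job, SimpleJobListener listener) {
        this.jobName = jobName;
        this.job = job;
        this.listener = listener;
    }

    @Override
    public void run() {
        listener.jobToBeExecuted(jobName);
        Exception error = null;
        try {
            job.execute();
        } catch (Exception e) {
            error = e; // report the failure instead of letting it escape the worker thread
        }
        listener.jobWasExecuted(jobName, error);
    }

    // Usage: run one good and one failing job on a single worker thread.
    static List<String> demo() throws InterruptedException {
        List<String> events = new CopyOnWriteArrayList<>();
        SimpleJobListener listener = new SimpleJobListener() {
            @Override
            public void jobToBeExecuted(String name) {
                events.add(name + ":start");
            }

            @Override
            public void jobWasExecuted(String name, Exception error) {
                events.add(name + (error == null ? ":ok" : ":failed:" + error.getMessage()));
            }
        };
        ExecutorService workers = Executors.newSingleThreadExecutor();
        workers.execute(new RunShellSketch("report", () -> { }, listener));
        workers.execute(new RunShellSketch("sync", () -> { throw new Exception("boom"); }, listener));
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
        return events;
    }
}
```

demo() records report:start, report:ok, sync:start and sync:failed:boom, in that order, and the worker thread survives the failing job.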
3.3 Root Cause of Duplicate Scheduling
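In a distributed environment, the first step (pulling triggers) runs without a lock when org.quartz.jobStore.acquireTriggersWithinLock is false, which is the default. The trigger is then protected only by the optimistic lock, and that lock compares nothing but the state column. Suppose node B reads a trigger, and node A then acquires it, fires it and returns it to WAITING before B's conditional update runs. B's update still succeeds, and B fires the same scheduled time again from its stale read. The window is brief (often more than 9 ms), but at millions of executions per day it is hit occasionally.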
Log excerpts confirm that the default configuration takes no lock during trigger acquisition. The optimistic lock prevents duplication in most cases, but edge cases like the one above still occur.
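The interleaving described above can be replayed deterministically. This is a scripted, single-threaded simulation under simplifying assumptions (one trigger, a fixed period, acquisition and fire back to back); TriggerRow, casWaitingToAcquired and the other names are hypothetical. Without the acquisition lock, node B reads the trigger, node A then runs the full cycle, and B's state-only compare-and-set still succeeds. With the lock, B's read and compare-and-set cannot straddle A's cycle, so B either finds the trigger ACQUIRED and skips it, or (the case scripted here) sees the advanced NEXT_FIRE_TIME.

```java
import java.util.ArrayList;
import java.util.List;

// Scripted replay of the duplicate-fire interleaving (hypothetical names, one trigger).
final class DuplicateFireSimulation {
    static final long PERIOD_MS = 1_000L;

    // The shared triggers-table row for one trigger.
    static final class TriggerRow {
        String state = "WAITING";
        long nextFireTime;

        TriggerRow(long nextFireTime) {
            this.nextFireTime = nextFireTime;
        }
    }

    // Optimistic lock: compares the state column only, not NEXT_FIRE_TIME.
    static boolean casWaitingToAcquired(TriggerRow row) {
        if (!row.state.equals("WAITING")) {
            return false;
        }
        row.state = "ACQUIRED";
        return true;
    }

    // Fire step: the ACQUIRED guard from triggerFired(), then record the fire
    // and store the advanced schedule back as WAITING.
    static void fire(TriggerRow row, long scheduledTime, List<Long> firedAt) {
        if (!row.state.equals("ACQUIRED")) {
            return;
        }
        firedAt.add(scheduledTime);
        row.nextFireTime = scheduledTime + PERIOD_MS;
        row.state = "WAITING";
    }

    // No acquisition lock: both nodes read first, then A runs a full cycle before B's CAS.
    static List<Long> withoutAcquireLock() {
        TriggerRow row = new TriggerRow(10_000L);
        List<Long> firedAt = new ArrayList<>();
        long seenByA = row.nextFireTime; // non-locking read on node A
        long seenByB = row.nextFireTime; // non-locking read on node B
        if (casWaitingToAcquired(row)) fire(row, seenByA, firedAt); // A: WAITING -> ... -> WAITING
        if (casWaitingToAcquired(row)) fire(row, seenByB, firedAt); // B: CAS succeeds on a stale read
        return firedAt;
    }

    // Acquisition under the lock: B's read + CAS cannot straddle A's; here B runs after A fired.
    static List<Long> withAcquireLock() {
        TriggerRow row = new TriggerRow(10_000L);
        List<Long> firedAt = new ArrayList<>();
        long seenByA = row.nextFireTime;
        if (casWaitingToAcquired(row)) fire(row, seenByA, firedAt);
        long seenByB = row.nextFireTime; // sees the advanced fire time
        if (casWaitingToAcquired(row)) fire(row, seenByB, firedAt);
        return firedAt;
    }
}
```

withoutAcquireLock() returns [10000, 10000], the same scheduled time fired twice, while withAcquireLock() returns [10000, 11000]. The ACQUIRED guard in triggerFired() does not help in the first case: when B fires, the state really is ACQUIRED, set by B itself.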
3.4 Solution
Enable locking during trigger acquisition by adding the following property to the Quartz configuration:
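org.quartz.jobStore.acquireTriggersWithinLock=true
With this setting, trigger acquisition runs inside the TRIGGER_ACCESS database lock, so only one scheduler instance at a time can select and acquire triggers. A node can no longer act on a stale read of a trigger that another node has already fired, which removes this source of duplicate job runs at the cost of serializing trigger acquisition across the cluster.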
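A minimal sketch of a clustered JDBC job store configuration built in Java, assuming Quartz 2.x property names. The data source name is a placeholder, and a working setup also needs org.quartz.dataSource.<name>.* and thread-pool settings, which are omitted here. The helper acquisitionLockName mirrors the lock decision in JobStoreSupport.acquireNextTriggers() as it appears in the Quartz 2.x source: the TRIGGER_ACCESS lock is taken when the flag is true or when more than one trigger is acquired in a pass. Verify this against the Quartz version you run.

```java
import java.util.Properties;

// Clustered JDBC job store settings, including the fix (Quartz 2.x property names).
final class QuartzClusterConfigSketch {

    static Properties clusteredJdbcStore(String dataSourceName) {
        Properties p = new Properties();
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.dataSource", dataSourceName);
        p.setProperty("org.quartz.jobStore.tablePrefix", "QRTZ_");
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        // The fix: acquire triggers while holding the TRIGGER_ACCESS lock.
        p.setProperty("org.quartz.jobStore.acquireTriggersWithinLock", "true");
        // With Quartz on the classpath (plus data source and thread pool settings):
        // Scheduler scheduler = new org.quartz.impl.StdSchedulerFactory(p).getScheduler();
        return p;
    }

    // Mirrors the lock decision in JobStoreSupport.acquireNextTriggers() (Quartz 2.x);
    // maxCount is min(availThreadCount, maxBatchSize) for the current pass.
    static String acquisitionLockName(boolean acquireTriggersWithinLock, int maxCount) {
        return (acquireTriggersWithinLock || maxCount > 1) ? "TRIGGER_ACCESS" : null;
    }
}
```

The second condition explains why deployments that acquire several triggers per pass may never see this problem: the lock is already taken whenever more than one trigger is requested.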
4. Conclusion
The article demonstrates how to trace Quartz's internal scheduling flow, identify the optional lock that can cause duplicate fires, and apply a simple configuration change to enforce exclusive trigger acquisition in distributed deployments.
Top Architect