Analyzing and Solving Quartz Duplicate Job Scheduling Issues

This article examines why Quartz's job scheduler can produce duplicate executions under high load, explains the internal trigger states, lock mechanisms, and code paths involved, and provides a configuration fix to ensure exclusive trigger acquisition and prevent repeated job runs.

Top Architect

Nov 23, 2021

1. Introduction

The company switched to Quartz for task scheduling, handling over two million executions per day, and began to see occasional duplicate job executions without a clear pattern. The article investigates the root cause by analyzing Quartz source code and offers a direct configuration solution.

2. Quartz Basics

In Quartz, running a job at its scheduled time is called a fire. The key fields and concepts are TRIGGER_STATE, PREV_FIRE_TIME, NEXT_FIRE_TIME, and misfire (a fire that was missed). Two kinds of threads are involved: a single scheduling thread that acquires triggers, and a pool of worker threads that execute job logic. The relevant database tables are triggers, locks, and fired_triggers (QRTZ_TRIGGERS, QRTZ_LOCKS, and QRTZ_FIRED_TRIGGERS with the default table prefix).

2.1 Trigger State Diagram

The trigger lifecycle starts at WAITING, moves to ACQUIRED when the scheduler thread pulls it, then to EXECUTING at the actual fire time. After execution it becomes COMPLETE (if no further fires remain) or returns to WAITING. Errors set the state to ERROR, and manual pauses set it to PAUSED.
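The lifecycle above can be written down as a small state model. This is only an illustrative sketch: the enum and its methods are hypothetical names, not Quartz classes, and the rule that ERROR and PAUSED can be entered from any state except COMPLETE is an assumption made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the trigger lifecycle described above; not Quartz's own classes. */
enum TriggerState {
    WAITING, ACQUIRED, EXECUTING, COMPLETE, ERROR, PAUSED;

    /** States reachable from this one in the normal (non-error, non-paused) flow. */
    Set<TriggerState> next() {
        switch (this) {
            case WAITING:   return EnumSet.of(ACQUIRED);
            case ACQUIRED:  return EnumSet.of(EXECUTING);
            case EXECUTING: return EnumSet.of(WAITING, COMPLETE);
            default:        return EnumSet.noneOf(TriggerState.class);
        }
    }

    /** True if the transition is allowed in this simplified model. */
    boolean canMoveTo(TriggerState target) {
        // Assumption for this sketch: ERROR and PAUSED are reachable from any
        // state except COMPLETE.
        if (target == ERROR || target == PAUSED) {
            return this != COMPLETE && this != target;
        }
        return next().contains(target);
    }
}
```

For example, `TriggerState.WAITING.canMoveTo(TriggerState.ACQUIRED)` is true in this model, while a direct jump from WAITING to EXECUTING is not.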

3. Investigation

3.1 Distributed Locking

Quartz stores trigger state in a database and supports distributed deployment, so multiple scheduler instances may contend for the same trigger. By default MySQL SELECT statements are non‑locking, which raises the question of how Quartz prevents duplicate fires.

The core method is JobStoreSupport.executeInNonManagedTXLock(), which runs a callback within an optional lock and a transaction.

/**
 * Execute the given callback having acquired the given lock.
 * Depending on the JobStore, the surrounding transaction may be
 * assumed to be already present (managed).
 *
 * @param lockName The name of the lock to acquire, for example
 * "TRIGGER_ACCESS". If null, then no lock is acquired, but the
 * lockCallback is still executed in a transaction.
 */

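When a lock name is supplied, the callback runs while holding that lock, so callbacks that use the same lock name are serialized across scheduler instances. When the lock name is null, the callback still runs in a transaction but without any mutual exclusion.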
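The lock-then-callback pattern can be sketched in memory as follows. This is not the Quartz implementation: `LockedExecutor`, `executeInLock`, and `isHeld` are hypothetical names, and a `ReentrantLock` stands in for the database lock that a clustered job store would take (typically a row lock on the locks table).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical in-memory stand-in for the lock-then-callback pattern; not Quartz code. */
class LockedExecutor {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the callback, holding the named lock if lockName is non-null. */
    <T> T executeInLock(String lockName, Supplier<T> callback) {
        if (lockName == null) {
            // No lock requested: the callback still runs, but without mutual exclusion.
            return callback.get();
        }
        ReentrantLock lock = locks.computeIfAbsent(lockName, k -> new ReentrantLock());
        lock.lock();
        try {
            return callback.get();
        } finally {
            lock.unlock(); // always release, even if the callback throws
        }
    }

    /** True if the calling thread currently holds the named lock. */
    boolean isHeld(String lockName) {
        ReentrantLock lock = locks.get(lockName);
        return lock != null && lock.isHeldByCurrentThread();
    }
}
```

The important detail is the null branch: passing no lock name skips the exclusion entirely, which is the behavior the rest of the investigation depends on.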

3.2 Quartz Scheduling Process

The scheduler thread performs three main steps: pulling pending triggers, firing them, and handing them to the worker pool.

3.2.1 Pulling Triggers

Parameters such as idleWaitTime, availThreadCount, maxBatchSize, batchTimeWindow, and misfireThreshold control how many triggers are fetched and within what time window.
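One way to see how these parameters combine is to compute the fetch window explicitly. The sketch below reflects a simplified reading of the acquisition logic, not the exact Quartz query: `AcquisitionWindow` and its fields are hypothetical names, and misfire instructions on individual triggers are ignored.

```java
/** Hypothetical helper illustrating how the acquisition parameters combine; not Quartz code. */
class AcquisitionWindow {
    final long noEarlierThan; // oldest NEXT_FIRE_TIME still treated as on time
    final long noLaterThan;   // latest NEXT_FIRE_TIME fetched in this pass
    final int maxCount;       // upper bound on triggers fetched in this pass

    AcquisitionWindow(long nowMillis, long idleWaitTime, long batchTimeWindow,
                      long misfireThreshold, int availThreadCount, int maxBatchSize) {
        // Triggers older than the misfire threshold are handled as misfires instead.
        this.noEarlierThan = nowMillis - misfireThreshold;
        // Look ahead by the idle wait time plus the optional batch window.
        this.noLaterThan = nowMillis + idleWaitTime + batchTimeWindow;
        // Never fetch more triggers than there are free worker threads.
        this.maxCount = Math.min(availThreadCount, maxBatchSize);
    }

    /** True if a trigger with this NEXT_FIRE_TIME falls inside the fetch window. */
    boolean covers(long nextFireTime) {
        return nextFireTime >= noEarlierThan && nextFireTime <= noLaterThan;
    }
}
```

With illustrative values of 30,000 ms for idleWaitTime, 0 for batchTimeWindow, 60,000 ms for misfireThreshold, and a maxBatchSize of 1, a pass at time t would consider triggers due between t - 60,000 and t + 30,000 and fetch at most one of them.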

When a trigger is fetched, its state changes from WAITING to ACQUIRED and an entry is added to fired_triggers.
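The WAITING to ACQUIRED change behaves like a conditional update: it takes effect only if the row is still in the expected state. A minimal in-memory sketch of that idea, using a `ConcurrentMap` in place of the triggers table (the class and method names here are hypothetical, not Quartz's API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for the triggers table; not Quartz code. */
class TriggerTable {
    private final ConcurrentMap<String, String> stateByTrigger = new ConcurrentHashMap<>();

    void insert(String triggerKey, String state) {
        stateByTrigger.put(triggerKey, state);
    }

    /**
     * Mimics a conditional UPDATE ... WHERE TRIGGER_STATE = expected:
     * returns true only if the row was still in the expected state.
     */
    boolean updateStateFromState(String triggerKey, String expected, String updated) {
        return stateByTrigger.replace(triggerKey, expected, updated);
    }

    String state(String triggerKey) {
        return stateByTrigger.get(triggerKey);
    }
}
```

If two callers both attempt WAITING to ACQUIRED on the same key back to back, only the first call returns true; the second sees ACQUIRED and is rejected.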

3.2.2 Firing Triggers

Before execution Quartz checks that the trigger state is still ACQUIRED. If not, the trigger is skipped.

protected TriggerFiredBundle triggerFired(Connection conn, OperableTrigger trigger) throws JobPersistenceException {
    JobDetail job;
    Calendar cal = null;
    // Make sure trigger wasn't deleted, paused, or completed...
    try {
        String state = getDelegate().selectTriggerState(conn, trigger.getKey());
        if (!state.equals(STATE_ACQUIRED)) {
            return null;
        }
    } catch (SQLException e) {
        throw new JobPersistenceException("Couldn't select trigger state: " + e.getMessage(), e);
    }
    // ... further processing ...
}

If the state check fails, the trigger is ignored, preventing duplicate execution under normal circumstances.

3.2.3 Handing to Worker Pool

For each successfully fired trigger, Quartz creates a JobRunShell (which implements Runnable) and hands it to the worker pool. The shell invokes the job's execute() method, wraps the call with listener notifications, and handles any exceptions.
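The wrapping role of the run shell can be illustrated with a much smaller Runnable. This is a simplified, hypothetical stand-in: `SimpleRunShell`, its nested `Job` interface, and the event strings are invented for the example and do not mirror Quartz's listener API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a job wrapper; not Quartz's JobRunShell. */
class SimpleRunShell implements Runnable {
    /** Minimal job contract used only by this sketch. */
    interface Job { void execute() throws Exception; }

    private final Job job;
    final List<String> events = new ArrayList<>();

    SimpleRunShell(Job job) { this.job = job; }

    @Override
    public void run() {
        events.add("toBeExecuted");       // listener hook before the job
        try {
            job.execute();
            events.add("wasExecuted");    // listener hook after a successful run
        } catch (Exception e) {
            // Contain the failure so it does not kill the worker thread.
            events.add("failed: " + e.getMessage());
        }
    }
}
```

Because the shell is a Runnable, it can be submitted to any thread pool, for example `executor.submit(new SimpleRunShell(job))` with a standard `ExecutorService`.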

3.3 Root Cause of Duplicate Scheduling

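In a distributed environment, the first step (pulling triggers) runs without a lock when org.quartz.jobStore.acquireTriggersWithinLock is false, which is the default. Quartz then relies on an optimistic check: the state update succeeds only if the trigger is still WAITING. That check does not cover one interleaving. If a second node selects the trigger while it is WAITING, and the first node then takes it through the full state cycle and back to WAITING within a brief window (often >9 ms), the second node's update also succeeds and the job is executed twice.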

Log excerpts confirm that the default configuration does not take a lock during trigger acquisition. The optimistic check usually prevents duplication, but edge cases like the one above still occur.
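The interleaving can be replayed deterministically with an in-memory table. The sketch below is a simplification under stated assumptions: `DuplicateFireDemo` and its methods are hypothetical, the EXECUTING step and fire times are omitted, and the two nodes are run one after the other to represent node B acting on a candidate it selected before node A finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical replay of the race with an in-memory table; not Quartz code. */
class DuplicateFireDemo {
    static final ConcurrentMap<String, String> STATE = new ConcurrentHashMap<>();
    static final List<String> FIRED = new ArrayList<>();

    /** Conditional update: succeeds only if the trigger is still in the expected state. */
    static boolean move(String key, String from, String to) {
        return STATE.replace(key, from, to);
    }

    /** Acquire, fire, and release a trigger that the node previously saw as WAITING. */
    static void acquireAndFire(String node, String key) {
        if (!move(key, "WAITING", "ACQUIRED")) return;  // lost the race, skip
        if (!"ACQUIRED".equals(STATE.get(key))) return; // the check made before firing
        FIRED.add(node);
        move(key, "ACQUIRED", "WAITING"); // state returns to WAITING after the fire
    }

    /** Replays the problematic order and returns the nodes that fired the trigger. */
    static List<String> replay() {
        STATE.put("t1", "WAITING");
        FIRED.clear();
        // Both nodes have already selected t1 without a lock and saw WAITING.
        acquireAndFire("nodeA", "t1"); // A completes the whole cycle first
        acquireAndFire("nodeB", "t1"); // B's stale candidate still matches WAITING
        return new ArrayList<>(FIRED);
    }
}
```

In this replay both nodes pass every state check, so `replay()` reports two fires for the same trigger. The state column alone cannot tell node B that a full cycle has already happened.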

3.4 Solution

Enable locking during trigger acquisition by adding the following property to the Quartz configuration:

org.quartz.jobStore.acquireTriggersWithinLock=true

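With this setting, the first step takes a database lock (TRIGGER_ACCESS) before selecting triggers, so only one scheduler instance can acquire triggers at a time. This closes the window in which a second node can act on a stale candidate and removes the risk of duplicate job runs. The trade-off is that trigger acquisition is serialized across the cluster.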
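When the scheduler is configured programmatically rather than through a properties file, the same setting can be placed in a `Properties` object. The helper below is a hypothetical sketch: `QuartzClusterConfig` is an invented name, and the data source, driver delegate, and instance settings that a real clustered deployment needs are deliberately left out.

```java
import java.util.Properties;

/** Hypothetical helper that assembles clustered job store settings. */
class QuartzClusterConfig {
    static Properties build() {
        Properties props = new Properties();
        // JDBC-backed job store running in clustered mode.
        props.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        // The fix from this section: take the lock while acquiring triggers.
        props.setProperty("org.quartz.jobStore.acquireTriggersWithinLock", "true");
        return props;
    }
}
```

The resulting object would typically be handed to Quartz's `StdSchedulerFactory`, which accepts a `Properties` instance, once the omitted settings have been added.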

4. Conclusion

The article demonstrates how to trace Quartz's internal scheduling flow, identify the optional lock that can cause duplicate fires, and apply a simple configuration change to enforce exclusive trigger acquisition in distributed deployments.
