Why Quartz Jobs Were Duplicated and How to Fix It

This article analyzes a Quartz 2.3.0 job‑scheduling issue where millions of daily triggers caused occasional duplicate executions, explains the underlying lock mechanisms and state transitions, and provides a simple configuration change to prevent the problem in distributed environments.

Java Interview Crash Guide

Introduction

The company switched to Quartz for task scheduling, handling over two million executions per day. As the load grew, occasional duplicate job executions appeared without a clear pattern. This article examines Quartz 2.3.0 with the JDBC job store to uncover the root cause and offers a direct fix.

Preparation

Before diving into the code, it is essential to understand Quartz's purpose and basic concepts such as fire, TRIGGER_STATE, PREV_FIRE_TIME, NEXT_FIRE_TIME and misfire. Quartz runs two main thread groups: one for acquiring triggers and another for executing job logic. The framework uses three core tables: triggers (stores trigger timing and state), locks (supports distributed locking) and fired_triggers (records currently firing triggers).

Trigger State Overview

[Figure: Trigger state diagram]

Initially a trigger is in WAITING. When the scheduler fetches it, the state becomes ACQUIRED. After the scheduled moment, it moves to EXECUTING. If the job finishes normally, the state changes to COMPLETE; otherwise it may revert to WAITING for the next cycle or become ERROR or PAUSED depending on the situation.
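The transitions above can be written down as a small lookup table. This is a simplified model built only from the description in this section, not Quartz's own code; the class name TriggerStateModel and the method canMove are hypothetical, and transitions the article does not mention are deliberately left out.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Simplified model of the transitions described in the text; not Quartz's own code.
class TriggerStateModel {
    enum State { WAITING, ACQUIRED, EXECUTING, COMPLETE, ERROR, PAUSED }

    // Allowed target states for each source state, as described in the article.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);

    static {
        ALLOWED.put(State.WAITING, EnumSet.of(State.ACQUIRED));
        ALLOWED.put(State.ACQUIRED, EnumSet.of(State.EXECUTING));
        ALLOWED.put(State.EXECUTING,
                EnumSet.of(State.COMPLETE, State.WAITING, State.ERROR, State.PAUSED));
    }

    // Returns true only if the article describes a direct move from 'from' to 'to'.
    static boolean canMove(State from, State to) {
        Set<State> targets = ALLOWED.get(from);
        return targets != null && targets.contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canMove(State.WAITING, State.ACQUIRED));   // prints true
        System.out.println(canMove(State.WAITING, State.EXECUTING));  // prints false
    }
}
```

Note that EXECUTING can lead back to WAITING, which is the property the duplicate-execution scenario later in the article depends on.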

Investigation

3.1 Distributed State Access

Quartz stores trigger state in the database and supports multiple scheduler instances. When several instances run, they compete for the same trigger. By default MySQL SELECT statements are non‑locking, so concurrent acquisition could lead to duplicate execution. Quartz solves this with the executeInNonManagedTXLock() method.

public List<OperableTrigger> acquireNextTriggers(final long noLaterThan, final int maxCount, final long timeWindow) throws JobPersistenceException {
    String lockName;
    if (isAcquireTriggersWithinLock() || maxCount > 1) {
        lockName = LOCK_TRIGGER_ACCESS;
    } else {
        lockName = null;
    }
    return executeInNonManagedTXLock(lockName, new TransactionCallback<List<OperableTrigger>>() {
        public List<OperableTrigger> execute(Connection conn) throws JobPersistenceException {
            return acquireNextTrigger(conn, noLaterThan, maxCount, timeWindow);
        }
    }, new TransactionValidator<List<OperableTrigger>>() { /* omitted */ });
}

The method’s Javadoc explains that the lockName parameter determines whether a lock is acquired; a null value means the callback runs without locking but still inside a transaction.
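The "lock only when a name is given" behaviour can be illustrated with a minimal sketch. It makes strong simplifying assumptions: an in-JVM ReentrantLock stands in for the database-backed lock, transaction handling is omitted entirely, and the names ConditionalLockSketch, executeWithOptionalLock and lockHeldDuring are hypothetical rather than part of Quartz.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Sketch of a callback runner that locks only when a lock name is supplied.
// Assumption: a local ReentrantLock replaces the database lock Quartz uses.
class ConditionalLockSketch {
    private static final ReentrantLock TRIGGER_ACCESS = new ReentrantLock();

    // Runs the callback, holding the lock only if lockName is non-null.
    static <T> T executeWithOptionalLock(String lockName, Supplier<T> callback) {
        boolean locked = false;
        try {
            if (lockName != null) {
                TRIGGER_ACCESS.lock();
                locked = true;
            }
            return callback.get();
        } finally {
            if (locked) {
                TRIGGER_ACCESS.unlock();
            }
        }
    }

    // Reports whether the lock is held while the callback runs.
    static boolean lockHeldDuring(String lockName) {
        return executeWithOptionalLock(lockName, TRIGGER_ACCESS::isHeldByCurrentThread);
    }

    public static void main(String[] args) {
        System.out.println(lockHeldDuring("TRIGGER_ACCESS")); // prints true
        System.out.println(lockHeldDuring(null));             // prints false
    }
}
```

With a null name the callback still runs, just without mutual exclusion, which is exactly the case the investigation goes on to find.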

Debugging shows that isAcquireTriggersWithinLock() returns false. Since maxCount is not greater than 1 either, lockName is null and the first trigger‑acquisition step runs without a lock.

[Figure: Lock acquisition debug]

Quartz relies on optimistic locking: multiple threads may read the same trigger, but only the thread that successfully changes the state from WAITING to ACQUIRED proceeds. If another thread sees a different state, it aborts.
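The idea of "only the writer that sees WAITING wins" can be sketched with an in-memory compare-and-replace. This is an illustration under stated assumptions: a ConcurrentHashMap stands in for the triggers table, the conditional replace stands in for a state-guarded update, and the names OptimisticAcquireSketch, addTrigger and tryAcquire are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an optimistic, state-guarded acquisition.
// Assumption: a map replaces the triggers table; no real database is involved.
class OptimisticAcquireSketch {
    private final ConcurrentHashMap<String, String> triggerStates = new ConcurrentHashMap<>();

    // Registers a trigger in its initial WAITING state.
    void addTrigger(String triggerKey) {
        triggerStates.put(triggerKey, "WAITING");
    }

    // Succeeds only if the trigger is still WAITING at the moment of the write.
    boolean tryAcquire(String triggerKey) {
        return triggerStates.replace(triggerKey, "WAITING", "ACQUIRED");
    }

    public static void main(String[] args) {
        OptimisticAcquireSketch store = new OptimisticAcquireSketch();
        store.addTrigger("nightly-report");
        System.out.println(store.tryAcquire("nightly-report")); // prints true
        System.out.println(store.tryAcquire("nightly-report")); // prints false
    }
}
```

The second caller loses because the state is no longer WAITING. The check says nothing about how many times the state has changed in between, which matters for the next step of the analysis.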

However, the state check alone leaves a narrow time window. If one instance has already read the trigger as WAITING but has not yet written ACQUIRED, and in that window (e.g., >9 ms) another scheduler instance completes the whole cycle and returns the trigger to WAITING, the first instance's update still succeeds and the job executes twice. The following diagram illustrates this ABA problem:

[Figure: Duplicate scheduling cause]
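One interleaving that produces a duplicate can be replayed deterministically in a single thread. The sketch assumes the acquire step checks only the state value, uses an AtomicReference in place of the database row, and labels the two scheduler instances A and B; the class AbaDuplicateFireSketch and the method replay are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Single-threaded replay of an ABA interleaving between two scheduler instances.
// Assumption: acquisition is guarded by the state value only.
class AbaDuplicateFireSketch {
    enum State { WAITING, ACQUIRED, EXECUTING }

    static List<String> replay() {
        AtomicReference<State> state = new AtomicReference<>(State.WAITING);
        List<String> firedBy = new ArrayList<>();

        // Instance A reads the trigger without a lock and sees WAITING.
        State seenByA = state.get();

        // Before A writes anything, instance B runs the full cycle.
        if (state.compareAndSet(State.WAITING, State.ACQUIRED)) {
            state.set(State.EXECUTING);
            firedBy.add("B");
            state.set(State.WAITING); // back to WAITING for the next cycle
        }

        // A's state-guarded update still passes: the value is WAITING again.
        if (state.compareAndSet(seenByA, State.ACQUIRED)) {
            state.set(State.EXECUTING);
            firedBy.add("A");
            state.set(State.WAITING);
        }
        return firedBy;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [B, A]
    }
}
```

Both instances fire for the same scheduled moment because A cannot distinguish "still WAITING" from "WAITING again". Serializing the read and the write behind one lock closes this window.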

3.4 Solution

Enable locking during trigger acquisition by adding the configuration property:

org.quartz.jobStore.acquireTriggersWithinLock=true

This forces the first step to acquire LOCK_TRIGGER_ACCESS, preventing multiple instances from fetching the same trigger simultaneously and eliminating duplicate scheduling.
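For setups that configure Quartz programmatically rather than through a properties file, the same setting can be placed in a java.util.Properties object. The sketch below uses only the JDK; the class name QuartzClusterConfigSketch is hypothetical, the job store class and isClustered entries are assumptions about a typical clustered JDBC setup, and data source settings are omitted.

```java
import java.util.Properties;

// Builds scheduler properties with lock-protected trigger acquisition enabled.
// Assumption: a clustered JDBC job store; data source settings are left out.
class QuartzClusterConfigSketch {
    static Properties buildProperties() {
        Properties props = new Properties();
        props.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        // The fix from this article: acquire triggers while holding the lock.
        props.setProperty("org.quartz.jobStore.acquireTriggersWithinLock", "true");
        return props;
    }

    public static void main(String[] args) {
        // With Quartz on the classpath, these could be handed to
        // new StdSchedulerFactory(props); that call is not made here.
        buildProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Holding the lock during acquisition serializes the instances on that step, so some loss of acquisition throughput under heavy load is a plausible trade-off to measure.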

Reflection

Learning a large codebase requires first grasping the overall architecture before diving into source files; incremental exploration speeds up debugging.

Questioning assumptions is crucial: seeing a lock‑related method does not guarantee that the lock is always taken.

Logging is indispensable for reproducing and proving subtle concurrency issues such as ABA.

Even complex frameworks like Quartz can be debugged without reading every line; focused investigation and good techniques reduce resolution time.

Written by

Java Interview Crash Guide
