Why Quartz Jobs Were Duplicated and How to Fix It

This article analyzes a Quartz 2.3.0 job‑scheduling issue where millions of daily triggers caused occasional duplicate executions, explains the underlying lock mechanisms and state transitions, and provides a simple configuration change to prevent the problem in distributed environments.

Java Interview Crash Guide

Nov 28, 2021

Introduction

The company switched to Quartz for task scheduling, handling over two million executions per day. As the load grew, occasional duplicate job executions appeared without a clear pattern. This article examines Quartz 2.3.0 with the JDBC job store to uncover the root cause and offers a direct fix.

Preparation

Before diving into the code, it is essential to understand Quartz's purpose and basic concepts such as fire, TRIGGER_STATE, PREV_FIRE_TIME, NEXT_FIRE_TIME and misfire. Quartz runs two main thread groups: one for acquiring triggers and another for executing job logic. The framework uses three core tables: triggers (stores trigger timing and state), locks (supports distributed locking) and fired_triggers (records currently firing triggers).

Trigger State Overview

Initially a trigger is in WAITING. When the scheduler fetches it, the state becomes ACQUIRED. After the scheduled moment, it moves to EXECUTING. If the job finishes normally, the state changes to COMPLETE; otherwise it may revert to WAITING for the next cycle or become ERROR or PAUSED depending on the situation.
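The transitions described above can be sketched as a small lookup table. This is only an illustration of the lifecycle as this article describes it: the TriggerState enum and TriggerStateMachine class are hypothetical names, not Quartz types, and Quartz itself may allow transitions that are not listed here.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical enum: the states named in this article, not Quartz's own constants.
enum TriggerState { WAITING, ACQUIRED, EXECUTING, COMPLETE, ERROR, PAUSED }

final class TriggerStateMachine {
    // Allowed moves, limited to the transitions the article mentions.
    private static final Map<TriggerState, Set<TriggerState>> ALLOWED =
            new EnumMap<>(TriggerState.class);

    static {
        for (TriggerState s : TriggerState.values()) {
            ALLOWED.put(s, EnumSet.noneOf(TriggerState.class));
        }
        ALLOWED.put(TriggerState.WAITING, EnumSet.of(TriggerState.ACQUIRED));
        ALLOWED.put(TriggerState.ACQUIRED, EnumSet.of(TriggerState.EXECUTING));
        ALLOWED.put(TriggerState.EXECUTING, EnumSet.of(
                TriggerState.COMPLETE, TriggerState.WAITING,
                TriggerState.ERROR, TriggerState.PAUSED));
    }

    // True if the article's lifecycle permits moving from one state to the other.
    static boolean canMove(TriggerState from, TriggerState to) {
        return ALLOWED.get(from).contains(to);
    }
}
```

For example, TriggerStateMachine.canMove(TriggerState.WAITING, TriggerState.EXECUTING) is false in this sketch, because a trigger has to pass through ACQUIRED first.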

Investigation

3.1 Distributed State Access

Quartz stores trigger state in the database and supports multiple scheduler instances. When several instances run, they compete for the same trigger. By default MySQL SELECT statements are non‑locking, so concurrent acquisition could lead to duplicate execution. Quartz solves this with the executeInNonManagedTXLock() method.

public List<OperableTrigger> acquireNextTriggers(final long noLaterThan, final int maxCount, final long timeWindow) throws JobPersistenceException {
    String lockName;
    if (isAcquireTriggersWithinLock() || maxCount > 1) {
        lockName = LOCK_TRIGGER_ACCESS;
    } else {
        lockName = null;
    }
    return executeInNonManagedTXLock(lockName, new TransactionCallback<List<OperableTrigger>>() {
        public List<OperableTrigger> execute(Connection conn) throws JobPersistenceException {
            return acquireNextTrigger(conn, noLaterThan, maxCount, timeWindow);
        }
    }, new TransactionValidator<List<OperableTrigger>>() { /* omitted */ });
}

The method’s Javadoc explains that the lockName parameter determines whether a lock is acquired; a null value means the callback runs without locking but still inside a transaction.
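The "lock only if a name is given" pattern can be sketched in isolation. The class below is a hypothetical stand-in, not Quartz code: it uses an in-process ReentrantLock where Quartz uses a database-backed lock, and it leaves out the transaction handling entirely. It only shows how a null lock name makes the callback run unguarded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper illustrating an optional named lock around a callback.
final class OptionalLockRunner {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs the callback under the named lock, or with no lock when lockName is null.
    <T> T runWithOptionalLock(String lockName, Supplier<T> callback) {
        if (lockName == null) {
            return callback.get(); // no mutual exclusion at all
        }
        ReentrantLock lock = locks.computeIfAbsent(lockName, n -> new ReentrantLock());
        lock.lock();
        try {
            return callback.get();
        } finally {
            lock.unlock();
        }
    }

    // True if the calling thread currently holds the named lock.
    boolean isHeld(String lockName) {
        ReentrantLock lock = locks.get(lockName);
        return lock != null && lock.isHeldByCurrentThread();
    }
}
```

Calling runWithOptionalLock(null, ...) executes the callback immediately, which is the situation the article goes on to investigate.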

Debugging shows that isAcquireTriggersWithinLock() returns false, so, as long as maxCount is not greater than 1, lockName is null and the first trigger‑acquisition step runs without a lock.

Quartz relies on optimistic locking: multiple threads may read the same trigger, but only the thread that successfully changes the state from WAITING to ACQUIRED proceeds. If another thread sees a different state, it aborts.
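A minimal sketch of this optimistic check, assuming an in-memory map in place of the triggers table. OptimisticTriggerStore and its methods are hypothetical names. ConcurrentMap.replace(key, expected, updated) plays the role of a conditional update that only succeeds while the stored state is still WAITING.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the trigger state column.
final class OptimisticTriggerStore {
    private final ConcurrentMap<String, String> stateByTrigger = new ConcurrentHashMap<>();

    // Registers a trigger in its initial WAITING state.
    void add(String triggerName) {
        stateByTrigger.put(triggerName, "WAITING");
    }

    // Atomically moves WAITING -> ACQUIRED; returns false if the state was anything else.
    boolean tryAcquire(String triggerName) {
        return stateByTrigger.replace(triggerName, "WAITING", "ACQUIRED");
    }

    String stateOf(String triggerName) {
        return stateByTrigger.get(triggerName);
    }
}
```

If two callers race on the same trigger, exactly one tryAcquire call returns true; the other sees a state that is no longer WAITING and gives up.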

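However, a narrow time window (for example, more than 9 ms) between the WAITING→ACQUIRED transition and the subsequent EXECUTING step can allow another scheduler instance to complete the whole cycle, causing duplicate execution. This is an ABA problem: the trigger state returns to a value that was seen earlier, so a check that compares only the state cannot tell that a full cycle ran in between.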
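The ABA hazard can be reproduced in miniature with java.util.concurrent. The sketch below is not how Quartz stores state and is not the fix this article recommends. It is a generic illustration using AtomicStampedReference, where a version stamp is bumped on every transition. AbaDemo and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Hypothetical demo: a state-only check misses a full cycle, a versioned check catches it.
final class AbaDemo {
    enum State { WAITING, ACQUIRED, EXECUTING }

    // Trigger state plus a version stamp that increases on every transition.
    private final AtomicStampedReference<State> trigger =
            new AtomicStampedReference<>(State.WAITING, 0);

    private void move(State from, State to) {
        int stamp = trigger.getStamp();
        if (!trigger.compareAndSet(from, to, stamp, stamp + 1)) {
            throw new IllegalStateException("unexpected state: " + trigger.getReference());
        }
    }

    // Simulates another instance running a whole cycle and returning to WAITING.
    void runFullCycleElsewhere() {
        move(State.WAITING, State.ACQUIRED);
        move(State.ACQUIRED, State.EXECUTING);
        move(State.EXECUTING, State.WAITING);
    }

    int readStamp() {
        return trigger.getStamp();
    }

    // Compares the state only, so it cannot see that a cycle happened in between.
    boolean stateOnlyCheck(State expected) {
        return trigger.getReference() == expected;
    }

    // Succeeds only if both the state and the previously observed stamp still match.
    boolean versionedAcquire(int observedStamp) {
        return trigger.compareAndSet(State.WAITING, State.ACQUIRED,
                observedStamp, observedStamp + 1);
    }
}
```

If a caller records readStamp(), then runFullCycleElsewhere() runs, stateOnlyCheck(State.WAITING) still returns true while versionedAcquire with the old stamp returns false. The state-only result is the one that would let a second execution through.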

3.4 Solution

Enable locking during trigger acquisition by adding the configuration property:

org.quartz.jobStore.acquireTriggersWithinLock=true

This forces the first step to acquire LOCK_TRIGGER_ACCESS, preventing multiple instances from fetching the same trigger simultaneously and eliminating duplicate scheduling.
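When the scheduler is configured in code rather than through a quartz.properties file, the same setting can be placed in a java.util.Properties object. The sketch below builds only the properties. The class name QuartzClusterProps is hypothetical, and the job store class and clustering flag are assumptions about a typical clustered JDBC setup, not settings taken from the original system. A real configuration also needs data source and driver delegate entries, which are omitted here.

```java
import java.util.Properties;

// Hypothetical helper that assembles job store settings, including the fix.
final class QuartzClusterProps {
    static Properties build() {
        Properties props = new Properties();
        // Assumed setup: JDBC job store running in clustered mode.
        props.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        // The fix from this article: take the trigger-access lock while acquiring triggers.
        props.setProperty("org.quartz.jobStore.acquireTriggersWithinLock", "true");
        return props;
    }
}
```

With Quartz on the classpath, a Properties object like this would typically be handed to a StdSchedulerFactory together with the rest of the configuration.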

Reflection

Learning a large codebase requires first grasping the overall architecture before diving into source files; incremental exploration speeds up debugging.

Questioning assumptions is crucial—seeing a lock‑related method does not guarantee it is always used.

Logging is indispensable for reproducing and proving subtle concurrency issues such as ABA.

Even complex frameworks like Quartz can be debugged without reading every line; focused investigation and good techniques reduce resolution time.
