
Implementing Graceful Shutdown for XXL‑JOB in Cloud‑Native Deployments

This guide explains why graceful shutdown is essential for XXL‑JOB scheduled tasks, analyzes the executor‑admin interaction chain, identifies interruption issues, and provides a three‑step solution with code examples and configuration tips for cloud‑native environments.

Alibaba Cloud Native

When high‑frequency scheduled tasks run in an application, a deployment restart can interrupt task execution, leading to incomplete business data. To ensure smooth operation during rolling releases, a graceful shutdown mechanism for XXL‑JOB is required.

XXL‑JOB Architecture and Problem Points

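The open‑source XXL‑JOB system consists of two modules: xxl‑job admin (the scheduling center) and xxl‑job executor. Each executor periodically sends a registry heartbeat to the admin, which stores it in xxl_job_registry. For auto‑registered executor groups, the admin periodically aggregates these rows into the address_list column of xxl_job_group, and the scheduler reads address_list from xxl_job_group to pick the executor that receives each trigger.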

Two main issues were identified:

Node offload delay: When an executor goes offline, the admin does not refresh the address list immediately, so the scheduler keeps assigning tasks to the offline node.

Forced task interruption: During shutdown, the executor immediately interrupts running JobThread instances and discards their queued triggers, reporting those tasks as failed.

The following images illustrate the interaction chain and the problematic points:

Graceful Shutdown Process (Three Steps)

The solution consists of three core steps: take the node out of scheduling (offload it), wait for running and queued tasks to finish, then terminate the process.

Step 1 – Node Offload (Traffic Removal)

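During XxlJobExecutor#destroy, stopEmbedServer() shuts down the embedded server and sends a registryRemove request to the admin, which deletes the node's row from xxl_job_registry. However, xxl_job_group.address_list is not updated at that moment (in the open‑source admin, the freshGroupRegistryInfo hook that registryRemove calls has no implementation), so the scheduler keeps seeing the node until the admin's periodic registry monitor rebuilds the list. To fix this, apply one of the following changes: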

Update JobRegistryHelper.registryRemove to refresh xxl_job_group.address_list (or implement the refresh in freshGroupRegistryInfo).

Adjust XxlJobTrigger#trigger() to read the address list directly from xxl_job_registry instead of the stale group table.

After either change, the scheduler stops dispatching to the node as soon as it deregisters, so the node is fully offloaded.
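Both fixes can be illustrated with a minimal in‑memory sketch, assuming simplified stand‑ins for the two admin tables. The class RegistryRefreshSketch and its methods are hypothetical: they borrow the XXL‑JOB names registry, registryRemove and freshGroupRegistryInfo, but this is not the admin's actual code, which persists through DAOs. registryRemove rebuilds address_list immediately (the first option), and the equally hypothetical addressesForTrigger reads live registry rows directly (the second option).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-ins for xxl_job_registry and xxl_job_group; not XXL-JOB's code.
class RegistryRefreshSketch {
    // xxl_job_registry: appname -> registered executor addresses (TreeSet keeps a stable order)
    final Map<String, Set<String>> registry = new HashMap<>();
    // xxl_job_group.address_list: appname -> comma-separated addresses, or null when empty
    final Map<String, String> groupAddressList = new HashMap<>();

    // Registry (heartbeat) request from an executor; refresh only when the node is new.
    void registry(String appname, String address) {
        if (registry.computeIfAbsent(appname, k -> new TreeSet<>()).add(address)) {
            freshGroupRegistryInfo(appname);
        }
    }

    // Option 1: deregistration rebuilds the group's address list right away,
    // instead of waiting for the periodic registry monitor.
    void registryRemove(String appname, String address) {
        Set<String> addresses = registry.get(appname);
        if (addresses != null && addresses.remove(address)) {
            freshGroupRegistryInfo(appname);
        }
    }

    // Rebuild address_list from the current registry rows.
    private void freshGroupRegistryInfo(String appname) {
        Set<String> addresses = registry.getOrDefault(appname, new TreeSet<>());
        groupAddressList.put(appname, addresses.isEmpty() ? null : String.join(",", addresses));
    }

    // Option 2: the trigger reads live registry rows rather than the cached address_list.
    String addressesForTrigger(String appname) {
        return String.join(",", registry.getOrDefault(appname, new TreeSet<>()));
    }
}
```

The trade-off differs between the two: option 1 adds a write to xxl_job_group on every deregistration, while option 2 adds a registry lookup to every trigger, which matters more for high‑frequency jobs.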

Step 2 – Wait for Running Tasks

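Next, change XxlJobExecutor#destroy so that, instead of killing job threads immediately, it blocks until every JobThread has finished its running task and drained its queue, and only then stops it. The following code demonstrates the modified method: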

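// Inside XxlJobExecutor; requires imports of java.util.ArrayList, java.util.List and java.util.concurrent.TimeUnit.
public void destroy() {
    // Stop the embedded server: the executor deregisters from the admin
    // (registryRemove) and stops accepting new triggers.
    stopEmbedServer();

    // Wait for each JobThread to finish its running task and drain its queue,
    // then stop it the same way the stock destroy() does.
    if (jobThreadRepository.size() > 0) {
        List<Integer> jobIds = new ArrayList<>(jobThreadRepository.keySet());
        for (Integer jobId : jobIds) {
            JobThread jobThread = jobThreadRepository.get(jobId);
            while (jobThread != null && jobThread.isRunningOrHasQueue()) {
                try {
                    TimeUnit.SECONDS.sleep(1L);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting for this thread.
                    Thread.currentThread().interrupt();
                    logger.warn(">>>>>>>>>>> xxl-job, interrupted while waiting for jobId:{}", jobId);
                    break;
                }
            }
            // The thread is idle now: stop it and wait until its final callback is queued.
            JobThread oldJobThread = removeJobThread(jobId, "web container destroy, all tasks finished.");
            if (oldJobThread != null) {
                try {
                    oldJobThread.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    logger.error(">>>>>>>>>>> xxl-job, JobThread destroy(join) error, jobId:{}", jobId, e);
                }
            }
        }
        jobThreadRepository.clear();
    }
    jobHandlerRepository.clear();

    // destroy JobLogFileCleanThread
    JobLogFileCleanThread.getInstance().toStop();

    // destroy TriggerCallbackThread (it flushes remaining callbacks before exiting)
    TriggerCallbackThread.getInstance().toStop();
}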

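This loop ensures that all in‑flight and queued jobs finish before the executor shuts down. The existing TriggerCallbackThread already flushes remaining results to the admin after being stopped, so no extra handling is needed. Note that the wait has no upper bound, so the platform's shutdown grace period must exceed the longest expected task duration; otherwise the process is killed mid‑wait.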
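Because the loop above can wait indefinitely, a deployment with a fixed grace period may prefer a bounded variant. The following is a self‑contained sketch of the same drain‑then‑stop pattern with a deadline, assuming a simplified worker; SketchJobThread and GracefulDrainSketch are hypothetical classes, not XXL‑JOB APIs. Unlike the real JobThread, which sets its running flag only after polling a trigger, the sketch counts pending work (incremented on push, decremented after the task completes), so a just‑dequeued task is never missed by the idle check.

```java
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simplified stand-in for XXL-JOB's JobThread (not its real code).
class SketchJobThread extends Thread {
    private final LinkedBlockingQueue<Runnable> triggerQueue = new LinkedBlockingQueue<>();
    // Counts queued + running tasks; decremented only after a task completes.
    private final AtomicInteger pending = new AtomicInteger();
    private volatile boolean toStop = false;

    void push(Runnable task) {
        pending.incrementAndGet();
        triggerQueue.add(task);
    }

    boolean isRunningOrHasQueue() {
        return pending.get() > 0;
    }

    void toStop() {
        toStop = true;
    }

    @Override
    public void run() {
        while (!toStop) {
            try {
                Runnable task = triggerQueue.poll(100, TimeUnit.MILLISECONDS);
                if (task != null) {
                    try {
                        task.run();
                    } finally {
                        pending.decrementAndGet();
                    }
                }
            } catch (InterruptedException e) {
                // Woken up by the stopper; the loop re-checks toStop.
            }
        }
    }
}

class GracefulDrainSketch {
    // Waits until every worker is idle or the deadline passes, then stops and joins them.
    // Returns true only if all pending work finished before the deadline.
    static boolean drainAndStop(List<SketchJobThread> workers, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean drained = true;
        try {
            for (SketchJobThread worker : workers) {
                while (worker.isRunningOrHasQueue()) {
                    if (System.nanoTime() - deadline >= 0) {
                        drained = false; // grace budget exhausted: stop waiting
                        break;
                    }
                    TimeUnit.MILLISECONDS.sleep(50);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            drained = false;
        }
        for (SketchJobThread worker : workers) {
            worker.toStop();
            worker.interrupt(); // wakes an idle poll(); would also interrupt a task still running
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return drained;
    }
}
```

A deadline slightly shorter than the pod's grace period lets the executor report an incomplete drain itself instead of being killed silently; tasks still queued at the deadline are simply abandoned in this sketch, whereas the real JobThread reports them to the admin as failed.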

Step 3 – Process Termination

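Finally, stop the process in a way that lets the JVM run its shutdown hooks: send SIGTERM (kill -15, not kill -9) from deployment scripts, or call Spring Boot Actuator's /actuator/shutdown endpoint. In a Spring Boot application, either path closes the application context, which calls the executor bean's destroy() method and thus runs the graceful shutdown logic.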
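Outside Spring, nothing calls destroy() on SIGTERM unless a hook is registered. A minimal sketch, assuming a hypothetical ShutdownHookSketch wrapper around whatever destroy logic the executor exposes; Runtime.addShutdownHook is the standard JDK API, and registered hooks run on SIGTERM or a normal exit but never on kill -9.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: runs an executor's destroy logic once, from a JVM shutdown hook.
class ShutdownHookSketch {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);
    private final Runnable destroyAction;

    ShutdownHookSketch(Runnable destroyAction) {
        this.destroyAction = destroyAction;
    }

    // Idempotent: a hook and an explicit shutdown path may both call this; the action runs once.
    void destroyOnce() {
        if (destroyed.compareAndSet(false, true)) {
            destroyAction.run();
        }
    }

    // Registers a hook that the JVM runs on SIGTERM (kill -15) or normal exit, not on kill -9.
    Thread register() {
        Thread hook = new Thread(this::destroyOnce, "xxl-job-graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

For example, `new ShutdownHookSketch(executor::destroy).register();` right after the executor starts, where `executor` is an illustrative name for your XxlJobExecutor instance.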

Alibaba Cloud MSE XXL‑JOB Integration

Alibaba Cloud's MSE XXL‑JOB instance service provides built‑in graceful shutdown without code changes. After purchasing an MSE XXL‑JOB instance and connecting your applications to it, enable the shutdown mode with the following executor configuration:

xxl.job.executor.shutdownMode=WAIT_ALL

The supported shutdown modes are:

WAIT_ALL (recommended): The application exits only after all received tasks and subtasks complete.

WAIT_RUNNING: The application waits only for currently running tasks; queued tasks are discarded.
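The difference between the two modes comes down to which tasks the executor must wait for before exiting. A toy model of the documented semantics, assuming a single running task and a list of queued ones; ShutdownMode and ShutdownModeSketch are hypothetical names, this is not MSE's implementation, and subtasks of distributed jobs are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two documented MSE shutdown modes; not MSE's actual code.
enum ShutdownMode { WAIT_ALL, WAIT_RUNNING }

class ShutdownModeSketch {
    // Returns the tasks that must complete before the executor may exit.
    // running may be null when the executor is idle.
    static List<String> tasksToAwait(ShutdownMode mode, String running, List<String> queued) {
        List<String> toAwait = new ArrayList<>();
        if (running != null) {
            toAwait.add(running);
        }
        if (mode == ShutdownMode.WAIT_ALL) {
            toAwait.addAll(queued); // WAIT_ALL also drains the queue
        }
        // Under WAIT_RUNNING, queued tasks are not awaited: they are discarded.
        return toAwait;
    }
}
```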

When deploying in Kubernetes, set the pod's terminationGracePeriodSeconds (30 seconds by default) longer than the longest expected task duration; otherwise the kubelet sends SIGKILL before the graceful shutdown logic can finish.

References

Official XXL‑JOB documentation: https://www.xuxueli.com/xxl-job/

Alibaba Cloud MSE XXL‑JOB quick start: https://help.aliyun.com/zh/mse/getting-started/get-started-with-xxl-job-in-10-minutes

Guide to enable graceful shutdown: https://help.aliyun.com/zh/mse/use-cases/how-to-enable-graceful-shutdown

