How to Optimize Java Scheduled Tasks for High‑Volume Data Sync

This article explains how to optimize Java scheduled tasks for large‑scale data synchronization by introducing phased improvements such as blacklist mechanisms, multithreaded execution with CountDownLatch, and server‑level sharding, providing practical guidelines for performance and reliability.

Java Interview Crash Guide

May 19, 2021

Introduction

Scheduled tasks are common in systems that process data or run operations periodically, such as closing orders or taking backups. They fall into two categories: fixed‑time execution (e.g., every minute or daily) and delayed execution after an event. This article focuses on optimizing the first type, especially for data synchronization and transfer.

Phase 1: Basic Design

In the initial stage the logic is simple. Data transfer and data synchronization have different requirements regarding loss tolerance, so they are analyzed separately.

Data Transfer Tasks

Transfer tasks can tolerate loss. Typical scenarios include pushing API credentials for security or retrying message pushes after failures. The design is straightforward: schedule the push and cap the number of retries. On success, update the record's status; on error, increment its failure count; once the count reaches the maximum, stop retrying that record.
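A minimal sketch of that retry loop, under stated assumptions: PushRecord, TransferTask, the status names, and the limit of 3 retries are all illustrative and do not come from the original design. The sender is passed in as a predicate that returns true when the push succeeds.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical record of a pending push; field names are illustrative. */
class PushRecord {
    enum Status { PENDING, SUCCESS, ABANDONED }

    final String payload;
    Status status = Status.PENDING;
    int failureCount = 0;

    PushRecord(String payload) {
        this.payload = payload;
    }
}

class TransferTask {
    static final int MAX_RETRIES = 3; // assumed limit, not from the article

    /** One scheduled run: try every pending record once. */
    static void runOnce(List<PushRecord> records, Predicate<PushRecord> sender) {
        for (PushRecord r : records) {
            if (r.status != PushRecord.Status.PENDING) {
                continue; // already delivered or given up
            }
            boolean ok;
            try {
                ok = sender.test(r);
            } catch (RuntimeException e) {
                ok = false; // treat an exception as a failed push
            }
            if (ok) {
                r.status = PushRecord.Status.SUCCESS;
            } else if (++r.failureCount >= MAX_RETRIES) {
                r.status = PushRecord.Status.ABANDONED; // loss is tolerated here
            }
        }
    }
}
```

In a real system the status and failure count would live in the database, and runOnce would be the body of the scheduled method.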

Data Synchronization Tasks

Synchronization must guarantee successful delivery, such as order or member information sync between systems. The basic approach is to schedule synchronization, mark records as "failed" on error, and repeatedly query records with "unsynced" or "failed" status.
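The difference from a transfer task is that a failed record is never abandoned; it is simply picked up again on the next run. A rough sketch, assuming hypothetical SyncRecord and SyncTask classes and an in-memory list standing in for the status query:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical record to synchronize; names are illustrative. */
class SyncRecord {
    enum Status { UNSYNCED, FAILED, SYNCED }

    final long id;
    Status status = Status.UNSYNCED;

    SyncRecord(long id) {
        this.id = id;
    }
}

class SyncTask {
    /** Stand-in for a query that selects records with status "unsynced" or "failed". */
    static List<SyncRecord> selectPending(List<SyncRecord> all) {
        return all.stream()
                .filter(r -> r.status != SyncRecord.Status.SYNCED)
                .collect(Collectors.toList());
    }

    /** One scheduled run: failed records stay eligible for the next run. */
    static void runOnce(List<SyncRecord> all, Predicate<SyncRecord> sender) {
        for (SyncRecord r : selectPending(all)) {
            boolean ok;
            try {
                ok = sender.test(r);
            } catch (RuntimeException e) {
                ok = false; // an exception counts as a failed sync
            }
            r.status = ok ? SyncRecord.Status.SYNCED : SyncRecord.Status.FAILED;
        }
    }
}
```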

Phase 2: Blacklist Mechanism

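When many target systems participate, a network problem with one of them can hold up synchronization for all the others, because the task keeps waiting on requests that will fail. A two‑level blacklist is introduced to isolate such targets.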

Two‑Level Blacklist

Level 1 prevents further requests to a target server for the rest of the current task run after repeated failures. Level 2 blocks the server across runs: its records are marked as permanently failed and are not retried until someone intervenes manually.

Implementation details: on connection failure, retry twice. If both retries fail, mark the record as failed and add the target server to the Level 1 blacklist, which is stored in Redis as a sorted set scored by timestamp. If a server hits the Level 1 threshold (e.g., 3 failures within 5 minutes), promote it to Level 2 and clear its Level 1 data.
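The promotion rule can be sketched in memory as follows. This is an illustration under assumptions, not the original implementation: TwoLevelBlacklist and its method names are hypothetical, a per‑server deque of timestamps stands in for the Redis sorted set, and the class is not thread‑safe. recordFailure is meant to be called once the two retries have already failed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TwoLevelBlacklist {
    static final int THRESHOLD = 3;                // failures before promotion
    static final long WINDOW_MILLIS = 5 * 60_000L; // 5-minute window

    // Level 1: failure timestamps per server (a Redis sorted set in the article).
    private final Map<String, Deque<Long>> level1 = new HashMap<>();
    // Servers to skip for the remainder of the current run.
    private final Set<String> skippedThisRun = new HashSet<>();
    // Level 2: blocked until someone releases the server manually.
    private final Set<String> level2 = new HashSet<>();

    /** Call at the start of every scheduled run. */
    void startRun() {
        skippedThisRun.clear();
    }

    /** Call after the retries for a request to this server are exhausted. */
    void recordFailure(String serverId, long nowMillis) {
        skippedThisRun.add(serverId);
        Deque<Long> hits = level1.computeIfAbsent(serverId, k -> new ArrayDeque<>());
        hits.addLast(nowMillis);
        // Drop failures that fell out of the window.
        while (!hits.isEmpty() && hits.peekFirst() <= nowMillis - WINDOW_MILLIS) {
            hits.removeFirst();
        }
        if (hits.size() >= THRESHOLD) {
            level2.add(serverId);    // promote to Level 2
            level1.remove(serverId); // clear Level 1 data for this server
        }
    }

    boolean isBlocked(String serverId) {
        return level2.contains(serverId) || skippedThisRun.contains(serverId);
    }

    boolean isLevel2(String serverId) {
        return level2.contains(serverId);
    }

    /** Manual intervention: allow requests to the server again. */
    void release(String serverId) {
        level2.remove(serverId);
        skippedThisRun.remove(serverId);
    }
}
```

With Redis, the same window could be kept with ZADD to record a failure, ZREMRANGEBYSCORE to drop old entries, and ZCARD to count what remains; the key layout is left open here.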

Phase 3: Multithreading

As data volume grows, single‑threaded execution becomes a bottleneck. Using a thread pool sized to the number of CPU cores (or cores × 2) balances performance and overhead.
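As a small illustration of the sizing rule, the pool can be derived from the core count the JVM reports. SyncPool and the ioHeavy switch are assumptions made for this sketch; the article only gives the two sizes, not when to choose which.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SyncPool {
    /** Cores, or cores x 2 when the caller asks for the larger pool. */
    static int poolSize(boolean ioHeavy) {
        int cores = Runtime.getRuntime().availableProcessors();
        return ioHeavy ? cores * 2 : cores;
    }

    /** Fixed-size pool; the caller is responsible for shutting it down. */
    static ExecutorService create(boolean ioHeavy) {
        return Executors.newFixedThreadPool(poolSize(ioHeavy));
    }
}
```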

Thread Pool Design

Data is partitioned among threads. For example, with 8 threads, create 8 collections, sort records by target server identifier, and assign each record to a collection using hash(serverId) % 8 or round‑robin.
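The hash variant might look like the sketch below. Partitioner and the serverIdOf extractor are hypothetical names; Math.floorMod is used instead of % because String.hashCode() can be negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class Partitioner {
    /**
     * Splits records into `buckets` lists so that all records for one
     * target server land in the same list.
     */
    static <T> List<List<T>> partition(List<T> records,
                                       Function<T, String> serverIdOf,
                                       int buckets) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            result.add(new ArrayList<>());
        }
        for (T record : records) {
            // floorMod keeps the index in [0, buckets) even for negative hashes.
            int index = Math.floorMod(serverIdOf.apply(record).hashCode(), buckets);
            result.get(index).add(record);
        }
        return result;
    }
}
```

Keeping one server's records on one thread means a slow or failing server ties up a single worker rather than several. Round‑robin spreads records more evenly but gives up that property.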

Preventing Task Overlap

When the main scheduler thread submits tasks to the pool, the submit call returns immediately, so the scheduled method can finish while workers are still running, and the next trigger may overlap with them. A latch such as CountDownLatch makes the scheduler wait until all worker threads finish before completing the run. With 8 workers, the scheduler creates CountDownLatch latch = new CountDownLatch(8), each worker calls latch.countDown() after processing its share, and the scheduler calls latch.await() to block until the count reaches zero.
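Put together, one run could be structured like this. LatchedRun is a hypothetical name and the counter is only a placeholder for the real synchronization call. countDown() sits in a finally block so that a worker that throws still releases the scheduler.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

class LatchedRun {
    /** Runs every partition on the pool and returns only when all have finished. */
    static int runAll(List<List<String>> partitions, ExecutorService pool) {
        CountDownLatch latch = new CountDownLatch(partitions.size());
        AtomicInteger processed = new AtomicInteger();
        for (List<String> partition : partitions) {
            pool.submit(() -> {
                try {
                    for (String record : partition) {
                        processed.incrementAndGet(); // placeholder for the real sync call
                    }
                } finally {
                    latch.countDown(); // always count down, even if processing throws
                }
            });
        }
        try {
            latch.await(); // the scheduler thread blocks here until every worker is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }
}
```

The latch count must equal the number of submitted tasks. A pool created with Executors.newFixedThreadPool can be reused across runs, since the latch, not the pool, marks the end of a run.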

Phase 4: Sharding Across Servers

With massive data and many target services, a single server cannot handle all requests efficiently. Sharding distributes the load across multiple servers.

Sharding by Table vs. By ID

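Table‑based sharding is simple but limited in scalability, and it cannot balance hot data (e.g., orders). ID‑modulo sharding scales to any number of servers in principle: each server gets a unique server‑id, numbered from 0 to serverCount − 1, and its queries include the condition id % serverCount = serverId. Applying a modulo expression to the id column can keep the database from using an index on it, but the rule distributes records evenly.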

Alternatively, hash the target service identifier and use hash(service) % serverCount = serverId to group services per server, though uneven service loads can cause imbalance.
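Both ownership rules reduce to a one‑line check. In this sketch ShardFilter and its method names are assumptions; in practice the ID rule would normally be pushed into the query itself rather than evaluated in Java.

```java
class ShardFilter {
    /** ID-modulo rule: this server owns a record when id % serverCount == serverId. */
    static boolean ownsRecord(long recordId, int serverCount, int serverId) {
        return Math.floorMod(recordId, (long) serverCount) == serverId;
    }

    /** Service-hash rule: this server owns a service when hash(service) % serverCount == serverId. */
    static boolean ownsService(String service, int serverCount, int serverId) {
        // floorMod avoids a negative result when hashCode() is negative.
        return Math.floorMod(service.hashCode(), serverCount) == serverId;
    }
}
```

With either rule, changing serverCount moves records or services between servers, so every instance has to switch to the new count together.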

Additional Optimizations

Beyond performance, monitoring can trigger alerts when a server repeatedly enters the Level 1 blacklist or when response times exceed thresholds, allowing operators to investigate network or code issues and optionally apply weight adjustments.
