Operations 8 min read

How Dynamic Task‑Grabbing Cuts Distributed Batch Jobs from Hours to Minutes

This article presents a detailed case study of optimizing a distributed batch processing system by replacing static shard‑key concurrency with a dynamic task‑grabbing mechanism, dramatically reducing execution time from several hours to under fifteen minutes while maintaining stable resource usage.

FunTester
FunTester
FunTester
How Dynamic Task‑Grabbing Cuts Distributed Batch Jobs from Hours to Minutes

Background

A distributed batch system uses a distributed database architecture where the core functionality is implemented by batch programs. Performance testing must monitor server resource usage and detect anomalies, but it also needs to watch for data‑distribution imbalance that can cause some concurrent sub‑programs to run much longer, becoming a bottleneck for the whole batch job.

System Characteristics and Problem

The tested system stores business data from multiple branches of a company in a distributed database. It contains dozens of database shards and over 500 shard keys (the primary key that determines data placement). Each shard key groups data from several institutions. During batch execution, roughly 500 sub‑programs run concurrently, each handling the data belonging to a specific shard key. Because the amount of data per shard key varies widely, sub‑programs processing larger shards take significantly longer, slowing down the overall batch process.

Sharding and shard key relationship
Sharding and shard key relationship
Shard distribution illustration
Shard distribution illustration

First Optimization: Dynamic Task‑Grabbing

To eliminate the static‑concurrency bottleneck, the system adopts a "task‑grabbing" dynamic concurrency model. The approach consists of three steps:

All data to be processed are divided into tasks based on a chosen dimension (e.g., institution ID plus a specific parameter). Each task corresponds to the data of one institution for that parameter.

A task‑generation program creates a record for every task in a task table, marking each as "initialized" before any processing starts.

Worker sub‑programs no longer follow a fixed shard‑key schedule. Instead, each worker queries the task table for tasks that are still initialized and have a large data volume (high priority). After completing a task, the worker marks it as finished and fetches the next available task. This loop continues until no initialized tasks remain.

Task grabbing logic comparison
Task grabbing logic comparison

In a test scenario with 16 million rows, 300 concurrent sub‑programs processed 110 000 generated tasks. All sub‑programs finished within 33 minutes, and the total batch execution time was 32 minutes 21 seconds, with no individual sub‑program becoming a noticeable outlier.

Second Optimization: Controlling Task Size

Even after the first optimization, a few tasks still contained excessive data, and the sheer number of small tasks caused overhead when workers repeatedly grabbed and updated them. The second refinement adds two mechanisms:

A configurable parameter limits the maximum data size per task. When multiple small tasks together stay below this threshold, they are merged into a larger task; a task boundary is created when the shard key changes.

For originally large tasks, an additional dimension field is introduced to split them into smaller chunks, also respecting the maximum‑size parameter.

With the same data set and unchanged concurrency, the number of generated tasks dropped from over 100 000 to fewer than 1 000. Task generation now takes 2 minutes 40 seconds, while the data‑processing phase shortens to 12 minutes 33 seconds, yielding a clear overall performance gain.

Performance test results
Performance test results

Conclusion and Outlook

The two‑stage optimization reduced the production batch runtime from nearly four hours to roughly fifteen minutes—a reduction of over 90%—while keeping system and database resource utilization stable and free of bottlenecks. Future work will continue to explore distributed batch testing techniques, deeper system analysis, and further tuning methods to improve efficiency and reliability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed SystemsPerformance Optimizationshardingload balancingBatch Processingtask schedulingdynamic concurrency
FunTester
Written by

FunTester

10k followers, 1k articles | completely useless

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.