How a GC Alert Led Me to Split and Shard Scheduled Jobs for Better Performance

After receiving a Young Generation GC alarm on a Pinduoduo service, I traced the issue to a high-frequency scheduled task that created a very large number of short-lived objects. I resolved it by first breaking the job into finer-grained tasks and then sharding the work across multiple machines.

dbaplus Community

Problem Detection

One night I received a WeChat Enterprise Mail alert indicating that the Young Generation (G1) GC count exceeded the threshold. The alarm suggested excessive object creation and rapid reclamation, prompting me to investigate the next day at the office.

Root Cause Analysis

Monitoring data from the CAT platform showed two peak periods, likely caused by uneven scheduling that concentrated load on a single machine. By reviewing logs around the alarm times and searching for Command entries, I identified the offending scheduled job: a task that synchronizes advertising transaction data from Toutiao.

The task pulls data for the current day, the previous day, two days ago, and three days ago, repackages it, and reports it to downstream platforms. Although the code limits each batch to 1,000 records, the cap only bounds how many objects are alive at once, not how many are created overall. With such a large volume of transaction data, the job still creates and discards a large number of objects in a short time, which triggers the GC alarm.
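The effect of the batch cap can be sketched as follows. This is a minimal illustration rather than the author's code: `BatchSketch`, `partition`, `repackageAll`, and `Report` are hypothetical names, and the repackaging step is assumed to allocate one new object per source record.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {
    static final int BATCH_SIZE = 1_000; // batch cap described in the article

    /** Hypothetical repackaged record that would be reported downstream. */
    static final class Report {
        final long id;
        final String payload;

        Report(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /** Splits a list into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(items.subList(start, end));
        }
        return batches;
    }

    /** Repackages every record and returns how many Report objects were allocated. */
    static long repackageAll(List<Long> ids) {
        long allocated = 0;
        for (List<Long> batch : partition(ids, BATCH_SIZE)) {
            List<Report> reports = new ArrayList<>(batch.size());
            for (long id : batch) {
                reports.add(new Report(id, "txn-" + id)); // one short-lived object per record
            }
            allocated += reports.size();
            // reports is unreachable after this iteration; the next batch allocates again
        }
        return allocated;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < 2_500; id++) {
            ids.add(id);
        }
        System.out.println(partition(ids, BATCH_SIZE).size() + " batches, "
                + repackageAll(ids) + " objects allocated");
    }
}
```

Under these assumptions, total allocation grows with the number of records regardless of the batch size, which is consistent with a Young Generation collector being kept busy even though no single batch is large.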

Task Splitting

My first mitigation was to split the original job into three separate tasks, each handling a specific day’s data. This reduced the load per machine because the scheduler could distribute the three tasks across different nodes.
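One way to picture the split is a single task body parameterized by a day offset, registered once per day. The sketch below is an assumption about the shape of such code, not the actual job: `DailySyncSketch`, `targetDate`, `syncDay`, and `buildTasks` are hypothetical names, and `syncDay` stands in for the real pull, repackage, and report work.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class DailySyncSketch {
    /** Returns the date a task with the given offset should sync (0 = today). */
    static LocalDate targetDate(LocalDate today, int daysAgo) {
        if (daysAgo < 0) {
            throw new IllegalArgumentException("daysAgo must not be negative");
        }
        return today.minusDays(daysAgo);
    }

    /** Placeholder for the real pull, repackage, and report work for one day. */
    static String syncDay(LocalDate day) {
        return "synced " + day;
    }

    /** Builds one independent task per day offset; each appends its result to log. */
    static List<Runnable> buildTasks(LocalDate today, int taskCount, List<String> log) {
        List<Runnable> tasks = new ArrayList<>();
        for (int daysAgo = 0; daysAgo < taskCount; daysAgo++) {
            LocalDate day = targetDate(today, daysAgo);
            tasks.add(() -> log.add(syncDay(day)));
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Run sequentially here; a scheduling platform could place each task on a different node.
        for (Runnable task : buildTasks(LocalDate.now(), 3, log)) {
            task.run();
        }
        log.forEach(System.out::println);
    }
}
```

Because each task owns exactly one day, a scheduler is free to run them on different nodes, but a single day's data still lands on one machine.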

Task splitting diagram

Although splitting lowered the per‑machine pressure, the GC alarm re‑occurred, indicating that further optimization was needed.

Task Sharding

Instead of assigning whole tasks to individual machines, I introduced sharding: the job is divided into a fixed number of slices (e.g., 10), and the scheduler assigns each slice to one of the available machines. A record belongs to the slice given by its identifier modulo the slice count, so each machine processes only the records whose slice it owns.

For example, with two machines A and B and ten slices, A might handle slices [0-4] and B slices [5-9]. When processing 14,267 records, each record's id % 10 determines which slice it falls into, and therefore which machine processes it.
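The routing rule can be sketched in a few lines. This is an illustration under stated assumptions, not the scheduling platform's API: `ShardingSketch`, `sliceOf`, `machineOf`, and `recordsFor` are hypothetical names, machines are numbered from 0 (A = 0, B = 1), and slices are assumed to be handed out in contiguous ranges as in the A/B example.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardingSketch {
    /** Maps a record id to a slice in [0, sliceCount). */
    static int sliceOf(long id, int sliceCount) {
        // floorMod keeps the result non-negative even if an id were negative
        return (int) Math.floorMod(id, (long) sliceCount);
    }

    /**
     * Assigns contiguous slice ranges to machines. With 10 slices and
     * 2 machines, machine 0 owns slices 0-4 and machine 1 owns slices 5-9.
     */
    static int machineOf(int slice, int sliceCount, int machineCount) {
        return (int) ((long) slice * machineCount / sliceCount);
    }

    /** Returns the ids that a given machine should process. */
    static List<Long> recordsFor(int machine, List<Long> ids, int sliceCount, int machineCount) {
        List<Long> mine = new ArrayList<>();
        for (long id : ids) {
            if (machineOf(sliceOf(id, sliceCount), sliceCount, machineCount) == machine) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < 14_267; id++) { // assumed sequential ids, for illustration only
            ids.add(id);
        }
        System.out.println("machine A: " + recordsFor(0, ids, 10, 2).size() + " records");
        System.out.println("machine B: " + recordsFor(1, ids, 10, 2).size() + " records");
    }
}
```

As a worked case, a record whose id happens to be 14267 gives 14267 % 10 = 7, so it falls in slice 7 and, under this assignment, is processed by machine B. The split is only as even as the distribution of ids across slices; sequential ids spread almost uniformly, while skewed ids would not.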

Task sharding diagram

This approach spreads the workload evenly and avoids the single‑machine bottleneck, provided the task can be safely partitioned. Tasks that require aggregated processing across accounts, for instance, may not be suitable for sharding.

Takeaways

The monitoring platform gives clear visibility into micro‑service health, and alerts are promptly routed to owners.

A unified scheduling platform prevents scattered @Scheduled annotations and makes task management transparent.

Comprehensive documentation and internal wikis improve knowledge sharing and operational efficiency.

Technical communities (e.g., the Gavin scheduling group) offer valuable support for platform‑specific challenges.
