Handling GC Alerts by Splitting and Sharding Scheduled Tasks in Production
The article recounts a production incident where a GC alert triggered due to excessive object creation in a scheduled ad‑transaction sync task, and explains how the problem was diagnosed, mitigated by task splitting, and finally resolved through data sharding across multiple machines.
Last week, after work, the author received a GC alert: the Young Generation collection count had exceeded its threshold, suggesting that some service was creating and reclaiming objects at an excessive rate.
Investigation showed the alert originated from a scheduled task that synchronizes advertisement transaction data from a third‑party platform; the task processes large volumes of data and repeatedly creates objects during re‑packaging.
Using the CAT monitoring platform and command‑level logs, the specific task was pinpointed, revealing that the high object churn caused the GC pressure.
Initial mitigation involved splitting the original task into three finer‑grained tasks (e.g., handling yesterday, the day before, and two days before separately), hoping to distribute load more evenly across machines.
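The day-based split can be sketched as follows; this is a minimal illustration in Java, and the class and method names (`SyncTaskSplit`, `syncDay`) are assumptions rather than the original code:

```java
import java.time.LocalDate;

// Illustrative sketch: one monolithic sync job split into three
// date-scoped jobs, each covering exactly one day's transactions.
// In production each call would be a separately scheduled task.
public class SyncTaskSplit {

    /** Syncs ad-transaction data for a single day (hypothetical helper). */
    static void syncDay(LocalDate day) {
        System.out.println("syncing ad transactions for " + day);
        // ... fetch and repackage third-party data for `day` only ...
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        // Formerly one task scanned all three days; now each day
        // runs as its own finer-grained task.
        syncDay(today.minusDays(1)); // task A: yesterday
        syncDay(today.minusDays(2)); // task B: the day before
        syncDay(today.minusDays(3)); // task C: two days before
    }
}
```

Because each task now touches one day's data, the objects created per run shrink to roughly a third, and the scheduler can place the three tasks on different machines.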
When splitting alone proved insufficient, the author applied a sharding strategy: the dataset was divided into slices by taking each record's ID modulo the number of machines, and each machine processed only its own slice. Because the records carry no strict ordering constraints, all machines could work through their portions of the dataset concurrently.
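The modulo assignment above can be sketched in a few lines. This is a simplified sketch, assuming each instance knows its own index and the total instance count (how those are obtained depends on the scheduling platform); `ShardRouter` and `ownsRecord` are illustrative names, not the original code:

```java
// Minimal sketch of modulo-based sharding: a record with a given ID
// belongs to the machine whose index equals id % shardCount.
public class ShardRouter {

    /** True when this instance is responsible for the given record ID. */
    public static boolean ownsRecord(long id, int shardIndex, int shardCount) {
        // floorMod keeps the result non-negative even for negative IDs
        return Math.floorMod(id, shardCount) == shardIndex;
    }

    public static void main(String[] args) {
        int shardCount = 3; // e.g. three machines
        long[] ids = {100, 101, 102, 103, 104, 105};
        for (int shard = 0; shard < shardCount; shard++) {
            StringBuilder line = new StringBuilder("machine " + shard + " handles:");
            for (long id : ids) {
                if (ownsRecord(id, shard, shardCount)) {
                    line.append(' ').append(id);
                }
            }
            System.out.println(line);
        }
    }
}
```

Each scheduled run then filters its query (or its in-memory batch) through `ownsRecord`, so every machine creates objects only for its own third of the data.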
This sharding approach works well when records can be processed independently; tasks that require holistic aggregation over the full dataset are poor candidates for it.
In conclusion, the incident highlighted the importance of robust monitoring, a unified scheduling platform, comprehensive documentation, and collaborative technical support within the organization.