Postmortem of a Server Crash Caused by a Mis‑managed Scheduled Task in a Backend Module
The article analyzes a server outage triggered by a module that repeatedly created a scheduled task without proper lifecycle control, examines the problematic Java code, lists four key issues, presents a corrected implementation, and reflects on development, testing, review, and logging practices to prevent similar incidents.
Today the author received an urgent call from the monitoring platform: a module had crashed the server. One interface had been hit more than six million times in an hour, which distorted the data and brought the platform down.
Investigation
The problem was easy to locate because only the mobile client called the faulty interface. Each time moduleStart() ran, the original code scheduled a new fixed‑rate task on a ScheduledExecutorService, repeating every five minutes. Nothing guarded against multiple starts, and nothing cancelled the task or shut the executor down.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Module {
    private static final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    // Module start
    public void moduleStart() {
        // Schedule task, delay 0 minutes, repeat every 5 minutes
        // (no guard: every call adds another periodic task)
        executor.scheduleAtFixedRate(() -> {
            // Execute request in child thread
            request(new Callback() {
                @Override
                public void onResponse(Data data) {
                    if (data == null) {
                        Log.w("data is null");
                        return;
                    }
                    // ...
                }
            });
        }, 0, 5, TimeUnit.MINUTES);
    }

    public void moduleStop() {
        // No shutdown logic here
    }
}

Four questions were raised:
1. Will multiple module starts create multiple scheduled tasks?
2. Why request every five minutes instead of pushing a data‑update notification?
3. When does the scheduled task end?
4. When is the thread pool shut down?
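
The first of these questions can be checked directly. The following is a minimal sketch, not the module's real code: `RepeatedStartDemo`, `unguardedStart`, and `countPendingTasks` are hypothetical names, the task body is empty, and the initial delay is set to five minutes (rather than 0 as in the original) so that every scheduled task is still sitting in the executor's queue when it is counted.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RepeatedStartDemo {
    // Mirrors the unguarded moduleStart(): every call schedules one more periodic task.
    static void unguardedStart(ScheduledThreadPoolExecutor executor) {
        executor.scheduleAtFixedRate(() -> {
            // request(...) would go here
        }, 5, 5, TimeUnit.MINUTES);
    }

    // Returns how many periodic tasks are queued after `starts` calls to unguardedStart().
    static int countPendingTasks(int starts) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        try {
            for (int i = 0; i < starts; i++) {
                unguardedStart(executor);
            }
            return executor.getQueue().size();
        } finally {
            // Discard the queued tasks so the JVM can exit.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // prints "pending tasks after 3 starts: 3"
        System.out.println("pending tasks after 3 starts: " + countPendingTasks(3));
    }
}
```

Each start adds an independent task, so a client that starts the module N times without stopping it sends N requests every five minutes.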
Solution
The revised implementation adds a ScheduledFuture guard, ensures the task is created only once, and shuts down both the task and the executor when the module stops.
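
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class Module {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    // Handle to the periodic task; null while no task is scheduled
    private ScheduledFuture<?> scheduledFuture;

    // synchronized so that concurrent starts cannot both pass the null check
    public synchronized void moduleStart() {
        // Create task only if not already created
        if (scheduledFuture == null) {
            scheduledFuture = executor.scheduleAtFixedRate(() -> {
                request(new Callback() {
                    @Override
                    public void onResponse(Data data) {
                        if (data == null) {
                            Log.w("data is null");
                            return;
                        }
                        // ...
                    }
                });
            }, 0, 5, TimeUnit.MINUTES);
        }
    }

    public synchronized void moduleStop() {
        // Cancel task if running
        if (scheduledFuture != null) {
            scheduledFuture.cancel(true);
            scheduledFuture = null;
        }
        // Shut down executor. A shut-down executor rejects new tasks,
        // so this instance cannot be started again after moduleStop().
        executor.shutdown();
    }
}

Retrospective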
Development
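The developers missed basic lifecycle considerations: whether the task could be created more than once, when it ends, and when the thread pool is shut down. Primary responsibility therefore lies with development.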
Testing
1. Why wasn’t the issue caught? The lack of proper test coverage and reliance on manual checks meant the bug slipped through.
2. Load testing could have revealed the abnormal request volume, but no load test was run with multiple devices hitting the interface at the same time.
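
An automated lifecycle check along the following lines might have caught the duplicate scheduling before release. This is a sketch under stated assumptions: `GuardedModule` and `LifecycleCheck` are hypothetical names, the task body is empty, and the executor is injected (and owned by the test) so that its queue can be inspected. `setRemoveOnCancelPolicy(true)` is enabled so that a cancelled task leaves the queue immediately instead of lingering until its delay elapses.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the module, with the executor injected for inspection.
class GuardedModule {
    private final ScheduledThreadPoolExecutor executor;
    private ScheduledFuture<?> scheduledFuture;

    GuardedModule(ScheduledThreadPoolExecutor executor) {
        this.executor = executor;
    }

    synchronized void moduleStart() {
        // Schedule only when no task is currently held.
        if (scheduledFuture == null) {
            scheduledFuture = executor.scheduleAtFixedRate(() -> {
                // request(...) would go here
            }, 5, 5, TimeUnit.MINUTES);
        }
    }

    synchronized void moduleStop() {
        if (scheduledFuture != null) {
            scheduledFuture.cancel(true);
            scheduledFuture = null;
        }
    }
}

class LifecycleCheck {
    private static ScheduledThreadPoolExecutor newExecutor() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        executor.setRemoveOnCancelPolicy(true);
        return executor;
    }

    // Queued task count after calling moduleStart() `starts` times.
    static int pendingAfterStarts(int starts) {
        ScheduledThreadPoolExecutor executor = newExecutor();
        try {
            GuardedModule module = new GuardedModule(executor);
            for (int i = 0; i < starts; i++) {
                module.moduleStart();
            }
            return executor.getQueue().size();
        } finally {
            executor.shutdownNow();
        }
    }

    // Queued task count after one start followed by one stop.
    static int pendingAfterStop() {
        ScheduledThreadPoolExecutor executor = newExecutor();
        try {
            GuardedModule module = new GuardedModule(executor);
            module.moduleStart();
            module.moduleStop();
            return executor.getQueue().size();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pending after 5 starts: " + pendingAfterStarts(5)); // prints 1
        System.out.println("pending after stop: " + pendingAfterStop());        // prints 0
    }
}
```

Run against an unguarded start method, the first check would report 5 instead of 1, which is the kind of signal the manual checks did not produce.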
Review
Code reviews were superficial; the team prioritized form over substance.
Logging
When the interface returned null, a warning log was printed. A careful review of logs before release could have prevented the incident.
Logs are essential for locating problems in testing and production; developers should monitor them continuously, not just when they expect failures.
By studying this case, teams are encouraged to use logs correctly, verify test feedback thoroughly, and search for critical log statements in new features to avoid future outages.