Why kill -9 Is Wrong: A Graceful Shutdown Approach for Microservices
The article explains how to replace the blunt kill -9 command with a coordinated graceful shutdown process for Spring Cloud microservices, covering signal handling, Eureka cache nuances, actuator endpoints, Spring Boot 2.3 settings, custom shutdown endpoints, and strategies for Kafka, scheduled jobs, thread pools, and long‑running loops.
When a system adopts microservice architecture, each service becomes more focused, but shutting down a service gracefully becomes a management challenge. A naive approach like kill -9 abruptly terminates the JVM, preventing any cleanup logic and risking data inconsistency.
Prerequisite
Graceful shutdown requires at least two service instances; with a single instance, any shutdown will inevitably cause request failures. Multiple instances enable rolling updates where other instances continue serving traffic while one is being taken offline.
kill -9 vs kill (SIGTERM)
kill -9 sends SIGKILL, which the JVM cannot catch or handle, so the process is killed immediately and no shutdown hook runs. A plain kill with no signal argument sends SIGTERM (equivalent to kill -15), allowing the application to execute cleanup code, e.g.:
    Runtime.getRuntime().addShutdownHook(new Thread(() -> log.info("application shutdown")));
When using Spring Cloud with Eureka, a SIGTERM triggers the service to deregister from Eureka before the process exits, while kill -9 leaves the instance marked as alive until Eureka’s lease expires.
Remaining Issues with SIGTERM
Eureka’s three‑level cache: the readOnlyCacheMap syncs only every 30 seconds, so other services may still see the instance as up.
Client‑side cache: after Eureka updates, the local client cache may still hold the old status.
/actuator/shutdown endpoint
Enabling the shutdown actuator endpoint:
    management:
      endpoint:
        shutdown:
          enabled: true
It behaves like SIGTERM, deregistering from Eureka and then stopping the process, so the cache problems above remain.
/actuator/pause endpoint
Pause deregisters the service but keeps the JVM running, allowing in‑flight requests to complete while new requests are blocked. Enable it with:
    management:
      endpoint:
        pause:
          enabled: true
        restart:
          enabled: true
Pause requires the Eureka health check (eureka.client.healthcheck.enabled) to be disabled; otherwise the endpoint is ineffective.
/actuator/service-registry endpoint
This endpoint can set the service status to DOWN without terminating the process, letting you wait for caches to expire before finally shutting down. Enable it:
    management:
      endpoint:
        service-registry:
          enabled: true
To mark a service down:
    curl --location --request POST 'localhost:8090/actuator/service-registry' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "status": "DOWN"
      }'
Reverting to UP restores normal operation.
Spring Boot 2.3 graceful shutdown
Spring Boot 2.3 adds built‑in graceful shutdown support:
    # Enable graceful shutdown (the default is immediate)
    server.shutdown=graceful
    # Maximum time to wait for in-flight requests before forcing termination
    spring.lifecycle.timeout-per-shutdown-phase=90s
When the shutdown actuator is invoked, the embedded web server stops accepting new requests and waits for in-flight requests to finish, up to the configured grace period, before the application exits.
Custom actuator endpoint
Define a custom endpoint to run bespoke logic:
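    import com.netflix.discovery.EurekaClient;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.actuate.endpoint.annotation.Endpoint;
    import org.springframework.boot.actuate.endpoint.annotation.WriteOperation;
    import org.springframework.stereotype.Component;

    @Component // the endpoint must be a Spring bean for Actuator to discover it
    @Endpoint(id = "myshutdown")
    public class GracefulShutdownEndpoint {
        @Autowired
        private EurekaClient eurekaClient;
        @WriteOperation
        public void shutdown() {
            // Deregisters this instance from Eureka; the JVM keeps running
            eurekaClient.shutdown();
        }
    }
This ties the shutdown directly to Eureka, which may limit portability if the registry changes.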
Cache‑related waiting time
All the above approaches share a common cache delay: Eureka’s three‑level cache sync (30 s), client‑side cache sync (30 s), and Ribbon’s client cache (30 s). The worst‑case wait is 90 seconds, after which the instance is considered down. Adding a few extra seconds for in‑flight HTTP requests (e.g., 5 s) yields a total safe window of about 95 seconds before issuing the final kill.
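The arithmetic above can be written down as a small helper, so the wait is derived from the individual intervals rather than hard-coded. This is only a sketch: the class name DrainWindow is hypothetical, the 30-second values are the defaults quoted above, and the property names in the comments are assumptions about which settings govern each interval, so verify them against the versions in use.

```java
import java.time.Duration;

/** Hypothetical helper that adds up the cache intervals described above. */
final class DrainWindow {
    // Eureka server read-only cache refresh
    // (assumed property: eureka.server.response-cache-update-interval-ms)
    static final Duration SERVER_CACHE = Duration.ofSeconds(30);
    // Eureka client registry fetch
    // (assumed property: eureka.client.registry-fetch-interval-seconds)
    static final Duration CLIENT_FETCH = Duration.ofSeconds(30);
    // Ribbon server list refresh
    // (assumed property: ribbon.ServerListRefreshInterval)
    static final Duration RIBBON_REFRESH = Duration.ofSeconds(30);

    private DrainWindow() {}

    /** Worst case: each cache refreshed just before the status change reached it. */
    static Duration worstCase(Duration inFlightAllowance) {
        return SERVER_CACHE.plus(CLIENT_FETCH).plus(RIBBON_REFRESH).plus(inFlightAllowance);
    }
}
```

With a 5-second allowance for in-flight requests, DrainWindow.worstCase(Duration.ofSeconds(5)) gives a Duration of 95 seconds. If any of the three intervals is tuned down, the window shrinks accordingly.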
Gateway considerations
External gateways (e.g., Nginx) do not respect Eureka status. When taking a gateway offline, you must manually remove the node from Nginx configuration before shutting it down.
Traffic impact
If only two instances exist, shutting one down doubles the load on the remaining instance. During peak traffic, add extra instances before proceeding.
Handling other workloads
Beyond HTTP, services may run Kafka consumers, scheduled jobs (e.g., XXL‑Job), thread pools, or long‑running loops. The article proposes a ShutdownRegistry that broadcasts a shutdown event to registered components.
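The source does not show ShutdownRegistry or Shutdown themselves, so the following is a hypothetical, JDK-only sketch of how such a pair could look. Everything in it (the constructors, the signal and broadcast methods, self-registration in the constructor) is an assumption chosen to fit the way the snippets in this section call these classes; it is not the original implementation.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical participant: a named shutdown flag plus an optional callback. */
class Shutdown {
    protected final String name;
    private final Consumer<Shutdown> callback;
    private volatile boolean shutdown;

    Shutdown(String name, Consumer<Shutdown> callback) {
        this.name = name;
        this.callback = callback;
        // Assumption: participants self-register, so flag-only instances also get the signal
        ShutdownRegistry.register(this);
    }

    Shutdown(String name) { this(name, s -> { }); }

    Shutdown(Consumer<Shutdown> callback) { this("anonymous", callback); }

    /** True once the registry has broadcast the shutdown signal. */
    boolean hasShutdown() { return shutdown; }

    /** Sets the flag first, then runs the callback (for example, pausing consumers). */
    void signal() {
        shutdown = true;
        callback.accept(this);
    }

    void unRegister() { ShutdownRegistry.unRegister(this); }
}

/** Hypothetical registry that fans the shutdown signal out to all participants. */
final class ShutdownRegistry {
    // A concurrent set makes repeated registration of the same participant harmless
    private static final Set<Shutdown> PARTICIPANTS = ConcurrentHashMap.newKeySet();

    private ShutdownRegistry() {}

    static void register(Shutdown participant) { PARTICIPANTS.add(participant); }

    static void unRegister(Shutdown participant) { PARTICIPANTS.remove(participant); }

    /** Signals every participant; one failing callback does not block the rest. */
    static void broadcast() {
        for (Shutdown participant : PARTICIPANTS) {
            try {
                participant.signal();
            } catch (RuntimeException e) {
                System.err.println("shutdown callback failed for " + participant.name + ": " + e);
            }
        }
    }
}
```

Something still has to call broadcast() before the JVM exits. One option is a JVM shutdown hook, Runtime.getRuntime().addShutdownHook(new Thread(ShutdownRegistry::broadcast)); in a Spring application a context-close listener would be another candidate.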
Kafka consumer example (pause on shutdown):
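    import java.util.Arrays;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
    import org.springframework.kafka.listener.MessageListenerContainer;
    import org.springframework.stereotype.Component;

    @Component
    public class ConsumerShutdown {
        // log is assumed to be an SLF4J logger (for example via Lombok's @Slf4j)
        @Autowired
        KafkaListenerEndpointRegistry registry;

        public ConsumerShutdown() {
            // The callback runs at shutdown time, after the registry field has been injected
            ShutdownRegistry.register(new Shutdown(se -> {
                for (MessageListenerContainer container : registry.getListenerContainers()) {
                    if (!container.isPauseRequested()) {
                        // Arrays.toString: a bare String[] would be expanded as the logger's varargs
                        log.info("consumers with topics: {} paused because of shutdown application",
                                Arrays.toString(container.getContainerProperties().getTopics()));
                        container.pause();
                    }
                }
            }));
        }
    }
For scheduled tasks (XXL‑Job), an aspect intercepts the @XxlJob annotation and returns an error response if a shutdown flag is set: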
    @Aspect
    @Component
    public class XxlJobShutdown {
        // One shutdown flag per job handler name
        private static final ConcurrentHashMap<String, Shutdown> CACHE = new ConcurrentHashMap<>();

        @Around("@annotation(xxlJob)")
        public ReturnT<String> before(ProceedingJoinPoint joinPoint, XxlJob xxlJob) throws Throwable {
            Shutdown shutdown = CACHE.computeIfAbsent(xxlJob.value(), s -> new Shutdown(s));
            if (shutdown.hasShutdown()) {
                // Reject new triggers once shutdown has been signalled
                return new ReturnT<>(500, "application shutdown");
            }
            return (ReturnT<String>) joinPoint.proceed();
        }
    }
Thread pools should use shutdown() to stop accepting new tasks while completing queued work, or shutdownNow() to abort immediately. Note that shutdownNow() does not itself throw: it interrupts the running worker threads, so tasks blocked in interruptible calls receive an InterruptedException, and it returns the queued tasks that never started.
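The two calls are often combined: ask the pool to drain, wait a bounded time, and only then fall back to interruption. The sketch below shows that sequence using only java.util.concurrent; the class name PoolShutdown and its method are hypothetical, and the timeout is whatever grace period the service can afford.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: drain a pool politely, then force it if the grace period runs out. */
final class PoolShutdown {
    private PoolShutdown() {}

    /** Returns true if all submitted tasks finished within the grace period. */
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // reject new tasks; queued and running tasks continue
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            // Grace period elapsed: interrupt workers and collect tasks that never started
            List<Runnable> neverStarted = pool.shutdownNow();
            System.err.println(neverStarted.size() + " queued task(s) were not started");
            return false;
        } catch (InterruptedException e) {
            // The waiting thread itself was interrupted: force the pool down and keep the flag
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A call such as PoolShutdown.shutdownGracefully(pool, 30, TimeUnit.SECONDS) fits naturally inside a shutdown callback. Tasks that should survive a forced stop cleanly need to check Thread.currentThread().isInterrupted() or handle InterruptedException themselves.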
Long‑running while loops can be wrapped in a LoopShutdown class that checks the shutdown flag each iteration:
    public class LoopShutdown extends Shutdown {
        public void loop(Supplier<Boolean> supplier) {
            while (true) {
                // Stop between iterations once shutdown has been signalled
                if (hasShutdown()) {
                    log.info("{} shutdown loop", name);
                    break;
                }
                // The supplier returns true when there is no more work
                if (supplier.get()) {
                    break;
                }
            }
            unRegister();
        }
    }
Usage:
    new LoopShutdown().loop(() -> {
        List<DbData> list = mapper.selectList(param);
        if (CollectionUtils.isEmpty(list)) {
            return true; // break
        }
        // process data
        return false; // continue
    });
These patterns illustrate how to propagate a shutdown signal to various components, ensuring they stop accepting new work and finish current work before the JVM exits.
Overall, the article provides a step‑by‑step guide to achieve graceful microservice termination, highlights cache‑related pitfalls, and offers concrete code snippets for HTTP endpoints, custom shutdown logic, and auxiliary workloads.
SpringMeng
Focused on software development, sharing source code and tutorials for various systems.
