Investigation of Java Service Crashes at Midnight Due to Cron and Open Files Limit in CentOS Containers
The article analyzes why a Java service repeatedly crashes around midnight in test environments, tracing the issue through system limits, Java version checks, cron job execution, strace logs, and Linux OOM killer behavior, and finally proposes configuration and version upgrades to prevent the failures.
Users reported that a Java service in the test environment consistently terminated around 00:00 despite having no scheduled tasks, no traffic spikes, and reasonable JVM settings. The investigation began by reproducing the problem with a minimal Spring Boot "hello world" WAR deployed on a Tomcat base image (base_tomcat/java-centos6-jdk18-60-tom8050-ngx197, Java 1.8.0_60).
Initial suspicion fell on Linux limit settings. The ulimit -n (open files) values of the failing containers were examined and found to be unusually high.
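The comparison between healthy and failing containers can be sketched with standard tools; the commands below read the shell's own limits and, via /proc, the limits of an already-running process (the use of /proc/self here is illustrative; in practice you would substitute the Java PID).

```shell
# Soft open-files limit of the current shell
ulimit -n

# Hard limit (the ceiling a non-root process may raise the soft limit to)
ulimit -Hn

# Limits of a running process, read from /proc; replace "self" with the
# target Java PID to inspect another process
grep "open files" /proc/self/limits
```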
Testing the limit hypothesis
prlimit -p 32672 --nofile=1048576

Even after adjusting the limit to match that of a healthy machine, the Java process still died at midnight, indicating that the open files limit was not the direct cause.
Java version check
The JDOS R&D team suggested that the old Java version might allow excessive memory allocation. A referenced article (Docker support in new Java 8) explained that exceeding Docker cgroup memory limits could trigger JVM termination, and that newer Java versions mitigate this by sizing the heap from the container limit.
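As a sketch of the mitigation that article describes: the flag names below are real JVM options, while the jar name and heap percentage are illustrative assumptions.

```shell
# Java 8u131 - 8u190: opt in to cgroup-aware heap sizing (experimental flags)
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar

# Java 8u191+ and Java 10+: container support is on by default; size the heap
# as a fraction of the container memory limit (percentage here is illustrative)
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar
```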
An experiment using Java 11.0.8 showed the same crash behavior, ruling out the Java version as the root cause.
Cron job investigation
Since the base image includes system cron tasks, the team inspected /etc/crontab and identified a logrotate.sh script scheduled at the same time as the crashes. Modifying the cron schedule to 11:00 and capturing a strace trace confirmed that the Java process terminated when the cron job ran.
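The inspection above can be sketched as follows. The crontab content is a hypothetical stand-in for the real /etc/crontab, and "tomcat" is an assumed process name for the pgrep match.

```shell
# Find cron entries that fire at midnight (minute 0, hour 0).
# A sample file stands in for /etc/crontab inside the container.
cat > /tmp/crontab.sample <<'EOF'
SHELL=/bin/bash
0 0 * * * root /usr/local/bin/logrotate.sh
30 3 * * * root /usr/sbin/some-other-task
EOF
awk '$1 == "0" && $2 == "0" {print $NF}' /tmp/crontab.sample

# After rescheduling the job, attach strace to the Java process and wait:
# strace -f -tt -o /tmp/java.strace -p "$(pgrep -f tomcat)"
```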
19:59:01 close(3) = 0
19:59:01 stat("/etc/pam.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
... (truncated for brevity) ...
19:59:06 +++ killed by SIGKILL +++

The trace showed a massive mmap allocation of roughly 4 GB just before the process was killed, indicating an OOM situation triggered by the cron task.
Understanding the OOM killer
The Linux OOM killer selects a process to terminate based on memory usage, OOM score, priority, and other attributes. In this case, the cron child process caused a rapid memory spike that exceeded container limits, leading the kernel to kill both the cron child and the Java process.
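The kernel's candidate selection can be observed per process through /proc (Linux-specific); the sketch below reads the current process's own score, but any PID can be substituted for "self".

```shell
# Badness score the OOM killer would use for this process; higher means more
# likely to be chosen, derived mainly from resident memory usage
cat /proc/self/oom_score

# User-space adjustment in [-1000, 1000]; -1000 exempts the process entirely
cat /proc/self/oom_score_adj
```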
Later versions of the cronie package (≥ 1.5.7-5) fix the underlying bug: before sending job-output mail, the cron daemon cleared a region of memory sized according to the open-files limit, so an extremely high limit produced the huge allocation seen in the trace. Patched versions no longer do this.
Solution
Upgrade the base image to a newer, stable CentOS version (e.g., 6.10 or 7.9) where the issue does not occur.
Set a reasonable open files limit for containers.
For application_worker type services, adjust the limit in the startup script; for web_tomcat services, consider removing or disabling the problematic cron task.
Upgrade cronie to version 1.5.7-5 or later (check the installed version with rpm -q cronie).
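The checks above can be combined into a small pre-deployment sanity script; the 1048576 threshold is an illustrative assumption, not a value prescribed by the article.

```shell
#!/bin/sh
# Warn if the open-files limit is suspiciously high (threshold is illustrative)
nofile=$(ulimit -n)
echo "open files limit: $nofile"
if [ "$nofile" -gt 1048576 ] 2>/dev/null; then
  echo "WARNING: nofile limit very high; old cronie may over-allocate on mail"
fi

# Verify the cronie version (the fixed release is 1.5.7-5 or later)
rpm -q cronie 2>/dev/null || echo "rpm not available or cronie not installed"
```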
Finally, the article emphasizes that improper open files limits combined with cron tasks can cause severe memory OOM events, and recommends verifying container OS versions and limit settings before deployment.
JD Retail Technology