Why Did My Java Service Get Killed? Uncovering OOM and Thread‑Pool Pitfalls
A production Java service was unexpectedly killed by the kernel after running out of memory. Two causes were involved: oversized JVM heap settings, and a thread pool whose threads retained large log objects. This article walks through the investigation, the temporary fix, the deeper profiling, and the final remediation steps.
1. Observation
During production, Zabbix alerts indicated that the application had shut down. Logging into the bastion host revealed that the container had no process for the business application, and no abnormal logs were found in the container.
2. Problem
Why was the application process missing and no logs generated? If the system killed the process, what conditions triggered it? Why did the test and UAT environments never encounter this issue?
3. Investigation
First, the server’s total memory was checked – it has 8 GB.
The container’s environment variables show that the JVM maximum heap is set to 4 GB. Another application on the same machine has the same setting, and a local message queue (ActiveMQ) is limited to 1 GB. Together these ceilings add up to 9 GB on an 8 GB host. When the application requests additional memory, the kernel runs out of low memory and kills the process.
System logs confirm the OOM kill:
The kernel log shows “Out of memory: Kill process … (java) …”. Low‑memory tracking explains that when low memory is exhausted and a new allocation is needed, the kernel selects a user process to kill.
Another machine with similar configuration shows total memory 7967 MB and low‑memory usage of 7832 MB, confirming low‑memory exhaustion.
Thus, unreasonable JVM max‑heap settings (4 GB each for two Java apps on an 8 GB machine) caused the OOM kill.
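The overcommit described above is simple arithmetic, and it can be sketched as a small check. This is only an illustration: the class name, the process labels, and the rounding of the host to 8192 MB are assumptions, while the 4 GB, 4 GB, and 1 GB ceilings come from the investigation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sums the configured memory ceilings of co-located
// processes and compares the total with the host's physical memory.
class MemoryBudget {

    // Adds up every configured ceiling, in megabytes.
    static long totalCommittedMb(Map<String, Long> ceilingsMb) {
        long total = 0;
        for (long mb : ceilingsMb.values()) {
            total += mb;
        }
        return total;
    }

    // True when the ceilings alone already exceed the host's memory.
    static boolean isOvercommitted(Map<String, Long> ceilingsMb, long hostMb) {
        return totalCommittedMb(ceilingsMb) > hostMb;
    }

    public static void main(String[] args) {
        // Labels are illustrative; the sizes are the ones found on the server.
        Map<String, Long> ceilingsMb = new LinkedHashMap<>();
        ceilingsMb.put("business-app (max heap)", 4096L);
        ceilingsMb.put("second-app (max heap)", 4096L);
        ceilingsMb.put("local queue", 1024L);

        long hostMb = 8192L; // assumed rounding of the 8 GB host

        System.out.println("Configured ceilings: " + totalCommittedMb(ceilingsMb) + " MB");
        System.out.println("Host memory:         " + hostMb + " MB");
        System.out.println("Overcommitted:       " + isOvercommitted(ceilingsMb, hostMb));
    }
}
```

Note that the maximum heap is only part of a JVM's footprint. Metaspace, thread stacks, and native buffers come on top of it, so the real margin is smaller than this sum suggests.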
4. Temporary Fix
Reduce the JVM max‑heap setting and restart the application.
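After lowering the limit (for example with the standard `-Xmx` option), it can be worth confirming from inside the process that the new ceiling actually took effect. The sketch below is a minimal check; the class name and the 2048 MB default are assumptions, since the article does not state the value that was chosen.

```java
// Hypothetical startup check: reports the heap ceiling the JVM is running with.
class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True when the effective ceiling does not exceed the intended limit.
    static boolean withinLimit(long maxHeapMb, long limitMb) {
        return maxHeapMb <= limitMb;
    }

    public static void main(String[] args) {
        // Assumed limit; pass the real one as the first argument.
        long limitMb = args.length > 0 ? Long.parseLong(args[0]) : 2048L;
        long actualMb = maxHeapMb();

        System.out.println("Effective max heap: " + actualMb + " MB");
        System.out.println("Intended limit:     " + limitMb + " MB");
        if (!withinLimit(actualMb, limitMb)) {
            System.out.println("WARNING: heap ceiling is above the intended limit");
        }
    }
}
```

The value reported by `Runtime.maxMemory()` can be slightly lower than the `-Xmx` figure with some garbage collectors, which is why the check uses "less than or equal" rather than equality.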
5. Further Resolution
After restart, memory usage was examined. Profiling with jmap and JProfiler revealed large char[] objects (up to 30 MB) and many String objects, originating from task‑center client logging.
Analysis showed the task‑center client uses a thread pool with a default maximum of 100 threads. Threads are reused and retain large log buffers, especially when logging big objects such as orders or inventory.
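One way a reused thread can hold on to a large buffer is a per-thread `StringBuilder` that is cleared but never shrunk. The sketch below assumes that mechanism purely for illustration; the article does not say how the task-center client buffers its log messages, and every name in the code is hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: a per-thread formatting buffer that keeps its capacity.
class RetainedLogBuffer {

    // Assumption: each pooled thread reuses one buffer for log formatting.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    // Formats a message, clears the buffer, and returns the capacity still held.
    static int formatAndReportCapacity(String payload) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // clears the content but keeps the backing array
        sb.append("processing ").append(payload);
        // A real logger would write the message out here.
        sb.setLength(0);
        return sb.capacity();
    }

    // Builds a string of the requested length to stand in for a big object dump.
    static String largePayload(int chars) {
        char[] data = new char[chars];
        Arrays.fill(data, 'x');
        return new String(data);
    }

    public static void main(String[] args) throws Exception {
        // A single worker makes sure both tasks run on the same reused thread.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Callable<Integer> bigTask = () -> formatAndReportCapacity(largePayload(5_000_000));
            Callable<Integer> smallTask = () -> formatAndReportCapacity("order-1");

            int afterBig = pool.submit(bigTask).get();
            int afterSmall = pool.submit(smallTask).get();

            System.out.println("Capacity after large message: " + afterBig);
            System.out.println("Capacity after small message: " + afterSmall);
        } finally {
            pool.shutdown();
        }
    }
}
```

Under this assumption, each thread keeps the capacity of the largest message it has ever formatted. With a maximum of 100 threads and buffers in the tens of megabytes, the retained memory can grow well beyond what the live data would suggest.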
Four differences explain why the test and UAT environments never hit the issue:
System memory size and VM version differ from production.
Order and inventory data volumes are smaller.
Fewer jobs run, and they finish quickly.
Applications are restarted more often.
6. Final Solution
Set the task‑center client thread‑pool maximum threads to a realistic value (e.g., 10).
Remove logging of large objects.
Provide feedback to the task‑center team to adjust default thread‑pool settings.
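The first two measures can be sketched with the standard `java.util.concurrent` classes. This is not the task-center client's actual configuration API, which the article does not show; the `Order` type, the queue capacity, and the method names are hypothetical stand-ins.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a bounded pool plus summary-style logging for large objects.
class BoundedTaskPool {

    // Hypothetical stand-in for a large domain object such as an order.
    static final class Order {
        final String id;
        final int lineCount;

        Order(String id, int lineCount) {
            this.id = id;
            this.lineCount = lineCount;
        }
    }

    // Builds a pool capped at maxThreads with a bounded work queue.
    static ThreadPoolExecutor newPool(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
        // Let idle threads exit so anything they retain can be collected.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Logs identifiers and sizes instead of serializing the whole object.
    static String summarize(Order order) {
        return "order id=" + order.id + " lines=" + order.lineCount;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(10, 100); // 10 matches the article's example value
        Order order = new Order("A1", 3);

        pool.execute(() -> System.out.println(summarize(order)));

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Capping the thread count limits how many per-thread buffers can exist at once, and logging a short summary keeps each of those buffers small.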
7. Summary
When low memory is exhausted, the kernel selects a user process to kill in order to keep the system itself stable.
JVM heap and thread‑pool sizes must be tuned to the host’s memory capacity.
Large‑object logging in long‑lived thread‑pool threads can quickly consume memory.
Team guidelines should forbid logging of large objects in task processors.
