Why Did My Java Service Crash? Uncovering a BouncyCastle Memory Leak and Fixing It
The article walks through a real‑world Java service outage caused by CPU saturation, details a systematic five‑step investigation, reveals a memory leak in BouncyCastleProvider objects within JceSecurity, and explains how converting the provider to a static singleton resolved the issue.
1. Problem Discovery
CPU usage on the production machines began climbing on April 8 and eventually reached 100%, making the service unavailable. Restarting the service restored it.
2. Investigation Approach
Possible causes were divided into five categories: system code issues, downstream cascade effects, upstream traffic spikes, third‑party HTTP problems, and host problems.
3. Investigation Steps
Checked logs – no concentrated errors, so a bug in the application logic was ruled out.
Contacted downstream systems – they were normal.
Compared the call volume on the service's provider interfaces – no spike, so an upstream traffic surge was ruled out.
Checked TCP status – normal, ruling out third‑party timeouts.
Monitored all six machines – CPU was rising on every one of them, which ruled out a fault on a single host.
None of these pinpointed the root cause.
4. Solution
Restarted five of the six affected machines to restore service, keeping one for analysis.
Identified the Tomcat process PID (e.g., 384) and inspected its threads.
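Found four threads (thread IDs 4430‑4433) each consuming about 40% CPU.
Converted those thread IDs to hex (0x114e‑0x1151), the form jstack prints in each thread's nid field, and dumped the Java thread stacks:
sudo -u tomcat jstack -l 384 > /1.txt
The matching threads turned out to be GC threads, meaning the JVM was spending its CPU on garbage collection.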
Dumped the heap:
sudo -u tomcat jmap -dump:live,format=b,file=/dump201612271310.dat 384
Analyzed the heap with Eclipse MAT and saw that javax.crypto.JceSecurity accounted for 95% of memory.
Examined the reference tree and found an excessive number of BouncyCastleProvider instances.
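The thread-ID step above is easy to get wrong by hand. Tools such as top -H -p <pid> report native thread IDs in decimal, while jstack prints the same ID as a lowercase hex nid field in each thread header. The sketch below does the conversion and picks out the matching headers. The class name NidMatcher and the sample dump lines are illustrative assumptions, shaped like HotSpot jstack output rather than copied from the incident.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Maps decimal native thread IDs (as shown by top -H) to jstack's hex nid field. */
public class NidMatcher {

    // Hypothetical excerpt shaped like HotSpot jstack output (not from the incident).
    static final String SAMPLE_DUMP = String.join("\n",
        "\"GC task thread#0 (ParallelGC)\" os_prio=0 tid=0x00007f0c0801e800 nid=0x114e runnable",
        "\"GC task thread#1 (ParallelGC)\" os_prio=0 tid=0x00007f0c08020800 nid=0x114f runnable",
        "\"http-nio-8080-exec-1\" #20 daemon prio=5 os_prio=0 tid=0x00007f0c08a1c000 nid=0x1200 waiting on condition");

    /** jstack prints native IDs as lowercase hex, e.g. 4430 becomes "0x114e". */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    /** Returns the thread header lines whose nid matches any of the given thread IDs. */
    static List<String> findThreads(String jstackOutput, long... tids) {
        List<Pattern> patterns = new ArrayList<>();
        for (long tid : tids) {
            // The word boundary stops "0x114" from matching inside "0x114e".
            patterns.add(Pattern.compile("nid=" + toNid(tid) + "\\b"));
        }
        List<String> hits = new ArrayList<>();
        for (String line : jstackOutput.split("\n")) {
            for (Pattern p : patterns) {
                if (p.matcher(line).find()) {
                    hits.add(line.trim());
                    break; // one match per line is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Thread IDs 4430-4433, the busy threads from the investigation.
        for (String hit : findThreads(SAMPLE_DUMP, 4430, 4431, 4432, 4433)) {
            System.out.println(hit);
        }
    }
}
```

Against the sample dump this prints the two GC task thread headers; on a real dump, the thread names in the matched headers are what identify the busy threads as GC threads.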
5. Code Analysis
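The application created a new BouncyCastleProvider for every encryption/decryption call and passed it to the JCE. javax.crypto.JceSecurity records each provider instance it verifies in a static map and never removes those entries, so every provider ever created stayed reachable and could not be garbage‑collected. Over time the retained providers filled the heap, and the JVM spent ever more CPU on garbage collection, which matches the busy GC threads found in the jstack dump.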
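To make the mechanism concrete, here is a simplified Java model of the pattern; it is not the JDK source. A static identity-keyed map stands in for the cache inside JceSecurity, and a hypothetical FakeProvider class stands in for BouncyCastleProvider. The names ProviderCacheLeakDemo, FakeProvider, VERIFIED and useProvider are all illustrative assumptions.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

/** Simplified model (NOT the JDK source) of a static, never-evicted provider cache. */
public class ProviderCacheLeakDemo {

    /** Stand-in for a heavyweight provider such as BouncyCastleProvider. */
    static final class FakeProvider {
        final byte[] payload = new byte[1024]; // pretend each instance is costly
    }

    /** Stand-in for the static map in JceSecurity: keyed by object identity, never cleared. */
    static final Map<Object, Boolean> VERIFIED =
        Collections.synchronizedMap(new IdentityHashMap<>());

    /** Stand-in for passing a provider to something like Cipher.getInstance(..., provider). */
    static void useProvider(Object provider) {
        VERIFIED.putIfAbsent(provider, Boolean.TRUE);
    }

    /** Leaky pattern: a fresh provider per encrypt/decrypt call adds a new entry every time. */
    static void encryptLeaky() {
        useProvider(new FakeProvider());
    }

    /** Fixed pattern: one shared instance, so the map holds a single entry. */
    static final FakeProvider SHARED = new FakeProvider();

    static void encryptShared() {
        useProvider(SHARED);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) encryptLeaky();
        System.out.println("entries after leaky calls: " + VERIFIED.size());
        VERIFIED.clear();
        for (int i = 0; i < 10_000; i++) encryptShared();
        System.out.println("entries after shared calls: " + VERIFIED.size());
    }
}
```

Running main reports 10000 entries for the per-call pattern and 1 for the shared instance. Because the map is keyed by identity, two providers that are otherwise identical still occupy separate entries.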
6. Code Fix
Make the provider a static singleton: hold one instance in a static final field and reuse it for every call, so JceSecurity's map gains a single entry instead of one per call.
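A minimal sketch of the fixed utility class, assuming AES-GCM as the cipher (the article does not say which algorithm the service used). In the real service the field would presumably be initialized with new BouncyCastleProvider(); here the JDK's built-in SunJCE provider stands in so the example compiles without the BouncyCastle jar. CryptoUtil and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Provider;
import java.security.SecureRandom;
import java.security.Security;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public final class CryptoUtil {

    // One provider instance for the whole JVM. In the service this would be
    // (assumption) new BouncyCastleProvider(); SunJCE stands in here.
    private static final Provider PROVIDER = Security.getProvider("SunJCE");

    private static final String TRANSFORMATION = "AES/GCM/NoPadding";
    private static final int IV_LENGTH = 12;          // bytes, the usual GCM nonce size
    private static final int TAG_LENGTH_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private CryptoUtil() {}

    public static Provider provider() {
        return PROVIDER;
    }

    /** Encrypts with a fresh random IV and returns IV || ciphertext (ciphertext includes the tag). */
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv);
        // Cipher objects are not thread-safe, so one per call; only the Provider is shared.
        Cipher cipher = Cipher.getInstance(TRANSFORMATION, PROVIDER);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);
        byte[] out = Arrays.copyOf(iv, IV_LENGTH + ct.length);
        System.arraycopy(ct, 0, out, IV_LENGTH, ct.length);
        return out;
    }

    /** Splits IV || ciphertext and decrypts, reusing the same shared provider. */
    public static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION, PROVIDER);
        cipher.init(Cipher.DECRYPT_MODE, key,
            new GCMParameterSpec(TAG_LENGTH_BITS, ivAndCiphertext, 0, IV_LENGTH));
        return cipher.doFinal(ivAndCiphertext, IV_LENGTH, ivAndCiphertext.length - IV_LENGTH);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] msg = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decrypt(key, encrypt(key, msg)), StandardCharsets.UTF_8));
    }
}
```

The design point is that only the Provider is shared. Creating a Cipher per call is still the safe choice because Cipher is not thread-safe, but each lookup now goes through the same provider object instead of allocating a new one.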
7. Takeaways
When facing online incidents, follow a systematic checklist: check logs, CPU, TCP, Java threads (jstack), Java heap (jmap), and use a heap analyzer to locate non‑collectable objects.
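When shell access to jstack or jmap is restricted, roughly the same data can be captured from inside the JVM through the standard management beans. This is a sketch assuming a HotSpot-based JDK (HotSpotDiagnosticMXBean lives in com.sun.management); the class and method names are illustrative. Note that ThreadMXBean only sees Java threads, so native GC threads like the ones found in this incident will not appear in its dump.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.file.Files;
import java.nio.file.Path;

public class InProcessDiagnostics {

    /** Roughly what jstack -l shows, limited to Java threads (no native GC threads). */
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: include locked monitors and ownable synchronizers, like jstack -l.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            sb.append(info); // ThreadInfo.toString() truncates each stack to a few frames
        }
        return sb.toString();
    }

    /** Similar in spirit to jmap -dump:live,format=b,file=... */
    static Path heapDump(Path file) throws IOException {
        HotSpotDiagnosticMXBean hs =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live=true dumps only reachable objects; the file must not already exist,
        // and newer JDKs require the .hprof suffix.
        hs.dumpHeap(file.toString(), true);
        return file;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(threadDump());
        Path dir = Files.createTempDirectory("diag");
        Path hprof = heapDump(dir.resolve("heap.hprof"));
        System.out.println("heap dump written: " + hprof + " (" + Files.size(hprof) + " bytes)");
    }
}
```

The resulting .hprof file can be opened in Eclipse MAT exactly like a jmap dump.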
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
