Why Did My Java Service Hit 100% CPU? A Deep Dive into a BouncyCastle Memory Leak

This article walks through a real‑world Java production incident in which CPU usage climbed to 100%. It covers the systematic troubleshooting steps, heap analysis with MAT, and the root cause: creating a new BouncyCastleProvider object for every operation leaked memory. The fix was to refactor the code so that a single instance is reused.

ITFLY8 Architecture Home

1. Problem Discovery

CPU usage on the production machines rose steadily from April 8th, eventually reaching 100% and making the service unavailable. A restart restored it only temporarily.

2. Investigation Approach

The possible causes were grouped into five categories:

System code issues

Downstream system problems causing a cascading failure (avalanche effect)

Sudden surge in upstream calls

Third‑party HTTP request problems

Machine‑level issues

3. Investigation Steps

1. Checked logs – no concentrated errors, so code logic errors were initially ruled out.

2. Contacted downstream systems; their monitoring was normal, eliminating downstream impact.

3. Compared the call volume of the provider interfaces over seven days – no spike, ruling out a surge in upstream calls.

4. Inspected TCP status – normal, so third‑party HTTP timeout was excluded.

5. Monitored six machines; all showed rising CPU, indicating no single machine fault.

These steps did not directly locate the root cause.

4. Solution

1. Restarted the five most affected machines to restore service, keeping one machine for analysis.

2. Looked up the PID of the Tomcat process (384 in this case).

3. Listed the per‑thread CPU usage of that process with top -Hp 384.

4. Found threads 4430‑4433 each consuming about 40% CPU.

5. Converted those thread IDs to hexadecimal so they could be matched against the nid field in the jstack output: 114e, 114f, 1150, 1151.

6. Dumped the Java thread stack: sudo -u tomcat jstack -l 384 > /1.txt.

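7. Identified that the high‑CPU threads were GC threads, meaning the JVM was spending its CPU on garbage collection rather than on application work, which pointed to memory pressure.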

8. Dumped the Java heap:

sudo -u tomcat jmap -dump:live,format=b,file=/dump201612271310.dat 384

9. Loaded the heap dump into MAT and discovered that javax.crypto.JceSecurity retained about 95% of the heap, pinpointing the issue.

MAT download: http://www.eclipse.org/mat/

10. Examined the reference tree and saw that a very large number of BouncyCastleProvider objects were being held, indicating misuse in the code.
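
The decimal-to-hexadecimal conversion in step 5 matters because top reports thread IDs in decimal, while jstack prints the native thread ID in hexadecimal in its nid field. A minimal sketch of that conversion follows; the class and method names are made up for illustration.

```java
class ThreadIdHex {
    // Formats a decimal thread ID from `top -Hp` the way jstack shows it in the nid field.
    static String toNid(int tid) {
        return "0x" + Integer.toHexString(tid);
    }

    public static void main(String[] args) {
        // The four busy threads from step 4.
        for (int tid = 4430; tid <= 4433; tid++) {
            System.out.println(tid + " -> nid=" + toNid(tid));
        }
    }
}
```

Searching the jstack dump for nid=0x114e and the other three values then shows which threads were burning CPU.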

5. Code Analysis

The problematic code creates a new BouncyCastleProvider for each encryption/decryption operation and passes it to Cipher.getInstance().

Tracing Cipher.getInstance() leads to the JDK’s JceSecurity class. There, each provider is put into the verifyingProviders map while it is being verified and removed afterward, but the verification result is also put into the static verificationResults map and is never removed.

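Because verificationResults is a static field of JceSecurity, it lives as long as the class does. Each new BouncyCastleProvider instance adds another entry that is never removed, so those provider objects can never be garbage‑collected. This is the memory leak.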
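
The effect can be reproduced with a simplified model. The sketch below is not the JDK code: it assumes a static cache keyed by object identity, as the behavior above implies, and uses plain Object instances as hypothetical stand‑ins for provider objects. All names are illustrative.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityCacheLeakModel {
    // Hypothetical stand-in for a static, put-only result cache keyed by instance.
    private static final Map<Object, Boolean> RESULTS = new IdentityHashMap<>();

    // Records a result for this exact instance; nothing ever removes it.
    static void verify(Object provider) {
        RESULTS.putIfAbsent(provider, Boolean.TRUE);
    }

    static int cachedEntries() {
        return RESULTS.size();
    }

    // Only for the demo, so both scenarios start from an empty cache.
    static void reset() {
        RESULTS.clear();
    }

    public static void main(String[] args) {
        // Pattern from the incident: a fresh object for every operation.
        for (int i = 0; i < 1000; i++) {
            verify(new Object());
        }
        System.out.println("fresh instance per call: " + cachedEntries() + " entries");

        reset();

        // Reusing one instance keeps the cache at a single entry.
        Object shared = new Object();
        for (int i = 0; i < 1000; i++) {
            verify(shared);
        }
        System.out.println("one shared instance: " + cachedEntries() + " entries");
    }
}
```

In the model, the first loop leaves 1000 entries in the cache and the second leaves one. With real provider objects, each retained entry is far heavier than a bare Object.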

6. Code Improvement

Make the BouncyCastleProvider a static field, so the class creates one instance and reuses it for every encryption/decryption operation instead of creating a new one each time.
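
A minimal sketch of that shape follows. It is an illustration, not the original service code. So that it runs without the BouncyCastle jar, it borrows an installed JDK provider as a stand‑in; in the incident's code the static field would instead hold new BouncyCastleProvider(). The class name, method names, and transformation string are assumptions.

```java
import java.security.GeneralSecurityException;
import java.security.Provider;
import javax.crypto.Cipher;

class SharedProviderCipher {
    // One Provider instance for the whole class, created during class initialization.
    // Assumption: in the incident's code this would be `new BouncyCastleProvider()`.
    private static final Provider PROVIDER = loadProvider();

    private static Provider loadProvider() {
        try {
            // Stand-in so the sketch runs without extra jars:
            // borrow whichever installed provider the JDK picks for AES.
            return Cipher.getInstance("AES").getProvider();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("No AES provider available", e);
        }
    }

    // Every call passes the same Provider object, so a cache keyed by
    // provider instance sees one key however many ciphers are created.
    static Cipher newCipher() throws GeneralSecurityException {
        return Cipher.getInstance("AES/CBC/PKCS5Padding", PROVIDER);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        Cipher first = newCipher();
        Cipher second = newCipher();
        System.out.println("same provider instance: "
                + (first.getProvider() == second.getProvider()));
    }
}
```

Cipher objects themselves are still created per operation, since they carry state. Only the provider is shared.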

7. Summary

When encountering an online issue, follow a systematic investigation:

Check logs.

Check CPU usage.

Check TCP status.

Inspect Java threads with jstack.

Inspect Java heap with jmap.

Analyze the heap with MAT to find non‑collectable objects.
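
As a complement to the checklist, garbage‑collection activity can also be read from inside the JVM through the standard java.lang.management API. This gives an early hint when CPU time is going to GC, as it was in this incident. The sketch below is one possible way to take such a snapshot; the class name and output format are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcActivitySnapshot {
    // Sums accumulated collection time (ms) across all collectors.
    // A collector reports -1 when the value is undefined; those are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time ms: " + totalGcTimeMillis());
    }
}
```

Sampling these counters periodically and comparing the deltas shows whether collection time is growing faster than the workload would explain.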

Source: https://www.cnblogs.com/kingszelda/p/9034191.html
