Investigating a Java Heap Memory Issue: A Detective Story of Debugging and Root Cause Analysis
This article narrates a step‑by‑step investigation of a Java server whose heap memory exceeded 95%, detailing log collection, tool usage, thread analysis, and ultimately uncovering a thread‑unsafe SimpleDateFormat that caused an infinite loop and massive memory consumption.
A private detective named Xiao Guang receives an urgent call about a Java service whose heap memory usage has surged above 95%, rendering the server unresponsive.
He first hypothesizes possible causes—massive memory allocation without effective GC or a bug causing a loop—and proposes locating the problem via a heap dump analyzed with MAT or JVisualVM, or, if that fails, using jmap and jstack utilities.
On site, the heap is too large to dump and the JMX connection fails. Diagnostic commands run over SSH (ps -ef | grep java to find the process, then jmap -histo <pid> | head -20) end in attach errors, likely due to insufficient permissions.
Comprehensive logs are gathered: application logs, GC logs, monitoring logs, dkimi agent logs, and matrix charts. Analysis reveals a 20-second gap in the application logs that lines up with a long stop-the-world GC pause, which explains why nothing was logged during that window.
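The gap check described above can be sketched as a small scan over log timestamps. This is only an illustration: the class name LogGapFinder, the leading "yyyy-MM-dd HH:mm:ss" timestamp layout, and the sample lines are assumptions, not details taken from the incident.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

class LogGapFinder {
    // Assumed layout: every log entry starts with a 19-character timestamp.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final int TS_LENGTH = 19;

    /** Returns "from -> to" for each pair of consecutive entries further apart than the threshold. */
    static List<String> findGaps(List<String> lines, Duration threshold) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            if (line.length() < TS_LENGTH) {
                continue; // too short to carry a timestamp
            }
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(line.substring(0, TS_LENGTH), TS);
            } catch (DateTimeParseException e) {
                continue; // continuation lines such as stack traces
            }
            if (previous != null && Duration.between(previous, current).compareTo(threshold) > 0) {
                gaps.add(previous.format(TS) + " -> " + current.format(TS));
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from the real logs.
        List<String> sample = List.of(
                "2023-05-01 10:00:00 INFO request handled",
                "2023-05-01 10:00:01 INFO request handled",
                "2023-05-01 10:00:22 INFO request handled");
        System.out.println(findGaps(sample, Duration.ofSeconds(10)));
    }
}
```

Any gap found this way still has to be checked against the GC log by hand to confirm that a pause covers the same window.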
WebFilter logs show no unusually long requests, so the investigation shifts to structured log comparison across threads. The only thread that disappears after the incident is catalina-exec-19, marking it as the prime suspect.
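One way to picture the per-thread comparison is a set difference between the thread names seen before the incident and those seen after it. The sketch below assumes that each log line carries the thread name in square brackets; that layout, the class name VanishedThreads, and the sample lines are illustrative assumptions.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VanishedThreads {
    // Assumption: the thread name is the first bracketed token, e.g. "[catalina-exec-19]".
    private static final Pattern THREAD = Pattern.compile("\\[([^\\]]+)\\]");

    /** Collects the distinct thread names that appear in the given log lines. */
    static Set<String> threadNames(List<String> lines) {
        Set<String> names = new TreeSet<>();
        for (String line : lines) {
            Matcher m = THREAD.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    /** Threads that logged before the incident but never again afterwards. */
    static Set<String> vanished(List<String> before, List<String> after) {
        Set<String> result = threadNames(before);
        result.removeAll(threadNames(after));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> before = List.of(
                "10:00:00 [catalina-exec-18] INFO start",
                "10:00:00 [catalina-exec-19] INFO start");
        List<String> after = List.of(
                "10:00:30 [catalina-exec-18] INFO start");
        System.out.println(vanished(before, after));
    }
}
```

A thread that stops logging is not proof of a hang on its own, but it narrows down which request to read next.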
Further code inspection uncovers a while (true) loop in getBetweenDates whose termination condition compares a full datetime string with a date-only string using .equals(). This mismatch guarantees the loop never ends, continuously allocating objects and inflating the heap.
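The original getBetweenDates code is not reproduced here, so the following is a hypothetical reconstruction of the loop shape only. It replaces while (true) with a step cap so that it can be run safely, and shows that an exit test based on string equality never fires when the formats differ or when the cursor starts past the end.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LoopTermination {
    static final DateTimeFormatter DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final DateTimeFormatter DATE_ONLY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /**
     * Advances one day at a time and exits only when the formatted cursor equals end.
     * Returns the number of steps taken, or -1 if the cap is hit first
     * (the real while (true) loop would simply keep running).
     */
    static int stepsUntilEqual(LocalDateTime start, String end, DateTimeFormatter cursorFormat, int maxSteps) {
        LocalDateTime cursor = start;
        for (int step = 0; step < maxSteps; step++) {
            if (cursor.format(cursorFormat).equals(end)) {
                return step;
            }
            cursor = cursor.plusDays(1);
        }
        return -1;
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2023, 1, 1, 0, 0);
        // Same format on both sides: the exit condition is reached.
        System.out.println(stepsUntilEqual(start, "2023-01-03", DATE_ONLY, 1000)); // prints 2
        // Datetime string versus date-only string: equals() is never true.
        System.out.println(stepsUntilEqual(start, "2023-01-03", DATE_TIME, 1000)); // prints -1
        // Cursor already past the end: equality can never be reached either.
        System.out.println(stepsUntilEqual(start.plusDays(4), "2023-01-03", DATE_ONLY, 1000)); // prints -1
    }
}
```

An ordering check on date values, rather than equality on strings, would end the loop in all three cases.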
The root cause is traced to DateUtil.formatDate, which uses a static SimpleDateFormat DEFAULT_DATE_FORMAT. Because SimpleDateFormat is not thread-safe, concurrent formatting can produce a start value larger than end, triggering the infinite loop.
A multithreaded reproduction test, with 500 threads sharing the same formatter while formatting different dates, confirms the bug. It also shows why the JVM does not throw an OutOfMemoryError: frequent GC cycles keep reclaiming just enough memory to avoid a fatal OOM.
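A reproduction along these lines might look like the sketch below. It is not the article's test: the class name FormatterRace, the iteration count, and the date range are assumptions. Each thread formats its own date through one shared SimpleDateFormat and counts results that differ from the expected string; a DateTimeFormatter variant is included for comparison.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class FormatterRace {
    // One mutable formatter shared by all threads, mirroring the static field in the text.
    private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");
    private static final DateTimeFormatter SAFE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String unsafeFormat(LocalDate date) {
        Date legacy = Date.from(date.atStartOfDay(ZoneId.systemDefault()).toInstant());
        return SHARED.format(legacy);
    }

    static String safeFormat(LocalDate date) {
        return SAFE.format(date);
    }

    /** Runs the formatter from many threads at once and counts wrong or failed results. */
    static int countWrongResults(int threads, int iterations, Function<LocalDate, String> format) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startSignal = new CountDownLatch(1);
        AtomicInteger wrong = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            LocalDate date = LocalDate.of(2000, 1, 1).plusDays(t); // a different date per thread
            String expected = date.toString(); // ISO form, same as yyyy-MM-dd here
            pool.submit(() -> {
                try {
                    startSignal.await(); // release all threads together to maximize contention
                    for (int i = 0; i < iterations; i++) {
                        if (!expected.equals(format.apply(date))) {
                            wrong.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (RuntimeException e) {
                    wrong.incrementAndGet(); // corrupted formatter state can also surface as an exception
                }
            });
        }
        startSignal.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return wrong.get();
    }

    public static void main(String[] args) {
        // The shared-formatter count varies from run to run and may occasionally be zero.
        System.out.println("shared SimpleDateFormat: " + countWrongResults(500, 1000, FormatterRace::unsafeFormat));
        System.out.println("DateTimeFormatter:       " + countWrongResults(500, 1000, FormatterRace::safeFormat));
    }
}
```

Because the failure is a race, a clean run does not prove the shared formatter is safe; only the DateTimeFormatter count is expected to stay at zero on every run.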
The article concludes with recommendations: replace the static SimpleDateFormat with a thread-safe alternative (e.g., DateTimeFormatter), add strict parameter validation, and consider logging both before and after request handling to detect missing logs in future incidents.
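The first two recommendations could be combined roughly as follows. This is a sketch under stated assumptions, not the team's actual fix: the class name SafeDateRange, the inclusive end date, and the MAX_DAYS limit are all hypothetical. It parses both inputs with an immutable DateTimeFormatter, rejects reversed or oversized ranges, and iterates with an ordering check instead of string equality.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

class SafeDateRange {
    // DateTimeFormatter is immutable, so a shared static instance is safe across threads.
    private static final DateTimeFormatter DEFAULT_DATE_FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;
    // Hypothetical upper bound on the range; pick a value that fits the business case.
    private static final long MAX_DAYS = 3660;

    /** Returns every date from start to end inclusive, formatted as yyyy-MM-dd. */
    static List<String> getBetweenDates(String start, String end) {
        LocalDate from = LocalDate.parse(start, DEFAULT_DATE_FORMAT);
        LocalDate to = LocalDate.parse(end, DEFAULT_DATE_FORMAT);
        if (from.isAfter(to)) {
            throw new IllegalArgumentException("start " + start + " is after end " + end);
        }
        if (ChronoUnit.DAYS.between(from, to) > MAX_DAYS) {
            throw new IllegalArgumentException("range exceeds " + MAX_DAYS + " days");
        }
        List<String> result = new ArrayList<>();
        // The ordering check ends the loop even if a value skips past the end date.
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            result.add(day.format(DEFAULT_DATE_FORMAT));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getBetweenDates("2023-01-30", "2023-02-02"));
    }
}
```

Malformed input fails fast here as well, since LocalDate.parse throws a DateTimeParseException instead of silently producing a value the loop cannot reach.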
Beike Product & Technology