Understanding JVM OutOfMemoryError Types and the Fail‑Fast Principle
This article explains the common JVM OutOfMemoryError categories, analyzes a real‑world GC overhead limit exceeded incident, and demonstrates how applying the fail‑fast principle can help developers detect and resolve memory‑related failures more efficiently.
OOM Types
OutOfMemoryError inherits from VirtualMachineError, an asynchronous exception that can be thrown at any time during program execution.
1. java.lang.OutOfMemoryError: Java heap space
Before Java 8, HotSpot kept class metadata in a fixed-size permanent generation (PermGen, capped by -XX:MaxPermSize) alongside the object heap. Since Java 8, PermGen has been replaced by Metaspace, which lives in native memory and is capped by -XX:MaxMetaspaceSize. The Java heap space error is thrown when the program allocates an object (for example with new) and the heap, bounded by -Xmx, has no room for it even after garbage collection.
2. java.lang.OutOfMemoryError: PermGen space
Applicable to Java 7 and earlier, this occurs when the permanent generation runs out of space due to class metadata, constant pool entries, field and method data, etc.
3. java.lang.OutOfMemoryError: Metaspace
In Java 8 and later, this is thrown when the Metaspace cannot accommodate additional class metadata.
4. java.lang.OutOfMemoryError: GC overhead limit exceeded
The JVM throws this error when it spends more than 98% of its time in garbage collection while reclaiming less than 2% of the heap. The check is enabled by default and can be disabled with -XX:-UseGCOverheadLimit. This is the focus of the case study.
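To make the "share of time spent in GC" idea concrete, the sketch below estimates it from the standard java.lang.management beans. It is only an approximation: it averages over the whole JVM uptime, whereas the JVM's own overhead check uses its internal measurements. The class and method names (GcOverheadProbe, gcFraction, currentGcFraction) are illustrative, not part of any library.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverheadProbe {
    private GcOverheadProbe() {}

    // Fraction of elapsed time spent in GC, clamped to [0, 1].
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        double raw = (double) gcMillis / uptimeMillis;
        return Math.min(1.0, Math.max(0.0, raw));
    }

    // Sums accumulated collection time over all collectors since JVM start.
    static double currentGcFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return gcFraction(gcMillis, uptimeMillis);
    }
}
```

A monitoring thread could log GcOverheadProbe.currentGcFraction() periodically; a value that keeps climbing toward 1.0 is an early sign of the situation described above.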
5. java.lang.OutOfMemoryError: Unable to create new native thread
Occurs when the JVM cannot obtain a new OS thread.
6. java.lang.OutOfMemoryError: Requested array size exceeds VM limit
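Triggered when the requested array length exceeds the maximum the JVM supports. The limit is platform-dependent and is typically slightly below Integer.MAX_VALUE.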
7. Out of memory: Kill process or sacrifice child
This is not a Java exception. The message comes from the Linux kernel, whose OOM killer terminates a process (usually the most memory-hungry one) when system memory is exhausted.
8. java.lang.OutOfMemoryError: Direct buffer memory
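Thrown when a direct (off-heap) buffer allocation, for example via ByteBuffer.allocateDirect, cannot be satisfied within the direct memory limit (-XX:MaxDirectMemorySize).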
Encountered OOM: GC overhead limit exceeded
Problem Symptoms
All service interfaces showed response times several orders of magnitude higher than normal. The numbers were so large that the monitoring dashboard looked as if it were reporting microseconds instead of milliseconds.
Root Cause Analysis
Log inspection revealed many JDBC connection failures. Running netstat -nap | grep 3306 showed numerous connections in the CLOSE_WAIT state, meaning the MySQL server had already closed its end while the application had not yet closed its own socket.
GC logs demonstrated prolonged Full GC cycles with very low efficiency in the old generation, e.g.:
2019-04-04T11:52:39.358-0800: 196.040: [Full GC (Ergonomics) [PSYoungGen: 47104K->44664K(67072K)] [ParOldGen: 174711K->174710K(175104K)] 221815K->219375K(242176K), [Metaspace: 55322K->55322K(1099776K)], 0.2207768 secs] [Times: user=0.77 sys=0.01, real=0.22 secs]
In this entry the old generation went from 174711K to 174710K out of 175104K, so a 0.22-second Full GC reclaimed about 1K. The JVM spent most of its time in stop-the-world Full GC, which caused MySQL handshake timeouts and, in turn, the connection failures.
Heap dump analysis showed a huge number of String objects that could not be reclaimed. They came from an API that let downstream systems submit an unbounded list of IDs, which led to excessive memory consumption.
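One way to close this kind of hole is to bound the input at the API boundary. The sketch below is a minimal illustration, not the fix used in the incident: the class name IdBatchValidator and the cap of 1,000 IDs are assumptions chosen for the example.

```java
import java.util.List;

final class IdBatchValidator {
    // Hypothetical cap; the incident write-up does not state a real limit.
    static final int MAX_IDS = 1_000;

    private IdBatchValidator() {}

    // Returns the list unchanged when it is within bounds, otherwise rejects the request.
    static List<String> requireBounded(List<String> ids) {
        if (ids == null) {
            throw new IllegalArgumentException("ids must not be null");
        }
        if (ids.size() > MAX_IDS) {
            throw new IllegalArgumentException(
                    "too many ids: " + ids.size() + " (max " + MAX_IDS + ")");
        }
        return ids;
    }
}
```

Calling IdBatchValidator.requireBounded(request.getIds()) as the first statement of the handler (getIds being whatever accessor the request type exposes) rejects oversized requests before any per-ID objects are created.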
Fail‑Fast Principle
If an error occurs, fail immediately and visibly. When something unexpected happens, let the software fail fast instead of postponing the failure or working around it.
Applying fail‑fast to the OOM case means letting the JVM abort with GC overhead limit exceeded rather than continuing to run in a degraded state. This makes the underlying bug (excessive Full GC caused by unbounded input) visible early, reducing investigation time.
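An OutOfMemoryError is thrown in the allocating thread and does not by itself stop the process, so an application that wants this behaviour has to opt in. The sketch below shows one possible approach using a default uncaught-exception handler; the class name FatalErrorPolicy and the exit status 1 are assumptions for illustration. HotSpot also provides the -XX:+ExitOnOutOfMemoryError flag, which achieves a similar effect without code.

```java
final class FatalErrorPolicy {
    private static final int MAX_CAUSE_DEPTH = 32; // guards against cyclic cause chains

    private FatalErrorPolicy() {}

    // True when the throwable, or one of its causes, is a VirtualMachineError
    // (OutOfMemoryError, StackOverflowError, InternalError, ...).
    static boolean isFatal(Throwable t) {
        Throwable current = t;
        for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (current instanceof VirtualMachineError) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    // Installs a handler that terminates the JVM when a thread dies of a fatal error.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (isFatal(error)) {
                // halt() skips shutdown hooks, which may not be able to run with no memory left.
                Runtime.getRuntime().halt(1);
            }
        });
    }
}
```

FatalErrorPolicy.install() would be called once at startup. It only covers errors that escape a thread uncaught; code that catches Throwable and carries on would still hide the failure.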
Benefits of Fail‑Fast
Bugs are discovered and reproduced more quickly, lowering resolution cost.
Faster delivery of stable software versions.
Early detection during testing reduces the number of bugs that reach production.
When failures are caught early, downstream services are less likely to suffer data loss or inconsistency.
Other Examples
Missing Startup Parameters
If a required configuration is absent, the program should throw a clear exception rather than falling back to defaults that may hide the problem.
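A minimal sketch of this check, assuming configuration arrives as a string map such as System.getenv(); the class name RequiredConfig and the key DB_URL used below are hypothetical.

```java
import java.util.Map;

final class RequiredConfig {
    private RequiredConfig() {}

    // Returns the configured value, or fails at startup with a message naming the missing key.
    static String require(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required configuration: " + key);
        }
        return value;
    }
}
```

For example, String dbUrl = RequiredConfig.require(System.getenv(), "DB_URL"); at the top of main stops the process before it serves any traffic with a guessed default.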
Invalid Client Parameters
When a client sends malformed input, the service should reject it immediately instead of attempting heuristic corrections that could introduce subtle bugs.
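As an illustration, the sketch below parses a page-size parameter strictly instead of clamping or defaulting it. The parameter name pageSize, the class name PageSizeParser, and the limit of 100 are assumptions made for the example.

```java
final class PageSizeParser {
    static final int MAX_PAGE_SIZE = 100; // hypothetical upper bound

    private PageSizeParser() {}

    // Parses the raw parameter and rejects anything that is not an integer in [1, MAX_PAGE_SIZE].
    static int parse(String raw) {
        final int value;
        try {
            value = Integer.parseInt(raw); // also throws NumberFormatException for null
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("pageSize is not an integer: " + raw, e);
        }
        if (value < 1 || value > MAX_PAGE_SIZE) {
            throw new IllegalArgumentException(
                    "pageSize out of range [1, " + MAX_PAGE_SIZE + "]: " + value);
        }
        return value;
    }
}
```

The caller can translate the IllegalArgumentException into a client error response, so the sender learns about the bad value immediately instead of receiving silently adjusted results.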
Conclusion
Analyzing this OOM incident highlights the importance of the fail‑fast principle: let the program abort when it encounters conditions it cannot handle, making bugs visible sooner and improving overall system robustness.
References
[1] "The Fail‑Fast Principle in Software Development" https://dzone.com/articles/fail-fast-principle-in-software-development
