Why Did My Java Service Hit 90% Memory? Uncovering Hidden NioChannel Leaks
An in‑depth investigation of a Java service’s memory alarm: a surge of temporary NioChannel objects, caused by high QPS and insufficient socket reuse, prematurely promoted objects to the old generation, where they accumulated without being collected. The article walks through the diagnosis, GC tuning, and mitigation steps.
Background
Memory usage alerts usually stem from high memory or CPU usage; this article walks through a real‑world memory alarm investigation to help newcomers with troubleshooting.
Core JVM Parameters
-Xms4915m -Xmx4915m -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=40 -XX:-G1UseAdaptiveIHOP -XX:G1HeapRegionSize=16m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -XX:MaxDirectMemorySize=1g -XX:SurvivorRatio=10 -XX:+ExplicitGCInvokesConcurrent -XX:MarkStackSize=4194304
The heap is ~4.8 GB with 16 MB G1 regions, metaspace is capped at 512 MB, and direct memory at 1 GB; with roughly 1,000 threads, total process memory (heap plus thread stacks, metaspace, and direct buffers) can exceed 8 GB.
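A couple of figures used later in the investigation follow directly from these flags; a quick sanity check of the region math (numbers taken from the flags above):

```java
public class G1RegionMath {
    static final long HEAP_MB = 4915;   // -Xmx4915m
    static final long REGION_MB = 16;   // -XX:G1HeapRegionSize=16m

    // How many fixed-size regions G1 carves the heap into
    static long regionCount() { return HEAP_MB / REGION_MB; }

    // G1 treats any allocation >= half a region as Humongous
    static long humongousThresholdKb() { return REGION_MB * 1024 / 2; }

    public static void main(String[] args) {
        System.out.println(regionCount() + " regions, Humongous threshold "
                + humongousThresholdKb() + " KB");
        // -> 307 regions, Humongous threshold 8192 KB
    }
}
```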
Monitoring
At 10:10, a memory-usage-above-90% alarm fired while CPU and network remained stable; G1 old-generation usage rose sharply (the red line in the monitoring chart).
Emergency Handling
First, stop the bleeding: dump the heap for offline analysis, then trigger a Full GC. After the Full GC the old generation shrank noticeably, confirming the accumulated objects were in fact reclaimable.
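The dump is typically taken with jmap -dump:live,format=b,file=heap.hprof &lt;pid&gt;; an in-process alternative, useful when shell access is limited, is the HotSpotDiagnosticMXBean (a sketch of the technique, not the exact commands used during this incident):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    // Write an hprof heap dump to `path`. With liveOnly=true the JVM runs a
    // full GC first, so only reachable objects are written; liveOnly=false
    // keeps unreachable objects too, which is what exposed the NioChannels here.
    public static void dump(String path, boolean liveOnly) throws Exception {
        HotSpotDiagnosticMXBean mx =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        mx.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws Exception {
        dump("before-full-gc.hprof", false);
    }
}
```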
Problem Analysis
Why did so many objects end up in the old generation, and why weren’t they reclaimed? Objects reach the old generation in three ways:
Long‑lived objects: survived enough Young GCs to reach the tenuring threshold.
Large (Humongous) objects: size ≥ half a G1 region, allocated directly in humongous regions.
Early promotion: survivor space cannot hold all survivors, so objects are promoted before reaching the tenuring threshold.
The heap dump showed a massive number of unreachable objects, mainly NioChannel instances that Tomcat’s org.apache.tomcat.util.net.NioEndpoint allocates whenever its channel cache (default size 500) is exhausted.
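Tomcat keeps returned NioChannel objects in a bounded stack; when the stack is full, returned channels are simply dropped and new requests allocate fresh ones, so a QPS spike above the cache size turns every extra request into short-lived garbage. A minimal sketch of this pop-or-allocate pattern (an illustrative model, not Tomcat’s actual class):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of Tomcat's bounded channel cache: exhaustion never fails,
// it silently falls back to fresh allocations that the GC must later clean up.
class BoundedChannelCache<T> {
    private final Deque<T> stack = new ArrayDeque<>();
    private final int limit;

    BoundedChannelCache(int limit) { this.limit = limit; }

    // null means "cache empty": the caller allocates a brand-new channel
    synchronized T pop() { return stack.pollFirst(); }

    // false means "cache full": the channel is dropped and becomes garbage
    synchronized boolean push(T channel) {
        if (stack.size() >= limit) return false;
        stack.offerFirst(channel);
        return true;
    }
}
```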
Lifecycle Investigation
Objects survived more than 5 minutes (≈15 Young GCs, matching G1’s default MaxTenuringThreshold of 15) before entering the old generation; a NioChannel wraps a SocketChannel, so its lifetime tracks the request-handling time.
Large‑Object Check
The monitored request bodies were ~217 KB, far below the 8 MB Humongous threshold (half of a 16 MB region), so the objects were not Humongous allocations.
Early Promotion Check
When survivor space is too small, objects are promoted early. Separately, enabling adaptive IHOP or lowering the fixed occupancy threshold makes the concurrent cycle, and therefore Mixed GC, start earlier:
-XX:+G1UseAdaptiveIHOP
-XX:InitiatingHeapOccupancyPercent=40
In the test environment, however, the old generation never reached the IHOP threshold (40% of the heap, ≈2 GB), so Mixed GC never ran.
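The IHOP arithmetic explains why: G1 only starts a concurrent cycle once old-generation occupancy crosses the threshold. A quick check with the numbers from the flags:

```java
public class IhopMath {
    // Old-generation occupancy (in MB) at which G1 starts a concurrent cycle
    static long ihopMb(long heapMb, int ihopPercent) {
        return heapMb * ihopPercent / 100;
    }

    public static void main(String[] args) {
        System.out.println(ihopMb(4915, 40)); // original setup: 1966 MB (~2 GB), never reached
        System.out.println(ihopMb(4096, 20)); // tuned setup: 819 MB, reachable, so Mixed GC runs
    }
}
```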
Adjustments
To force a Mixed GC in testing, we reduced the heap to 4 GB and set -XX:InitiatingHeapOccupancyPercent=20; Mixed GC then ran and reclaimed ~500 MB of old-generation space.
Solution 1 – Increase NioChannel Pool
Raise Tomcat’s channel cache (socket.bufferPool) from the default 500 to 1024:
import org.apache.coyote.AbstractProtocol;
import org.apache.coyote.ProtocolHandler;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatPropertiesConfig implements WebServerFactoryCustomizer<TomcatServletWebServerFactory> {
    @Override
    public void customize(TomcatServletWebServerFactory factory) {
        factory.addConnectorCustomizers(connector -> {
            ProtocolHandler handler = connector.getProtocolHandler();
            if (handler instanceof AbstractProtocol) {
                // socket.bufferPool controls NioEndpoint's NioChannel cache (default 500)
                connector.setProperty("socket.bufferPool", "1024");
            }
        });
    }
}
After the change the NioChannel count stayed stable even during QPS spikes.
Solution 2 – Traffic Smoothing
Instead of scaling the cluster, the SDK was modified to stagger data uploads by client IP, adding up to 30 s delay, effectively spreading the peak ten‑fold.
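The staggering can be sketched as hashing each client’s IP into a deterministic delay within a 30 s window, so uploads that would have arrived in the same instant spread out evenly; the hash choice and names here are illustrative, not the SDK’s actual code:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class UploadJitter {
    static final long WINDOW_MS = 30_000; // maximum added delay, per the rollout

    // Deterministic per-client delay: the same IP always waits the same amount,
    // while different IPs land roughly uniformly across the 30 s window.
    static long delayMillis(String clientIp) {
        CRC32 crc = new CRC32();
        crc.update(clientIp.getBytes(StandardCharsets.UTF_8));
        return crc.getValue() % WINDOW_MS;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(delayMillis("10.1.2.3")); // stagger, then perform the upload
    }
}
```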
Conclusion
The 90% memory alarm was caused by a burst of HTTP requests that exhausted Tomcat’s default 500-channel cache, creating many temporary NioChannel objects that were prematurely promoted to the old generation. Insufficient survivor space prevented timely reclamation, and because the old generation never reached the IHOP threshold, Mixed GC never triggered. Increasing the channel cache, tuning the G1 parameters, and smoothing traffic eliminated the issue.