Why Does My Java Service Time Out? Uncovering Hidden GC and Connection‑Pool Pitfalls
This article walks through a real-world production debugging case, showing how distributed tracing with SkyWalking and deep GC-log analysis uncovered a mis-configured c3p0 connection pool and Finalizer behavior that combined to cause intermittent timeouts and high latency in a distributed Java service.
Prologue
Production-environment tuning is often complex and painful, especially at internet companies where engineers inherit tangled code, layers of patches, and intricate service dependencies.
Technologies involved: Hystrix, distributed tracing, distributed log search, HTTP, GC, thread pools, the c3p0 connection pool, JDBC drivers, Netty NIO, caching, Finalizer, Eclipse Memory Analyzer (MAT), jvisualvm, and YourKit.
The author does not aim to provide a complete solution but rather to guide readers in developing their own troubleshooting mindset through a Q&A style narrative.
Chapter 1 – It Starts with an Alert
An occasional timeout alert on a complex call chain prompted investigation. After deploying SkyWalking (an Apache distributed-tracing tool), the trace UI revealed that interface a timed out in its call to interface b.
Since b did not print the traceId and lacked ELK log collection, the author traced the request via HTTP headers, discovering that the traceId is base64‑encoded in the header.
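Decoding it by hand is straightforward. A minimal sketch, assuming a SkyWalking v6-style sw6 header whose dash-separated ID fields are Base64-encoded (the exact layout varies across SkyWalking versions, so the field index here is an assumption):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TraceIdDecoder {
    // Assumes fields[1] of the dash-separated header is the Base64-encoded
    // trace ID; verify the index against your SkyWalking version.
    public static String decodeTraceId(String headerValue) {
        String[] fields = headerValue.split("-");
        byte[] raw = Base64.getDecoder().decode(fields[1]);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical header value, built here only for illustration.
        String encoded = Base64.getEncoder()
                .encodeToString("23.45.16xxx".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeTraceId("1-" + encoded + "-" + encoded));
    }
}
```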
To search logs on b without ELK, the author wrote a multithreaded Python script that SSHes into each machine and streams stdout to the console.
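The author's script was in Python; the same idea in Java, with hypothetical host names, log path, and pattern:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RemoteLogGrep {
    public static void main(String[] args) {
        // Hypothetical hosts and grep command; relies on passwordless ssh.
        List<String> hosts = List.of("b-host-1", "b-host-2", "b-host-3");
        String cmd = "grep 'traceId=...' /data/logs/service-b/app.log";

        ExecutorService pool = Executors.newFixedThreadPool(hosts.size());
        for (String host : hosts) {
            pool.submit(() -> {
                try {
                    Process p = new ProcessBuilder("ssh", host, cmd)
                            .redirectErrorStream(true)
                            .start();
                    try (BufferedReader r = new BufferedReader(
                            new InputStreamReader(p.getInputStream()))) {
                        String line;
                        while ((line = r.readLine()) != null) {
                            System.out.println(host + " | " + line); // tag output by host
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}
```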
Insight: Logging is one of the most important techniques; a good logging system can prevent endless debugging.
Chapter 2 – The Problem Surfaces
The request originates from Apache HttpClient; the server side uses a custom Netty-based framework. One hypothesis: the delay between request submission and execution comes from thread-pool queuing in Hystrix or HttpClient when a pool is exhausted.
Key questions explored (a Hystrix sketch follows the list):
Is the Hystrix task queued in the thread pool?
Is the HttpClient request queued?
Does Netty’s event‑loop experience queuing? (workerGroup size = 2*CPU‑cores‑1 = 95, making queuing unlikely.)
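To make the first question concrete: Hystrix queues a command whenever all of its thread pool's core threads are busy, and that wait is invisible unless the tracer records the submission time. A minimal sketch with hypothetical pool sizes, not the service's real settings:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixThreadPoolProperties;

public class CallBCommand extends HystrixCommand<String> {

    public CallBCommand() {
        super(Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("interface-b"))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        .withCoreSize(10)         // worker threads actually executing
                        .withMaxQueueSize(100))); // commands wait here when workers are busy
    }

    @Override
    protected String run() throws Exception {
        // The actual HTTP call to interface b would go here; time spent in the
        // queue before run() starts is what a naive trace span would miss.
        return "ok";
    }
}
```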
Without sufficient data, this line of investigation was abandoned.
Deeper thought: reading the SkyWalking plugin source could confirm whether the spans for Hystrix and HttpClient record the submission time or only the execution start, which would prove or rule out queuing.
Extracting nearby log lines with grep "xxxx-xx-xx xx:xx:xx" full.log > simple.log revealed a ~4 s window with no log output at all, matching the timeout and suggesting a stop-the-world GC pause:
1074370.765: [GC (CMS Final Remark) [YG occupancy: 1161721 K (2831168 K)]
1074370.765: [Rescan (parallel) , 0.0282716 secs]
1074370.793: [weak refs processing, 3.9080203 secs]
1074374.701: [class unloading, 0.0895909 secs]
1074374.791: [scrub symbol table, 0.0193149 secs]
1074374.810: [scrub string table, 0.0073974 secs][1 CMS-remark: 4456982K(5242880K)] 5618704K(8074048K), 4.0599904 secs]
[Times: user=3.06 sys=1.86, real=4.06 secs]

Insight: The most time-consuming phase is weak refs processing (3.9 s of the 4.06 s pause), which covers SoftReference, WeakReference, FinalReference, PhantomReference, and JNI Weak References. The longest sub-phase is FinalReference processing.
Chapter 3 – GC Details
GC flags used: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC. The log timestamps are offsets from JVM start, making analysis hard; adding -XX:+PrintGCDateStamps prints real dates.
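For reference, a fuller diagnostic flag set might look like the following. Only ParNew/CMS and -XX:+PrintGCDateStamps are confirmed by the article; the remaining flags are standard HotSpot options added here as assumptions:

```
# -XX:+PrintGCDateStamps       wall-clock dates instead of seconds since JVM start
# -XX:+PrintReferenceGC        per-type timings for Soft/Weak/Final/Phantom references
# -XX:+ParallelRefProcEnabled  process references with multiple GC threads
java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintReferenceGC -XX:+ParallelRefProcEnabled \
     -Xloggc:/var/log/app/gc.log -jar app.jar
```

With -XX:+PrintReferenceGC, the remark line breaks weak refs processing down by reference type, so a dominant FinalReference sub-phase can be confirmed rather than inferred.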
Weak refs processing consumes significant time, hinting that a cache (local or Redis) may be promoting objects to the old generation.
Chapter 4 – Finding the Culprit
Using jvisualvm and MAT, the author discovered that Finalizer objects dominated the retained heap. The top three retained classes were:
java.net.SocksSocketImpl
com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl
com.mysql.jdbc.Connection
Further investigation showed that the MySQL JDBC driver (v5.0.5) still relies on finalize(), adding memory pressure.
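Why a finalizer hurts: an object that overrides finalize() cannot be reclaimed by the collection that discovers it. A minimal sketch of the mechanism, using a hypothetical class rather than the driver's actual code:

```java
// When an instance becomes unreachable, GC does not free it immediately:
// it is queued via java.lang.ref.Finalizer, finalized later by the single
// "Finalizer" daemon thread, and only reclaimed by a SUBSEQUENT collection.
// Until then, everything it retains (here, 1 MB) stays live.
class FinalizableConnection {
    private final byte[] socketBuffer = new byte[1024 * 1024];

    @Override
    protected void finalize() throws Throwable {
        try {
            // Old JDBC drivers closed sockets/native handles here as a safety net.
        } finally {
            super.finalize();
        }
    }
}
```

Thousands of short-lived connections therefore mean thousands of FinalReference instances for the remark phase to process.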
Chapter 5 – Connection‑Pool Misconfiguration
The service uses c3p0 with setMaxIdleTime(60). The author initially thought this would discard idle connections only when exceeding minPoolSize, but the actual rule discards any idle connection after 60 s, regardless of pool size. When a connection is discarded and the pool size falls below minPoolSize, c3p0 immediately recreates it, causing a surge of connections.
A comparative test with HikariCP (max pool size 100) showed a stable connection count, while c3p0's count spiked by ~99 extra connections after 10 s.
Key c3p0 parameters:
maxIdleTime – seconds a connection may stay idle before being discarded (default 0 = never).
maxIdleTimeExcessConnections – seconds of idle time allowed for connections beyond minPoolSize (default 0 = no enforcement).
Thus, to discard only excess idle connections, maxIdleTimeExcessConnections should be used.
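A minimal c3p0 sketch contrasting the two parameters; the pool sizes and JDBC URL are illustrative, not the article's production values:

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class PoolConfig {
    public static ComboPooledDataSource newPool() throws Exception {
        ComboPooledDataSource ds = new ComboPooledDataSource();
        ds.setDriverClass("com.mysql.jdbc.Driver");
        ds.setJdbcUrl("jdbc:mysql://db-host:3306/app"); // hypothetical URL
        ds.setMinPoolSize(10);
        ds.setMaxPoolSize(100);

        // Problematic: discards ANY connection idle for 60 s, even within
        // minPoolSize, which c3p0 then immediately recreates -> churn.
        // ds.setMaxIdleTime(60);

        // Better: cull only connections BEYOND minPoolSize after 60 s idle.
        ds.setMaxIdleTimeExcessConnections(60);
        return ds;
    }
}
```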
Chapter 6 – Deep Dive
Increasing -XX:MaxTenuringThreshold to 15 reduced the frequency of old-generation promotions, smoothing GC pauses. However, a larger threshold may overflow the Survivor spaces, in which case objects are promoted early regardless of age.
Connection lifecycle in the JVM (under ParNew/CMS, MaxTenuringThreshold defaults to 6):
After creation, the connection's age is 0; every young GC it survives increments the age.
Around the 6th young GC (age 5), idle time may exceed 60 s, prompting the pool to release the connection.
Even once released, its finalizer keeps it reachable, so the next young GC (age 6, the default threshold) promotes it to the old generation before it can be reclaimed.
This explains why raising the tenuring threshold lowered connection‑related GC pressure.
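To verify the effect, HotSpot can print the age histogram of survivor objects at every young GC. Only MaxTenuringThreshold=15 comes from the article; the distribution flag is a standard option added here for illustration:

```
# Print per-age survivor occupancy at each young GC, showing how long
# connections actually live before promotion (CMS defaults the threshold to 6).
java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:MaxTenuringThreshold=15 \
     -XX:+PrintTenuringDistribution \
     -jar app.jar
```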
Chapter 7 – Lessons Learned
When facing similar GC issues, remember:
Use proper tools (SkyWalking, jvisualvm, MAT, YourKit).
Understand object lifecycles and GC phases.
Beware of mis‑configured connection‑pool parameters that can cause hidden connection churn.
Adjust GC flags and tenuring thresholds thoughtfully.
Future work includes exploring how G1GC handles long-lived objects.