Production Environment Optimization: Deep Dive into GC, Tracing, and Connection‑Pool Issues
This article walks through a real‑world production incident involving intermittent interface timeouts, demonstrates how tracing with SkyWalking and log analysis revealed a downstream service problem, explores GC log diagnostics, uncovers misconfigured c3p0 connection‑pool settings, and shares practical lessons for Java backend performance tuning.
Production‑environment tuning in large internet companies is often painful due to high staff turnover, tangled service dependencies, and distributed deployments. The author encountered an intermittent timeout on interface a and used Apache SkyWalking to visualize the call chain, discovering that the timeout originated from downstream service b.
SkyWalking passes the trace ID between services in an HTTP header, Base64-encoded. Printing the request headers on the b side confirmed that the trace ID was missing there, which shows how tracing can pinpoint routing issues without exhaustive log searches.
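The header check described above can be sketched as follows. This is a minimal illustration, not the author's code: it assumes a SkyWalking "sw8"-style header value, where fields are separated by dashes and the trace ID is the second field in Base64. The class name `TraceHeaderDecoder` and the sample values are hypothetical, and older agents may use a different header layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class TraceHeaderDecoder {

    /**
     * Extracts and decodes the trace ID from a propagation header value.
     * Assumption: dash-separated fields with the Base64-encoded trace ID
     * in the second position (the "sw8"-style layout).
     * Returns null when the header is absent or cannot be decoded.
     */
    static String decodeTraceId(String headerValue) {
        if (headerValue == null || headerValue.isEmpty()) {
            return null; // header missing: the situation observed on the b side
        }
        String[] fields = headerValue.split("-");
        if (fields.length < 2) {
            return null;
        }
        try {
            byte[] raw = Base64.getDecoder().decode(fields[1]);
            return new String(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null; // field was not valid Base64
        }
    }

    public static void main(String[] args) {
        // Hypothetical header value built for the demo.
        String traceId = "demo-trace-0001";
        String encoded = Base64.getEncoder()
                .encodeToString(traceId.getBytes(StandardCharsets.UTF_8));
        String header = "1-" + encoded + "-c2Vn-0";

        System.out.println("decoded trace ID: " + decodeTraceId(header));
        System.out.println("missing header:   " + decodeTraceId(null));
    }
}
```

Logging the decoded value on the receiving side makes it easy to compare against the trace ID shown in the SkyWalking UI, and a null result flags requests that arrived without tracing context.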
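To investigate further, the author examined GC logs from a JVM running the ParNew and CMS collectors (UseParNewGC + UseConcMarkSweepGC). The logs revealed a long weak refs processing phase in the CMS Final Remark pause, dominated by the FinalReference step, which consumed hundreds of milliseconds or more (the whole phase takes 3.9 seconds in the excerpt below). Reference processing is single-threaded by default; adding the JVM flag -XX:+ParallelRefProcEnabled lets it run in parallel, which reduced the pause time dramatically.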
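Spotting a slow remark phase by eye gets tedious in a long log. The sketch below pulls phase names and durations out of entries shaped like `[weak refs processing, 3.9080203 secs]`. It assumes that exact bracketed format; the class name `RemarkPhaseParser` and the regular expression are illustrative and not part of any GC tooling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemarkPhaseParser {

    // Matches "[<phase name>, <seconds> secs]" or "[<phase name> , <seconds> secs]".
    // Assumption: phase names contain only letters, spaces and parentheses.
    private static final Pattern PHASE = Pattern.compile(
            "\\[([A-Za-z][A-Za-z ()]*?)\\s*,\\s*([0-9]+\\.[0-9]+) secs\\]");

    /** Returns phase name -> duration in seconds, in order of appearance. */
    static Map<String, Double> phaseSeconds(String logText) {
        Map<String, Double> phases = new LinkedHashMap<>();
        Matcher m = PHASE.matcher(logText);
        while (m.find()) {
            phases.put(m.group(1).trim(), Double.parseDouble(m.group(2)));
        }
        return phases;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "1074370.765: [GC (CMS Final Remark) [YG occupancy: 1161721 K (2831168 K)]",
                "1074370.765: [Rescan (parallel) , 0.0282716 secs]",
                "1074370.793: [weak refs processing, 3.9080203 secs]");

        // Report only phases slower than an arbitrary 100 ms threshold.
        phaseSeconds(log).forEach((phase, secs) -> {
            if (secs > 0.1) {
                System.out.println(phase + " took " + secs + " s");
            }
        });
    }
}
```

Run over a full log, the same loop shows quickly whether reference processing is the recurring outlier or a one-off.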
Sample GC log excerpt:
1074370.765: [GC (CMS Final Remark) [YG occupancy: 1161721 K (2831168 K)]
1074370.765: [Rescan (parallel) , 0.0282716 secs]
1074370.793: [weak refs processing, 3.9080203 secs]
...
The investigation also uncovered a misconfiguration in the c3p0 connection pool. The author had set maxIdleTime to 60 seconds, mistakenly believing it would only affect connections above minPoolSize. In reality, maxIdleTime applies to every idle connection in the pool (the excess-only behavior belongs to a separate c3p0 setting, maxIdleTimeExcessConnections), so idle connections were being discarded and immediately recreated, causing a surge from ~2,000 to more than 17,000 connections. A comparison with HikariCP showed a stable pool size when correctly configured.
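To see why this setting produces churn, consider a deliberately simplified model. It does not use c3p0 and is not its actual eviction algorithm. It assumes every connection stays idle, is discarded once it has been idle for maxIdleTime seconds, and is replaced at once to keep the pool at minPoolSize. The class name `IdleChurnModel` and the chosen durations are hypothetical.

```java
class IdleChurnModel {

    /**
     * Counts how many physical connections a fully idle pool opens over
     * durationSeconds under the simplified model described above.
     * A maxIdleSeconds of 0 or less means idle connections never expire.
     */
    static long connectionsCreated(int minPoolSize, int maxIdleSeconds, int durationSeconds) {
        long created = minPoolSize; // initial fill of the pool
        if (maxIdleSeconds <= 0) {
            return created;
        }
        int[] idleSince = new int[minPoolSize]; // all connections created at t = 0
        for (int now = 1; now <= durationSeconds; now++) {
            for (int i = 0; i < minPoolSize; i++) {
                if (now - idleSince[i] >= maxIdleSeconds) {
                    idleSince[i] = now; // discard and open a replacement
                    created++;
                }
            }
        }
        return created;
    }

    public static void main(String[] args) {
        // Ten minutes of an idle 100-connection pool.
        System.out.println("maxIdleTime=10: " + connectionsCreated(100, 10, 600));
        System.out.println("maxIdleTime=60: " + connectionsCreated(100, 60, 600));
        System.out.println("no idle limit:  " + connectionsCreated(100, 0, 600));
    }
}
```

Even in this toy model, a short idle limit on a pool pinned at its minimum size keeps opening new database connections while serving no traffic at all, which is the pattern the connection counts pointed to.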
Relevant c3p0 configuration snippet:
dataSource.setMinPoolSize(100);
dataSource.setMaxPoolSize(100);
dataSource.setInitialPoolSize(100);
dataSource.setMaxIdleTime(10);
Key JVM flags used for deeper GC insight:
-XX:+PrintReferenceGC
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintTenuringDistribution
-XX:MaxTenuringThreshold=15
Further analysis with tools such as jvisualvm, Eclipse MAT, and YourKit helped identify that Finalizer objects (a subclass of FinalReference) were the main memory retainers, confirming that finalization can delay object reclamation.
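When a heap dump points at Finalizer, the next question is which classes bring finalizers with them. The sketch below is one rough way to check a suspect class via reflection: it reports whether the class, or any superclass other than Object, declares finalize(). The names `FinalizerCheck`, `LegacyResource`, and `PlainResource` are hypothetical, and the check is approximate, since the JVM may skip registration for a finalize() method with an empty body.

```java
class FinalizerCheck {

    /**
     * Returns true if the given class or one of its superclasses
     * (excluding java.lang.Object) declares a no-argument finalize().
     */
    static boolean declaresFinalizer(Class<?> type) {
        for (Class<?> k = type; k != null && k != Object.class; k = k.getSuperclass()) {
            try {
                k.getDeclaredMethod("finalize");
                return true;
            } catch (NoSuchMethodException e) {
                // not declared at this level; keep walking up the hierarchy
            }
        }
        return false;
    }

    // Hypothetical class that relies on finalization for cleanup.
    static class LegacyResource {
        @SuppressWarnings({"deprecation", "removal"})
        @Override
        protected void finalize() {
            // cleanup that would run only after the GC has discovered the object
        }
    }

    // Hypothetical class without a finalizer.
    static class PlainResource { }

    public static void main(String[] args) {
        System.out.println("LegacyResource: " + declaresFinalizer(LegacyResource.class));
        System.out.println("PlainResource:  " + declaresFinalizer(PlainResource.class));
    }
}
```

Classes flagged this way are candidates for explicit close() handling, so that their instances no longer have to wait in the finalizer queue before their memory can be reclaimed.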
Lessons learned include the importance of using proper observability tools, understanding GC phases and connection‑pool semantics, and systematically narrowing down performance bottlenecks in complex distributed Java services.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.