Investigation and Optimization of Long GC Pauses Caused by Excessive FinalReference in Spring Cloud Gateway
The team discovered that frequent Logback file rotations created thousands of FileOutputStream objects whose finalize‑based FinalReference instances flooded the G1GC reference‑processing phase, causing 13‑second pauses, and resolved the issue by enabling parallel reference processing, enlarging log rotation size, and adjusting GC initiation thresholds.
Background: In a micro‑service architecture, the team uses Spring Cloud Gateway as the business gateway. After a period of stable operation, intermittent 5xx alerts appeared, lasting from several seconds up to half a minute. Investigation revealed that Logback generated a large number of FinalReference objects, causing long GC pauses during the G1GC remark phase.
Problem Diagnosis: Initial CPU and memory checks showed stable CPU usage (~20%) but occasional memory spikes in a container instance, suggesting a possible JVM‑level issue. The team ruled out memory leaks in Netty ByteBuf and focused on GC logs. No full GC was observed, but the remark phase took up to 13 seconds. Detailed GC logs (enabled with -XX:+PrintReferenceGC -XX:+PrintGCDetail) showed that the ref‑proc sub‑phase spent over 13 seconds processing more than 23 000 FinalReference objects.
Reference Processing Details: The remark phase includes a reference processing step where active references are discovered, moved to a pending queue, and eventually enqueued for application‑level handling. The ref‑proc work occurs between the discover_reference and process_discovered_references steps. The large number of active FinalReference objects caused the prolonged pause.
Root Cause Discovery: Heap dumps were taken with jmap -dump:live,format=b,file=heapdump.phrof pid. Analysis showed that many FinalReference objects originated from java.io.FileOutputStream, which implements finalize(). Further inspection revealed that Logback’s ResilientFileOutputStream creates a new FileOutputStream for each log file rotation. Because the logging configuration rotated files every 100 MB, thousands of log files (and thus FileOutputStream instances) were created, leading to the massive FinalReference count.
Optimization Measures: 1) Enable parallel reference processing with -XX:+ParallelRefProcEnabled (and optionally tune -XX:ParallelGCThreads=n) to speed up ref‑proc. 2) Increase the log file size limit to reduce the number of rotated files, thereby decreasing the number of FileOutputStream instances. 3) Lower -XX:InitiatingHeapOccupancyPercent to trigger old‑generation mixed GCs more frequently, preventing long‑term accumulation of FinalReference objects.
Conclusion: The issue was not a code bug but a combination of excessive log file rotation and infrequent old‑generation GC, which together produced a large backlog of FinalReference objects and caused long GC pauses. Applying the above tuning steps eliminated the pauses and improved overall service stability.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
