Optimizing Full GC Frequency in a Java Game Service Using MAT and GC Logs
By analyzing heap dumps with MAT and scrutinizing GC logs, the Vivo Internet Server Team identified thread‑local FutureAdapter and Jackson BufferRecycler objects and tuned promotion thresholds, cutting the Java game service’s Full GC occurrences from roughly 120 per day to about 30 and dramatically shortening pause times.
This article, authored by the Vivo Internet Server Team, describes the process of reducing the frequency of Full GC events in a game‑related Java service. The service, running on JDK 1.8 with CMS as the garbage collector, originally experienced up to 120 Full GC occurrences per day, with a Full GC every 7 minutes during peak traffic.
Background : The high Full GC rate caused noticeable latency spikes. The initial JVM startup parameters were:
-Xms4608M -Xmx4608M -Xmn2048M -XX:MetaspaceSize=320M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=92 -XX:+UseCMSInitiatingOccupancyOnly
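These flags imply a concrete heap layout. The sketch below works through the arithmetic in Java. It assumes -XX:SurvivorRatio was left at its default of 8, which the original does not state. It also ignores the alignment rounding HotSpot applies, so real sizes may differ slightly. HeapLayout is a hypothetical class name.

```java
class HeapLayout {
    // Values taken from the original startup flags, in MB
    static final double HEAP_MB = 4608;        // -Xms4608M / -Xmx4608M
    static final double YOUNG_MB = 2048;       // -Xmn2048M
    static final double CMS_OCCUPANCY = 0.92;  // -XX:CMSInitiatingOccupancyFraction=92
    // Assumption: -XX:SurvivorRatio left at its default (eden : one survivor = 8 : 1)
    static final int SURVIVOR_RATIO = 8;

    static double oldGenMb() { return HEAP_MB - YOUNG_MB; }

    // CMS starts a concurrent cycle once old-gen occupancy crosses this size
    static double cmsTriggerMb() { return oldGenMb() * CMS_OCCUPANCY; }

    // Young gen = eden + two survivor spaces = (SURVIVOR_RATIO + 2) survivor-sized units
    static double survivorMb() { return YOUNG_MB / (SURVIVOR_RATIO + 2); }

    static double edenMb() { return survivorMb() * SURVIVOR_RATIO; }

    public static void main(String[] args) {
        System.out.printf("old gen: %.1f MB%n", oldGenMb());
        System.out.printf("CMS cycle starts at: %.1f MB%n", cmsTriggerMb());
        System.out.printf("eden: %.1f MB, each survivor: %.1f MB%n", edenMb(), survivorMb());
    }
}
```

With these numbers the old generation is 2,560 MB and CMS starts a concurrent cycle at about 2,355 MB. That leaves roughly 205 MB of headroom for promotions that happen while the cycle runs. If the headroom runs out first, CMS reports a concurrent mode failure and falls back to a stop-the-world collection.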
Tools Used : The optimization relied on two main sources of information – the Eclipse Memory Analyzer Tool (MAT) and GC logs. MAT provides dominator trees, histograms, and OQL queries to pinpoint memory‑heavy objects. GC logs record the type, duration, and impact of each collection.
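GC logs are the authoritative record. For a quick in-process cross-check of how often each collector runs and how much time it has accumulated, the standard java.lang.management API exposes per-collector counters. The following is a minimal sketch; GcCounter and its methods are hypothetical names. On JDK 8 with the flags above, the beans are named ParNew and ConcurrentMarkSweep. The ConcurrentMarkSweep counter may not line up one-to-one with the Full GC entries in the log, so treat it as a trend indicator rather than a replacement for the log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class GcCounter {
    /** Collector name -> {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the value is undefined for this collector
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }

    /** Per-collector difference between two snapshots (for example, taken a day apart). */
    static Map<String, long[]> delta(Map<String, long[]> before, Map<String, long[]> after) {
        Map<String, long[]> diff = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] b = before.getOrDefault(e.getKey(), new long[] {0, 0});
            diff.put(e.getKey(), new long[] {e.getValue()[0] - b[0], e.getValue()[1] - b[1]});
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, long[]> before = snapshot();
        // Churn through short-lived allocations so the young collector has work to do
        long total = 0;
        for (int i = 0; i < 256; i++) {
            total += new byte[1 << 20].length;
        }
        Map<String, long[]> after = snapshot();
        System.out.println("allocated " + (total >> 20) + " MB");
        for (Map.Entry<String, long[]> e : delta(before, after).entrySet()) {
            System.out.printf("%s: +%d collections, +%d ms%n",
                    e.getKey(), e.getValue()[0], e.getValue()[1]);
        }
    }
}
```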
MAT Features :
Dominator Tree – shows object domination relationships and retained heap size.
Histogram – lists classes with their instance counts and shallow heap sizes; retained heap sizes can be calculated on demand.
OQL – SQL‑like queries for filtering objects. Example queries used:
// String fuzzy match
SELECT * FROM char[] b WHERE toString(b) LIKE ".*traceId.*"
// Find objects at or above address 0x700000000 (string comparison of hex values, so only reliable when the addresses have the same number of digits)
SELECT * FROM java.lang.Object t WHERE toHex(t.@objectAddress) >= "0x700000000"
// Find Object[] arrays of length 73 with retained size > 1000 B
SELECT * FROM java.lang.Object[] a WHERE a.@length=73 AND a.@retainedHeapSize>1000
// Find char[] arrays of length 65536 that still have inbound references
SELECT * FROM char[] a WHERE a.@length=65536 AND (inbounds(a).size()>0)
Case Studies :
Dubbo FutureAdapter objects : MAT showed over 550 unreachable FutureAdapter instances (≈200 MB) sitting in the old generation. While they were still referenced from a ThreadLocal, they survived several young‑GC cycles and were promoted. Once that reference was overwritten, they became garbage that only an old‑generation collection could reclaim. The root cause was a thread pool whose threads issued Dubbo calls only infrequently, so the future from each thread's last call stayed referenced until that thread made its next call. Solutions included using Dubbo’s asynchronous API directly, resizing the thread pool, or adding a filter that clears FutureContext after synchronous calls.
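The mechanism does not depend on Dubbo itself. The stdlib-only sketch below mimics it. CONTEXT is a hypothetical stand-in for Dubbo's FutureContext, not Dubbo's API, and a 1 MB byte array plays the role of a FutureAdapter graph. The sketch checks whether a pooled thread still holds its per-thread value after a task has finished. The try/finally with ThreadLocal.remove() shows the cleanup pattern that a clearing filter would apply after each synchronous call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalCleanupDemo {
    // Hypothetical per-thread context, standing in for something like Dubbo's FutureContext
    static final ThreadLocal<Object> CONTEXT = new ThreadLocal<>();

    // Simulates a synchronous call that leaves its future in the ThreadLocal
    static void callWithoutCleanup() {
        CONTEXT.set(new byte[1024 * 1024]); // stands in for a large FutureAdapter graph
    }

    // Same call, but the context is cleared once the result has been consumed
    static void callWithCleanup() {
        try {
            CONTEXT.set(new byte[1024 * 1024]);
            // ... use the result synchronously ...
        } finally {
            CONTEXT.remove(); // nothing stays pinned to the pooled thread
        }
    }

    /** Runs the call on a single pooled thread, then asks that same thread whether a value is left behind. */
    static boolean leavesContextBehind(Runnable call) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(call).get();
            return pool.submit(() -> CONTEXT.get() != null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + leavesContextBehind(ThreadLocalCleanupDemo::callWithoutCleanup));
        System.out.println("with cleanup:    " + leavesContextBehind(ThreadLocalCleanupDemo::callWithCleanup));
    }
}
```

In a busy pool the leftover value is overwritten quickly and dies young. In a pool whose threads sit idle between calls, it can survive enough young collections to be promoted.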
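Jackson BufferRecycler char[65536] arrays : Large numbers of char[65536] arrays (65,536 chars, roughly 128 KB each, since a Java char is 2 bytes) were held by Jackson's ThreadLocal-based BufferRecycler. Arrays that were not replaced by later deserializations on the same thread stayed reachable long enough to be promoted, leading to old‑generation retention. Remedies were to disable ThreadLocal buffer recycling (JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING) or to upgrade Jackson to 2.16 or later, which introduces the new RecyclerPool implementation.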
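The number of retained buffers tracks the number of threads rather than the request rate. The stdlib-only sketch below shows why, using a ThreadLocal plus SoftReference buffer cache in the spirit of Jackson's BufferRecycler. It is a simplified, hypothetical model, not Jackson code; PerThreadBufferDemo, CACHE and acquire are invented names.

```java
import java.lang.ref.SoftReference;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerThreadBufferDemo {
    static final int BUFFER_CHARS = 65_536; // char[65536] = 128 KB of char data
    // Hypothetical per-thread cache, loosely modeled on Jackson's BufferRecycler (not Jackson code)
    static final ThreadLocal<SoftReference<char[]>> CACHE = new ThreadLocal<>();

    /** Returns this thread's cached buffer, allocating it on first use or after the GC cleared it. */
    static char[] acquire() {
        SoftReference<char[]> ref = CACHE.get();
        char[] buf = (ref == null) ? null : ref.get();
        if (buf == null) {
            buf = new char[BUFFER_CHARS];
            CACHE.set(new SoftReference<>(buf));
        }
        return buf;
    }

    /** Runs `tasks` tasks on a pool of `threads` threads and counts the distinct buffers they used. */
    static int distinctBuffers(int threads, int tasks) {
        Set<char[]> seen = Collections.synchronizedSet(
                Collections.newSetFromMap(new IdentityHashMap<char[], Boolean>()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> seen.add(acquire()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuffers(4, 1_000) + " distinct buffers for 1000 tasks on 4 threads");
    }
}
```

However many tasks run, each worker thread keeps exactly one buffer alive. A pool with hundreds of threads therefore pins hundreds of 128 KB arrays, and long-lived threads give those arrays ample time to be promoted.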
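Object promotion age threshold : With the default -XX:MaxTenuringThreshold=6, surviving objects were promoted to the old generation after at most six young‑GC cycles. Promotion can happen even sooner, because the JVM lowers the threshold whenever objects up to a given age occupy more than -XX:TargetSurvivorRatio percent of a survivor space. Raising the threshold to 15 (the maximum HotSpot allows) and -XX:TargetSurvivorRatio from its default of 50 to 75 gave short-lived objects more young collections in which to die. Promotion volume dropped sharply, and Full GC frequency fell with it.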
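To see how the two flags interact, here is a simplified Java re-expression of how, as we understand it, HotSpot's age table chooses the tenuring threshold after each young GC. In JDK 8 that logic lives in ageTable::compute_tenuring_threshold. The age histogram is entirely hypothetical and is not data from the service. The 204.8 MB survivor capacity assumes -XX:SurvivorRatio=8 with -Xmn2048M.

```java
class TenuringThresholdModel {
    static final int TABLE_SIZE = 16; // object age is a 4-bit field, so ages 0..15

    // Hypothetical age histogram in MB: index = age, index 0 unused. NOT measured data.
    static final double[] EXAMPLE_SIZES_MB = {0, 40, 25, 20, 15, 10, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2};

    /**
     * Walks ages upward, summing surviving bytes. The first age at which the running total
     * exceeds TargetSurvivorRatio% of one survivor space becomes the threshold, capped by
     * MaxTenuringThreshold.
     */
    static int computeThreshold(double[] sizesByAge, double survivorCapacity,
                                int targetSurvivorRatio, int maxTenuringThreshold) {
        double desired = survivorCapacity * targetSurvivorRatio / 100.0;
        double total = 0;
        int age = 1;
        while (age < TABLE_SIZE) {
            total += sizesByAge[age];
            if (total > desired) {
                break; // including this age would overflow the target occupancy
            }
            age++;
        }
        return Math.min(age, maxTenuringThreshold);
    }

    public static void main(String[] args) {
        double survivorMb = 2048.0 / 10; // assumes SurvivorRatio=8
        System.out.println("old flags (50%, max 6):  " + computeThreshold(EXAMPLE_SIZES_MB, survivorMb, 50, 6));
        System.out.println("new flags (75%, max 15): " + computeThreshold(EXAMPLE_SIZES_MB, survivorMb, 75, 15));
    }
}
```

With this made-up histogram the old settings yield a threshold of 5, so objects are promoted even earlier than the nominal 6. The new settings keep the threshold at 15.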
Results : After applying the above optimizations and JVM tuning, Full GC frequency fell from ~120 times/day to ~30 times/day, and total daily Full GC pause time decreased from about 1–1.5 minutes to 15–25 seconds.
Recommended Workflow :
Generate a heap dump (e.g., with jmap) and analyze it with MAT.
Use flame‑graphs to locate hot allocation paths.
Examine GC logs to understand collection patterns and adjust JVM flags accordingly.
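For the heap-dump step, the usual command-line route is jmap -dump:format=b,file=heap.hprof <pid>. The same dump can be triggered from inside the JVM through the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, as in this sketch. HeapDumper and dumpToTempDir are hypothetical names. Passing live=false keeps unreachable objects in the file, which matters when the objects of interest are already garbage, like the FutureAdapter instances described earlier.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;

class HeapDumper {
    /**
     * Writes an HPROF heap dump of the current JVM that MAT can open.
     * live=true dumps only reachable objects; live=false keeps unreachable ones too.
     * dumpHeap fails with an IOException if the target file already exists.
     */
    static File dump(File target, boolean live) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.getAbsolutePath(), live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    /** Convenience helper: dump into a fresh temporary directory so the file never pre-exists. */
    static File dumpToTempDir(boolean live) {
        try {
            File dir = Files.createTempDirectory("heapdump").toFile();
            return dump(new File(dir, "heap.hprof"), live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = dumpToTempDir(false);
        System.out.println("wrote " + f + " (" + f.length() + " bytes); open it in MAT");
    }
}
```

Writing a dump pauses the application and produces a file on the order of the used heap size, so on a production service it is best triggered deliberately. When parsing a dump, MAT drops unreachable objects from the object graph by default and only summarizes them; its "keep unreachable objects" preference retains them for analysis.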
vivo Internet Technology