JVM Garbage Collection Tuning for a Video Service to Reduce P99 Latency

By replacing the default Parallel GC with a ParNew‑CMS collector, enlarging the Young generation, fixing Metaspace settings, and tuning CMS occupancy thresholds, the video service cut Young and Full GC pauses dramatically, lowered Full GC count by over 80%, and achieved more than 30% P99 latency reduction, with some APIs improving up to 80%.

vivo Internet Technology

Oct 27, 2021

Background : In February 2021 the video app’s core API experienced high response latency during peak traffic. Monitoring showed that the P99 latency was dominated by long GC pauses, especially frequent Full GCs and Young GCs.

Observed GC behavior : Over a monitoring window the service performed on average 66 Young GCs per 10 minutes (peak 470) and 0.25 Full GCs per 10 minutes (peak 5). The default Parallel GC (Parallel Scavenge + Parallel Old) was unsuitable for a latency‑sensitive service.
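Counts like the ones above can be sampled from inside the JVM through the standard `java.lang.management` API. The following is a minimal sketch, not the monitoring setup the service actually used; the class name `GcSnapshot` and the helper `per10Min` are hypothetical names introduced for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads GC counters exposed by the running JVM.
final class GcSnapshot {
    /** Returns collector name -> {collection count, cumulative collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> out = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return out;
    }

    /** Normalizes a counter delta to "collections per 10 minutes". */
    static double per10Min(long countBefore, long countAfter, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return (countAfter - countBefore) * 600_000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        // Bean names depend on the collector in use, e.g. "PS Scavenge" and
        // "PS MarkSweep" under Parallel GC, "ParNew" and "ConcurrentMarkSweep" under CMS.
        snapshot().forEach((name, v) ->
            System.out.printf("%s: count=%d, time=%d ms%n", name, v[0], v[1]));
    }
}
```

Taking two snapshots a known interval apart and passing the counter values to `per10Min` gives a rate in the same unit as the figures quoted above.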

Optimization goals :

Reduce API P99 latency by at least 30%.

Decrease the number and pause time of Young and Full GCs.

Three load‑based targets were defined (high, medium, low) with specific GC reduction percentages.
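To make the 30% target concrete, the sketch below computes a P99 with the nearest-rank method and checks a before/after pair against a reduction goal. It is an illustration only: the article does not say how its monitoring system computes percentiles, and the class `LatencyGoal`, its methods, and the sample numbers are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for checking a latency-reduction goal.
final class LatencyGoal {
    /** Nearest-rank percentile: the smallest sample that at least p% of samples do not exceed. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    /** Relative reduction in percent; going from 200 to 140 is a 30% reduction. */
    static double reductionPct(double before, double after) {
        return (before - after) * 100.0 / before;
    }

    static boolean meetsGoal(double before, double after, double goalPct) {
        return reductionPct(before, after) >= goalPct;
    }

    public static void main(String[] args) {
        // Made-up latencies in milliseconds, for illustration only.
        long[] baseline = {12, 15, 14, 200, 13, 16, 18, 11, 17, 19};
        long[] tuned = {12, 15, 14, 140, 13, 16, 18, 11, 17, 19};
        long before = percentile(baseline, 99);
        long after = percentile(tuned, 99);
        System.out.println("goal met: " + meetsGoal(before, after, 30.0));
    }
}
```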

Current JVM configuration :

-Xms4096M -Xmx4096M -Xmn1024M
-XX:PermSize=512M
-XX:MaxPermSize=512M

Issues identified:

No explicit collector specified (default Parallel GC is throughput‑oriented).

Proposed collector : Switch to ParNew + CMS, which is better for low‑latency services.

Parameter selection principles :

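Specify the Metaspace size consistently (e.g., 256M for both -XX:MetaspaceSize and -XX:MaxMetaspaceSize). The -XX:PermSize and -XX:MaxPermSize flags in the current configuration have no effect on JDK 8 and later, where Metaspace replaced the permanent generation.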

Adjust Young generation size according to load; larger Young reduces GC frequency but may increase single‑GC pause.

Four candidate configurations :

-Xms4096M -Xmx4096M -Xmn2048M
-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=256M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark

-Xms4096M -Xmx4096M -Xmn2048M
-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=256M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

-Xms4096M -Xmx4096M -Xmn1536M
-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=256M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark

-Xms4096M -Xmx4096M -Xmn1024M
-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=256M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark
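Because the heap is fixed at 4096M, every megabyte given to the Young generation is taken from the old generation. The sketch below derives the old-generation size from a flag line to make that trade-off visible. The class `HeapSplit` is a hypothetical helper, and its parser handles only the `M` and `G` suffixes used in this article, not everything the JVM accepts.

```java
// Hypothetical helper: derives the old-generation size from -Xmx and -Xmn.
final class HeapSplit {
    /** Returns the size in MB of the first flag that starts with the given prefix. */
    static long sizeMb(String flags, String prefix) {
        for (String f : flags.trim().split("\\s+")) {
            if (f.startsWith(prefix)) {
                String v = f.substring(prefix.length()).toUpperCase();
                long n = Long.parseLong(v.substring(0, v.length() - 1));
                switch (v.charAt(v.length() - 1)) {
                    case 'G': return n * 1024;
                    case 'M': return n;
                    default: throw new IllegalArgumentException("unsupported unit in " + f);
                }
            }
        }
        throw new IllegalArgumentException(prefix + " not found");
    }

    /** Old generation = maximum heap minus the fixed Young generation. */
    static long oldGenMb(String flags) {
        return sizeMb(flags, "-Xmx") - sizeMb(flags, "-Xmn");
    }

    public static void main(String[] args) {
        for (String xmn : new String[] {"-Xmn2048M", "-Xmn1536M", "-Xmn1024M"}) {
            String flags = "-Xms4096M -Xmx4096M " + xmn;
            System.out.println(xmn + " -> old generation " + oldGenMb(flags) + "M");
        }
    }
}
```

With these inputs the three Young sizes leave 2048M, 2560M, and 3072M for the old generation respectively.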

Load‑test results :

High load (≈1100 QPS): The configuration with the Young generation enlarged by 0.5× (-Xmn1536M, labeled option 4 in the load test) gave the best result, lowering P95/P99 by ≈50% and the cumulative Full GC pause time by 88%.

Medium load (≈600 QPS): The configurations with the Young generation doubled (-Xmn2048M, labeled options 2 and 3 in the load test) performed best, lowering P95/P99 by ~32% and the Full GC pause time by 93%.

Based on overall performance, the configuration with Young enlarged by 0.5× was preferred for its superior high-load behavior.

Gray‑release plan : Deploy the chosen configuration on a few machines for a few days, monitor GC metrics, then roll out globally.

Further analysis : Occasional long pauses (2-3 s) were observed. They were caused by CMS foreground collections, which were triggered by promotion failures resulting from old-generation fragmentation. Mitigation: set -XX:CMSInitiatingOccupancyFraction=75 and -XX:+UseCMSInitiatingOccupancyOnly so that the CMS background (concurrent) collection starts earlier, once the old generation is 75% occupied.
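As a worked example of what the 75% threshold means, the sketch below assumes the -Xmx4096M / -Xmn1536M sizing and computes the old-generation occupancy at which a background cycle would start, and the headroom left for promotions while that cycle runs. The class `CmsTrigger` is hypothetical, and the numbers are approximate because the JVM aligns generation sizes internally.

```java
// Hypothetical helper: where a fixed CMS occupancy fraction puts the trigger point.
final class CmsTrigger {
    /** Old-generation occupancy (MB) at which a background cycle starts. */
    static long triggerMb(long heapMb, long youngMb, int occupancyFractionPct) {
        long oldMb = heapMb - youngMb;
        return oldMb * occupancyFractionPct / 100;
    }

    /** Old-generation space (MB) still free when the background cycle starts. */
    static long headroomMb(long heapMb, long youngMb, int occupancyFractionPct) {
        return (heapMb - youngMb) - triggerMb(heapMb, youngMb, occupancyFractionPct);
    }

    public static void main(String[] args) {
        long heapMb = 4096;  // -Xmx4096M
        long youngMb = 1536; // -Xmn1536M
        int fraction = 75;   // -XX:CMSInitiatingOccupancyFraction=75
        System.out.println("trigger at " + triggerMb(heapMb, youngMb, fraction) + "M, headroom "
            + headroomMb(heapMb, youngMb, fraction) + "M");
    }
}
```

For a 2560M old generation this puts the trigger at 1920M and leaves 640M free. Without -XX:+UseCMSInitiatingOccupancyOnly, HotSpot uses the fraction only as an initial hint and then falls back on its own heuristics, which is why the two flags are set together.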

Final tuned configuration :

-Xms4096M -Xmx4096M -Xmn1536M
-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=256M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
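During a gray release it can be useful to confirm that an instance really started with the intended flags. The sketch below compares the JVM's input arguments, read through the standard `RuntimeMXBean`, against an expected list. The class `FlagCheck` and its `missing` method are hypothetical, and the comparison is a plain string match, so a flag spelled differently (for example `-Xmn1536m`) would be reported as missing.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical startup check for the tuned GC flags.
final class FlagCheck {
    static final List<String> EXPECTED = Arrays.asList(
        "-Xmn1536M",
        "-XX:+UseParNewGC",
        "-XX:+UseConcMarkSweepGC",
        "-XX:+CMSScavengeBeforeRemark",
        "-XX:CMSInitiatingOccupancyFraction=75",
        "-XX:+UseCMSInitiatingOccupancyOnly");

    /** Returns the expected flags that do not appear verbatim among the actual ones. */
    static List<String> missing(List<String> actual, List<String> expected) {
        List<String> out = new ArrayList<>(expected);
        out.removeAll(actual);
        return out;
    }

    public static void main(String[] args) {
        // JVM options only; arguments passed to main() are not included.
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missing(actual, EXPECTED);
        System.out.println(missing.isEmpty()
            ? "all expected flags present"
            : "missing flags: " + missing);
    }
}
```

These flags belong to JDK releases that still ship CMS; the collector was removed in JDK 14, so the configuration does not carry over to newer runtimes unchanged.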

Result verification : After a 7-day gray period, the Young GC count dropped by ~30% and the cumulative Young GC time by 17%, while the pause of a single Young GC grew by ~7 ms. Full GC frequency and pause time fell dramatically (Full GC count ↓ 82%, average pause ↓ 85%). API P99 latency improvements: Interface A ↓ 19%, Interface B ↓ 41%, Interface C ↓ 80%.

Conclusion : Proper JVM GC tuning (choosing ParNew + CMS, sizing the Young generation, and fixing the Metaspace and CMS thresholds) significantly reduced latency and GC pauses for the video service, and it provides a repeatable tuning workflow for similar backend services.
