
How to Eliminate GC Pauses in High‑QPS Java Services: A Step‑by‑Step JVM Tuning Guide

This article investigates a high‑concurrency Java service that suffers long GC pauses during large index swaps, identifies the YGC Object Copy phase as the root cause, and walks through a series of JVM tuning techniques (MaxTenuringThreshold, InitialTenuringThreshold, AlwaysTenure, G1HeapRegionSize, ZGC, and an Eden‑preheat strategy) that bring service disruption close to zero and the success rate to 99.995%.

DaTaobao Tech

Problem Background

A high‑traffic (up to 100k QPS), low‑latency Java service becomes unstable whenever it swaps a ~0.5 GB in‑memory index: upstream calls time out, and the success rate drops to 95%.

Root Cause Analysis

GC logs show that during an index swap the Young Generation GC (YGC) spends most of its time in the Object Copy phase. The newly loaded index is still live, so each YGC copies it again, and the resulting stop‑the‑world (STW) pauses block all request threads.
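To put numbers on the pauses around a swap from inside the process, the standard `java.lang.management` API can be sampled before and after the operation. The sketch below is illustrative only: the class name `GcSnapshot` and the placeholder workload are assumptions, and the per-phase breakdown (such as Object Copy) still has to come from the GC logs, since the MXBeans only report totals per collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Snapshot of cumulative GC activity, summed over all collectors in this JVM. */
final class GcSnapshot {
    final long collections;
    final long millis;

    private GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    /** Reads the cumulative collection count and accumulated collection time. */
    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, time);
    }

    /** Describes the GC activity between an earlier snapshot and this one. */
    String since(GcSnapshot earlier) {
        return (collections - earlier.collections) + " collections, "
                + (millis - earlier.millis) + " ms";
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        // Placeholder for the real work, e.g. loading and switching the index.
        byte[][] placeholder = new byte[1024][];
        for (int i = 0; i < placeholder.length; i++) {
            placeholder[i] = new byte[64 * 1024];
        }
        System.out.println("GC during swap: " + take().since(before));
    }
}
```

Comparing the delta for a swap against the delta for an equally long period of normal traffic gives a quick signal of how much extra collection time the swap causes.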

Optimization Attempts

Let the index promote early to Old Generation – Set MaxTenuringThreshold and InitialTenuringThreshold to 1. G1GC already performs direct tenuring, so this had limited effect.

Force always‑tenure – Enable AlwaysTenure to skip the Survivor space; the results were similar to the above.

Direct allocation to Old – Tried PretenureSizeThreshold and G1HeapRegionSize, but under G1GC neither one sends the index’s many small objects straight to Old.

Accelerate copy – Tweaked MaxGCPauseMillis, ParallelGCThreads, and ConcGCThreads; no noticeable gain.

Upgrade to ZGC (JDK 11) – Reduced pause times dramatically, raising success rate to 99.5%, but occasional Allocation Stalls remained.
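When experimenting with flags like these, it helps to confirm what the running JVM actually uses, because a value may be set ergonomically or overridden elsewhere. On the command line the settings take forms such as `-XX:MaxTenuringThreshold=1` or `-XX:+AlwaysTenure`, and on JDK 11 ZGC is still experimental, so it needs `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`. The following is a minimal sketch, assuming a HotSpot JVM; the class name `GcFlagReport` is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

final class GcFlagReport {
    // The flags discussed in the optimization attempts.
    static final String[] FLAGS = {
        "MaxTenuringThreshold", "InitialTenuringThreshold", "AlwaysTenure",
        "PretenureSizeThreshold", "G1HeapRegionSize",
        "MaxGCPauseMillis", "ParallelGCThreads", "ConcGCThreads"
    };

    /** Returns "name=value (origin)", or "name=<unavailable>" if this JVM lacks the flag. */
    static String describe(String flag) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return flag + "=<unavailable>";
            }
            VMOption option = bean.getVMOption(flag);
            // The origin tells whether the value is a default, was set at VM creation, etc.
            return flag + "=" + option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the flag does not exist in this JVM build.
            return flag + "=<unavailable>";
        }
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.println(describe(flag));
        }
    }
}
```

Running this once per configuration under test makes it easier to tell a flag that had no effect from a flag that was never applied.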

Final Solution: Eden‑Preheat + Batch Gray Release

The swap runs as a controlled batch “gray release”: each instance is taken out of traffic while it switches. After loading the new index, the instance allocates temporary objects to exhaust the Eden space, forcing YGCs that move the new index to Old before traffic resumes. Subsequent YGCs then have little left to copy and finish in milliseconds, which eliminates the timeout spikes.

```java
public boolean switchIndex(String indexPath) {
    try {
        // 1. Load the new index while this instance is out of traffic.
        MyIndex newIndex = loadIndex(indexPath);
        // 2. Switch to the new index.
        this.index = newIndex;
        // 3. Eden pre-heat: allocate short-lived garbage (10,000 arrays of 524,288 chars,
        //    about 1 MiB each) so that young GCs run now and copy the index out of the
        //    young generation before traffic returns.
        for (int i = 0; i < 10000; i++) {
            char[] tempArr = new char[524288];
        }
        // 4. Returning true notifies the caller that the switch is complete.
        return true;
    } catch (Exception e) {
        // Loading failed, so the previous index stays in place.
        return false;
    }
}
```
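A fixed iteration count depends on heap sizing, so one possible variation is to allocate until the JVM itself reports that collections have happened. The sketch below is an assumption-laden alternative, not the article's implementation: `EdenPreheat`, its parameters, and the 64 KiB chunk size used in the example are all hypothetical. Small chunks are chosen because under G1 an object of at least half a region is allocated as a humongous object outside Eden, and how many collections are needed depends on the tenuring settings in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class EdenPreheat {
    // Written on every iteration so the JIT cannot drop the allocation as dead code.
    static volatile byte[] sink;

    /** Sums the collection counts of all collectors in this JVM. */
    private static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount()); // -1 means "not reported"
        }
        return total;
    }

    /**
     * Allocates short-lived chunks until at least wantedCollections new collections
     * have been reported. Returns the number of bytes allocated, or -1 if maxBytes
     * was reached before enough collections were observed.
     */
    static long preheat(int chunkBytes, int wantedCollections, long maxBytes) {
        long target = totalCollections() + wantedCollections;
        long allocated = 0;
        while (allocated < maxBytes) {
            sink = new byte[chunkBytes];
            allocated += chunkBytes;
            if (totalCollections() >= target) {
                sink = null;
                return allocated;
            }
        }
        sink = null;
        return -1;
    }

    public static void main(String[] args) {
        // Example: 64 KiB chunks, wait for two collections, give up after 16 GiB.
        long bytes = preheat(64 * 1024, 2, 16L * 1024 * 1024 * 1024);
        System.out.println(bytes < 0 ? "no collection observed" : "allocated " + bytes + " bytes");
    }
}
```

The collection counters cover every collector, not only the young one, so this check is a coarse signal; the GC log remains the place to confirm that the index really ended up in Old.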

Result

After applying the pre‑heat strategy together with the batch gray release, the service achieves a stable success rate of 99.995% with virtually no observable impact during index swaps.
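The batch gray release itself can be pictured as a loop that takes a few instances out of traffic at a time, switches them, and returns them to service. The sketch below is purely illustrative: the `Instance` interface, its method names, and `FakeInstance` are hypothetical stand-ins, since the article does not describe the deployment tooling.

```java
import java.util.Arrays;
import java.util.List;

final class BatchRollout {
    /** Hypothetical control surface for one service instance (an assumption). */
    interface Instance {
        void takeOffline();
        boolean switchIndex(String indexPath);
        void bringOnline();
    }

    /**
     * Switches the index batch by batch so that at most batchSize instances are out
     * of traffic at once. Returns how many instances switched successfully and stops
     * after the first batch that contains a failure.
     */
    static int rollOut(List<? extends Instance> instances, int batchSize, String indexPath) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int switched = 0;
        for (int start = 0; start < instances.size(); start += batchSize) {
            int end = Math.min(start + batchSize, instances.size());
            List<? extends Instance> batch = instances.subList(start, end);
            for (Instance instance : batch) {
                instance.takeOffline();
            }
            boolean batchOk = true;
            for (Instance instance : batch) {
                if (instance.switchIndex(indexPath)) {
                    switched++;
                } else {
                    batchOk = false;
                }
            }
            // Instances return to traffic either way; a failed one keeps its old index.
            for (Instance instance : batch) {
                instance.bringOnline();
            }
            if (!batchOk) {
                break;
            }
        }
        return switched;
    }

    /** In-memory stand-in used only to exercise rollOut. */
    static final class FakeInstance implements Instance {
        final boolean switchSucceeds;
        boolean online = true;
        String index = "old";

        FakeInstance(boolean switchSucceeds) {
            this.switchSucceeds = switchSucceeds;
        }

        @Override public void takeOffline() { online = false; }

        @Override public boolean switchIndex(String indexPath) {
            if (switchSucceeds) {
                index = indexPath;
            }
            return switchSucceeds;
        }

        @Override public void bringOnline() { online = true; }
    }

    public static void main(String[] args) {
        List<FakeInstance> fleet = Arrays.asList(
                new FakeInstance(true), new FakeInstance(true), new FakeInstance(true));
        System.out.println("switched: " + rollOut(fleet, 2, "index-v2"));
    }
}
```

Keeping the batch size small relative to the fleet limits how much capacity is out of traffic at any moment, which matters at the QPS levels described here.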

Key Takeaways

Identify GC phases that cause long pauses (Object Copy).

Adjust tenuring thresholds, but recognize G1GC’s built‑in optimizations.

Consider newer collectors (ZGC) for low‑pause requirements.

When GC pauses cannot be eliminated, use operational tricks (pre‑heat, gray release) to move heavy objects to Old before traffic resumes.
