How to Eliminate GC Pauses in High‑QPS Java Services: A Step‑by‑Step JVM Tuning Guide

This article investigates a high‑concurrency Java service that suffers long GC pauses during large index swaps, identifies the Object Copy phase of young GC (YGC) as the root cause, and walks through a series of JVM tuning attempts (MaxTenuringThreshold, InitialTenuringThreshold, AlwaysTenure, G1HeapRegionSize, ZGC) before settling on an Eden‑preheat strategy that brings the success rate to 99.995% with near‑zero service disruption.

DaTaobao Tech

Problem Background

A high‑traffic (up to 100k QPS), low‑latency Java service becomes unstable whenever it swaps in a new in‑memory index of about 0.5 GB: upstream calls time out and the success rate drops to 95% during the swap.

Root Cause Analysis

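GC logs show that during index swaps the Young Generation GC (YGC) spends excessive time in the Object Copy phase. The freshly loaded index is still live, so each YGC has to copy it again (from Eden to Survivor, then between Survivor spaces) until it is finally promoted to the Old Generation. Copying roughly 0.5 GB several times produces long stop‑the‑world (STW) pauses that block all request threads.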
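Per‑phase timings such as Object Copy only appear in the GC log itself (on JDK 9+ with G1, something like `-Xlog:gc*,gc+phases=debug` should print them). As a complement, the sketch below shows one way to spot long collections from inside the process using the standard GC notification API. The class name `GcPauseLogger` and the 50 ms threshold are illustrative assumptions, not part of the original service, and the reported duration is the collector's own figure, which only approximates the pause for stop‑the‑world collections such as a G1 young GC.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: prints every collection that lasts longer than a threshold.
public class GcPauseLogger {

    /** Subscribes to all collector beans and returns how many listeners were registered. */
    public static int install(long thresholdMillis) {
        int installed = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(bean instanceof NotificationEmitter)) {
                continue; // this bean does not emit notifications; skip it
            }
            ((NotificationEmitter) bean).addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                long durationMs = info.getGcInfo().getDuration();
                if (durationMs >= thresholdMillis) {
                    System.out.printf("%s (%s, cause=%s) took %d ms%n",
                            info.getGcName(), info.getGcAction(), info.getGcCause(), durationMs);
                }
            }, null, null);
            installed++;
        }
        return installed;
    }

    public static void main(String[] args) {
        // 50 ms is an arbitrary example threshold.
        System.out.println("listeners installed: " + install(50));
    }
}
```

Calling `GcPauseLogger.install(50)` once at startup is enough; lines printed around an index swap can then be matched against the Object Copy timings in the GC log.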

Optimization Attempts

Promote the index early to the Old Generation – Set both MaxTenuringThreshold and InitialTenuringThreshold to 1 so that surviving objects are tenured after a single YGC. G1GC already tenures such objects directly, so this had limited effect.

Force always‑tenure – Enable AlwaysTenure to skip Survivor space; results similar to the above.

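Direct allocation to Old – Tried PretenureSizeThreshold and G1HeapRegionSize. G1GC does not honor PretenureSizeThreshold, and its large‑object path (governed by G1HeapRegionSize) only applies to a single object that is at least half a region in size, so neither helps with an index made of many small objects.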

Accelerate copy – Tweaked MaxGCPauseMillis, ParallelGCThreads, ConcGCThreads; no noticeable gain.

Upgrade to ZGC (JDK 11) – Reduced pause times dramatically, raising success rate to 99.5%, but occasional Allocation Stalls remained.
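When trying flags like these, it helps to confirm what the running JVM actually picked up. The following is a minimal sketch, assuming a HotSpot JVM, that reads the effective values through `HotSpotDiagnosticMXBean`. The class name `GcFlagCheck` and the example launch lines in the comments are illustrative; the source does not give the exact values used for every flag.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the effective value of each GC flag discussed above.
public class GcFlagCheck {

    // Example launch lines (values are illustrative, not the article's exact settings):
    //   java -XX:+UseG1GC -XX:MaxTenuringThreshold=1 -XX:InitialTenuringThreshold=1 ...
    //   java -XX:+UseG1GC -XX:+AlwaysTenure ...
    //   java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC ...   (ZGC is experimental on JDK 11)
    static final String[] FLAGS = {
        "MaxTenuringThreshold", "InitialTenuringThreshold", "AlwaysTenure",
        "PretenureSizeThreshold", "G1HeapRegionSize", "MaxGCPauseMillis",
        "ParallelGCThreads", "ConcGCThreads"
    };

    /** Returns the current value of a VM option, or a placeholder if this JVM lacks it. */
    static String valueOf(String flag) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return bean.getVMOption(flag).getValue();
        } catch (IllegalArgumentException e) {
            return "(not available in this JVM)"; // option does not exist in this build
        }
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.println(flag + " = " + valueOf(flag));
        }
    }
}
```

Running this with the same options as the service shows at a glance whether a flag was applied, left at its default, or is missing from the JVM build in use.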

Final Solution: Eden‑Preheat + Batch Gray Release

During a controlled, batched gray release, the instance is taken out of traffic while it swaps the index. Before it goes back online, it allocates temporary objects to exhaust the Eden space, forcing YGC to run and move the new index into the Old Generation. Once traffic resumes, subsequent YGCs are fast (milliseconds), which eliminates the timeout spikes.

public boolean switchIndex(String indexPath) {
    try {
        // 1. Load the new index (the instance is offline at this point)
        MyIndex newIndex = loadIndex(indexPath);
        // 2. Switch to the new index
        this.index = newIndex;
        // 3. Eden pre-heat: allocate short-lived 1 MiB arrays (524288 chars x 2 bytes)
        //    so that Eden fills up and young GC runs before traffic resumes
        for (int i = 0; i < 10000; i++) {
            char[] tempArr = new char[524288];
        }
        // 4. Notify upstream that the switch is complete
        return true;
    } catch (Exception e) {
        return false;
    }
}
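Whether the pre‑heat loop really triggers a collection depends on the heap layout. Under G1, for example, an array of at least half a region is allocated as a humongous object outside Eden. A quick way to check is to compare collection counts before and after the loop. The sketch below does that with the standard `GarbageCollectorMXBean` API. The class `PreheatCheck` and the `sink` field are assumptions added for illustration: `sink` publishes each array to a volatile field so the JIT cannot optimize the allocation away.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: runs a pre-heat loop and reports how many GCs it triggered.
public class PreheatCheck {

    // Publishing each array here makes it escape, so the allocation is not eliminated.
    static volatile Object sink;

    /** Sums the collection counts reported by all collectors of this JVM. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Allocates short-lived char arrays and returns the number of collections that ran meanwhile. */
    static long preheat(int iterations, int charsPerArray) {
        long before = totalGcCount();
        for (int i = 0; i < iterations; i++) {
            sink = new char[charsPerArray];
        }
        sink = null; // drop the last array so nothing from the loop stays reachable
        return totalGcCount() - before;
    }

    public static void main(String[] args) {
        // Same sizes as the loop in switchIndex: 10000 arrays of 524288 chars (about 1 MiB each).
        long collections = preheat(10_000, 524_288);
        System.out.println("collections during pre-heat: " + collections);
    }
}
```

If the reported count is zero, the arrays are probably not filling Eden, and a smaller array size or a larger iteration count may be needed. The count covers collections of any kind, so the GC log remains the place to confirm that the index itself was promoted.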

Result

After applying the pre‑heat strategy together with the batch gray release, the service achieves a stable success rate of 99.995% with virtually no observable impact during index swaps.

Key Takeaways

Identify GC phases that cause long pauses (Object Copy).

Adjust tenuring thresholds, but recognize G1GC’s built‑in optimizations.

Consider newer collectors (ZGC) for low‑pause requirements.

When GC pauses cannot be eliminated, use operational tricks (pre‑heat, gray release) to move heavy objects to Old before traffic resumes.
