ZGC: Principles, Tuning Practices, and Production Upgrade Experience

The article explains how Meituan’s risk-control platform eliminated frequent 40 ms CMS pauses by adopting ZGC on JDK 11. It covers ZGC’s concurrent mark-copy design, practical tuning parameters, fixes for real-world cases, and measured TP999 latency reductions of up to 74 %, and it notes the trade-offs.

Meituan Technology Team

Many low‑latency, high‑availability Java services suffer from GC pauses, which affect system availability. ZGC, introduced in JDK 11, is a next‑generation low‑pause garbage collector designed for large‑heap, low‑latency scenarios.

The article discusses the pain points of GC, the principles of ZGC, practical tuning, and the results of upgrading to ZGC in Meituan’s risk‑control platform.

GC Pain

A GC pause (Stop-The-World, STW) suspends all application threads until the collector finishes that phase. In Meituan’s risk-control service, CMS Young GC pauses lasted about 40 ms and occurred about 10 times per minute, which increased response latency and reduced availability.
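
To make the cost of these pauses concrete, the following is a rough back-of-the-envelope sketch. The 40 ms pause and 10 pauses per minute come from the text; the 30 ms average service time, the uniform arrival of requests, and the class and method names are assumptions made only for illustration.

```java
final class PauseImpactSketch {
    /** Fraction of wall-clock time the application spends stopped. */
    static double pausedFraction(double pauseMs, int pausesPerMinute) {
        return pauseMs * pausesPerMinute / 60_000.0;
    }

    /**
     * Fraction of requests that overlap a pause, assuming requests arrive
     * uniformly and each takes serviceMs (an assumed value). A request is
     * delayed if it is in flight at any point during the pause, so the
     * vulnerable window per pause is pauseMs + serviceMs.
     */
    static double affectedFraction(double pauseMs, int pausesPerMinute, double serviceMs) {
        return (pauseMs + serviceMs) * pausesPerMinute / 60_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("time paused:       %.2f%%%n", 100 * pausedFraction(40, 10));
        System.out.printf("requests affected: %.2f%%%n", 100 * affectedFraction(40, 10, 30));
    }
}
```

Under these assumptions the service is stopped for about 0.67 % of the time and about 1.17 % of requests are delayed, which is enough to dominate tail percentiles such as TP99 and TP999.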

ZGC Principles

ZGC uses a mostly concurrent mark-copy algorithm. Only three phases are STW: initial mark, final mark, and initial relocate. The duration of these phases depends on the number of GC roots rather than on heap size or the number of live objects, which is why pause time stays below 10 ms regardless of heap size.

Key techniques:

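Colored pointers store GC metadata about an object (its mark and relocation state) in high bits of the 64-bit pointer rather than in the object itself.

Load barriers are short code sequences that run whenever the application loads a reference from the heap; they check the pointer’s color and update the reference on the fly if the object was moved during concurrent relocation.

Address space layout: 0-4 TB for the Java heap, 4-8 TB (M0), 8-12 TB (M1), 16-20 TB (Remapped). Each object has a virtual address in all three views (M0, M1, Remapped); only one view is active at a time.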
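
The three techniques above can be illustrated with a toy model. This is a minimal sketch, not HotSpot code: the bit positions are derived only from the layout in the text (4 TB = 2^42, 8 TB = 2^43, 16 TB = 2^44), and the class name, the `loadBarrier` method, and the `forwarding` map are hypothetical stand-ins for what the JVM does in compiled code and in its forwarding tables.

```java
import java.util.HashMap;
import java.util.Map;

final class ColoredPointerSketch {
    // Heap offsets occupy 0-4 TB, i.e. the low 42 bits of the pointer.
    static final long OFFSET_MASK = (1L << 42) - 1;
    static final long M0 = 1L << 42;        // view starting at 4 TB
    static final long M1 = 1L << 43;        // view starting at 8 TB
    static final long REMAPPED = 1L << 44;  // view starting at 16 TB
    static final long COLOR_MASK = M0 | M1 | REMAPPED;

    /** Strips the color bits and returns the offset inside the heap. */
    static long offsetOf(long pointer) {
        return pointer & OFFSET_MASK;
    }

    /** Returns the address of the same heap offset in the given view. */
    static long withColor(long pointer, long color) {
        return offsetOf(pointer) | color;
    }

    static boolean hasColor(long pointer, long color) {
        return (pointer & COLOR_MASK) == color;
    }

    /**
     * Toy load barrier: if the loaded pointer already has the currently
     * "good" color, return it unchanged (fast path). Otherwise look up
     * whether the object was relocated and return a pointer to its current
     * location with the good color (slow path).
     */
    static long loadBarrier(long pointer, long goodColor, Map<Long, Long> forwarding) {
        if (hasColor(pointer, goodColor)) {
            return pointer;
        }
        long oldOffset = offsetOf(pointer);
        long newOffset = forwarding.getOrDefault(oldOffset, oldOffset);
        return withColor(newOffset, goodColor);
    }

    public static void main(String[] args) {
        long stale = withColor(0x1000L, M0);      // reference marked in the M0 view
        Map<Long, Long> forwarding = new HashMap<>();
        forwarding.put(0x1000L, 0x8000L);         // object moved during relocation
        long healed = loadBarrier(stale, REMAPPED, forwarding);
        System.out.printf("stale=0x%x healed=0x%x%n", stale, healed);
    }
}
```

Because the color is part of the address, the check on the fast path is a single mask-and-compare, which is what keeps the barrier cheap when no relocation is in progress.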

Tuning Practices

Typical ZGC tuning parameters (example):

-Xms10G -Xmx10G
-XX:ReservedCodeCacheSize=256m -XX:InitialCodeCacheSize=256m
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC
-XX:ConcGCThreads=2 -XX:ParallelGCThreads=6
-XX:ZCollectionInterval=120 -XX:ZAllocationSpikeTolerance=5
-XX:+UnlockDiagnosticVMOptions -XX:-ZProactive
-Xlog:safepoint,classhisto*=trace,age*,gc*=info:file=/opt/logs/gc-%t.log:time,tid,tags:filecount=5,filesize=50m
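
After changing startup flags it is useful to confirm what the running JVM actually picked up. The sketch below uses the standard `java.lang.management` API to print the input arguments and the active collectors. The class name is hypothetical, and the check that a ZGC bean name contains "ZGC" is an assumption that should be verified against the JDK build in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectorCheck {
    /** Names of the collectors the running JVM exposes over JMX. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Assumption: ZGC's JMX bean names contain the substring "ZGC". */
    static boolean usesZgc(List<String> names) {
        for (String name : names) {
            if (name.contains("ZGC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Flags exactly as passed on the command line, e.g. -XX:+UseZGC.
        System.out.println("JVM args:   " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("ZGC active: " + usesZgc(names));
    }
}
```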

Key tuning points:

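Enable fixed-interval GC (-XX:ZCollectionInterval, in seconds) so that a collection runs at a regular interval even when no other trigger fires, leaving headroom for traffic spikes.

Increase allocation-spike tolerance (-XX:ZAllocationSpikeTolerance): a larger value makes ZGC plan for a higher allocation rate and therefore trigger GC earlier.

Adjust concurrent GC threads (-XX:ConcGCThreads) to speed up marking; more threads shorten the concurrent phases but take CPU time away from the application.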
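
Why a larger spike tolerance triggers GC earlier can be seen in a simplified model of an allocation-rate trigger. This is a sketch under stated assumptions, not HotSpot's actual heuristic: the formula, the class name, and the example numbers (free memory, allocation rate, expected GC duration) are all hypothetical.

```java
final class AllocationRateTriggerSketch {
    /**
     * Simplified model: assume allocation may spike to
     * avgRate * spikeTolerance, estimate how long the free memory would
     * last at that rate, and subtract the time a GC cycle is expected to
     * take. A result <= 0 means a cycle must start now to finish in time.
     */
    static double secondsOfSlack(double freeBytes, double avgBytesPerSec,
                                 double spikeTolerance, double expectedGcSeconds) {
        double assumedRate = avgBytesPerSec * spikeTolerance;
        double secondsUntilFull = freeBytes / assumedRate;
        return secondsUntilFull - expectedGcSeconds;
    }

    static boolean shouldStartGc(double freeBytes, double avgBytesPerSec,
                                 double spikeTolerance, double expectedGcSeconds) {
        return secondsOfSlack(freeBytes, avgBytesPerSec, spikeTolerance, expectedGcSeconds) <= 0;
    }

    public static void main(String[] args) {
        double free = 4000e6;   // 4000 MB free (hypothetical)
        double rate = 500e6;    // 500 MB/s average allocation (hypothetical)
        double gcTime = 3.0;    // expected cycle duration in seconds (hypothetical)
        System.out.println("tolerance 2 -> start GC: " + shouldStartGc(free, rate, 2, gcTime));
        System.out.println("tolerance 5 -> start GC: " + shouldStartGc(free, rate, 5, gcTime));
    }
}
```

With the same free memory and allocation rate, the tolerance of 2 leaves slack and does not start a cycle, while the tolerance of 5 used in the example configuration does. The cost is more frequent collections and more CPU spent on GC.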

Case Studies

Four typical issues and solutions:

Memory‑allocation stalls during flash‑sale traffic – use fixed‑interval GC and larger tolerance.

Frequent GC with long pauses – increase concurrent GC threads.

Large number of ClassLoader roots causing 30 ms pauses – upgrade the Aviator component to reduce ClassLoader creation.

Growing CodeCache causing pauses – reduce unnecessary JIT compilation by removing unused expressions.
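
The last two cases both show up as steadily growing counters, so they can be watched from inside the process. The following is a minimal monitoring sketch using the standard `java.lang.management` API; the class name is hypothetical, and the memory-pool names matched here ("CodeCache", or names starting with "CodeHeap" when the code cache is segmented) are an assumption about HotSpot's naming that should be checked on the target JDK.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class RootPressureProbe {
    /** Assumed HotSpot pool names for the JIT code cache. */
    static boolean isCodeCachePool(String poolName) {
        return poolName.equals("CodeCache") || poolName.startsWith("CodeHeap");
    }

    /** Bytes currently used across all code-cache pools. */
    static long codeCacheUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null && isCodeCachePool(pool.getName())) {
                used += usage.getUsed();
            }
        }
        return used;
    }

    /** Classes loaded since JVM start; rapid growth hints at ClassLoader churn. */
    static long totalLoadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getTotalLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("code cache used (bytes): " + codeCacheUsedBytes());
        System.out.println("total classes loaded:    " + totalLoadedClasses());
    }
}
```

Sampling these two values periodically and alerting on sustained growth would surface both problems before they appear as longer pauses.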

Upgrade Effects

Latency improvements: TP999 fell by 12-142 ms (18-74 %) and TP99 fell by 5-28 ms (10-47 %). Throughput may decline for CPU-bound workloads, because ZGC in JDK 11 is a single-generation collector and its load barriers add overhead to every reference load.
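
For readers who want to reproduce this kind of comparison, the sketch below shows one common way to compute TP99 and TP999 from latency samples and the relative reduction between two measurements. It uses the nearest-rank definition of a percentile, which is an assumption; the text does not say how the reported figures were computed, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class TailLatencySketch {
    /**
     * Nearest-rank percentile. perMille is the percentile in tenths of a
     * percent: 990 for TP99, 999 for TP999. Integer arithmetic avoids
     * floating-point rounding when computing the rank.
     */
    static long percentile(long[] latenciesMs, int perMille) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        long rank = ((long) perMille * sorted.length + 999) / 1000; // ceiling division
        int index = (int) Math.max(rank, 1) - 1;
        return sorted[index];
    }

    /** Relative reduction in percent between two measurements. */
    static double reductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Hypothetical samples: 998 fast requests and 2 that overlapped a GC pause.
        long[] samples = new long[1000];
        Arrays.fill(samples, 20);
        samples[998] = 60;
        samples[999] = 60;
        System.out.println("TP99  = " + percentile(samples, 990) + " ms");
        System.out.println("TP999 = " + percentile(samples, 999) + " ms");
    }
}
```

In the hypothetical data set, only 0.2 % of requests hit a pause, so TP99 stays at the fast value while TP999 reflects the pause. This is why shortening pauses tends to move TP999 more than TP99.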

Evaluation

Assess benefit, cost, and risk before upgrading to JDK 11 with ZGC. The benefit is lower pause latency; the costs are compatibility work and configuration changes; the risks can be mitigated by thorough testing.

Conclusion

ZGC provides sub‑10 ms pauses even for multi‑terabyte heaps, making it suitable for low‑latency services. Meituan’s experience shows that with proper tuning, ZGC can significantly improve service availability.

