
Understanding and Optimizing ZGC (Z Garbage Collector) for Low‑Latency Java Services

This article examines the Z Garbage Collector (ZGC) introduced in JDK 11, detailing its low‑pause design goals, its largely concurrent mark‑copy algorithm, colored pointers and read barriers, practical tuning parameters, real‑world case studies, and the performance impact of upgrading from CMS/G1 to ZGC in high‑throughput, low‑latency services.

High Availability Architecture

ZGC (The Z Garbage Collector) was introduced in JDK 11 as a low‑latency collector whose design goals are to keep pause times under 10 ms, make pause time independent of heap size or live data size, and support heap sizes from 8 MB up to several terabytes.

GC Pain: In many high‑availability Java services, stop‑the‑world (STW) pauses caused by CMS or G1 significantly affect response time. In one example from Meituan's risk‑control service, ten Young GCs per minute, each pausing for about 40 ms, increased the latency of more than 1 % of requests, violating a 65 ms SLA.
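The "over 1 %" figure can be sanity‑checked with simple arithmetic. The sketch below assumes (this is not from the source) that requests arrive uniformly and each takes a fixed service time, so a request is delayed whenever its execution overlaps a pause; the 25 ms service time is a hypothetical value.

```java
/**
 * Back-of-the-envelope estimate of how periodic STW pauses affect requests.
 * Assumptions (hypothetical, not from the source): uniform arrivals and a
 * fixed service time; a request is delayed if its execution overlaps a pause.
 */
public class PauseImpact {

    /** Fraction of wall-clock time the application is stopped. */
    static double pausedFraction(int pausesPerMinute, double pauseMs) {
        return pausesPerMinute * pauseMs / 60_000.0;
    }

    /**
     * Fraction of requests whose execution overlaps a pause: any request that
     * starts during the pause, or up to serviceMs before it, is delayed.
     */
    static double affectedFraction(int pausesPerMinute, double pauseMs, double serviceMs) {
        return pausesPerMinute * (pauseMs + serviceMs) / 60_000.0;
    }

    public static void main(String[] args) {
        int pauses = 10;          // Young GCs per minute (from the example)
        double pauseMs = 40.0;    // pause length in ms (from the example)
        double serviceMs = 25.0;  // hypothetical average service time

        System.out.printf("time paused:       %.2f%%%n", 100 * pausedFraction(pauses, pauseMs));
        System.out.printf("requests affected: %.2f%%%n",
                100 * affectedFraction(pauses, pauseMs, serviceMs));
    }
}
```

With these inputs the service is stopped for about 0.67 % of the time (400 ms per minute), yet roughly 1.08 % of requests overlap a pause, because requests already in flight when a pause begins are delayed too.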

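ZGC Principles: Like ParNew (the young‑generation collector used with CMS) and G1, ZGC uses a mark‑copy algorithm, but it makes the marking, relocation, and remapping phases almost entirely concurrent. Only three STW phases remain: initial mark, final mark, and initial relocation. Initial mark and initial relocation scan only the GC roots, so their duration is proportional to the number of roots rather than to heap size; final mark is kept very short (about 1 ms at most), and if marking has not finished by then, ZGC returns to concurrent marking.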
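The cycle can be summarized as a small Java model. This is a simplified, illustrative sketch of non‑generational ZGC: the three pause names are the labels ZGC prints in its GC logs, while the concurrent phases are paraphrased, merged, or omitted (reference processing, for example, is left out).

```java
import java.util.List;

/**
 * Simplified, illustrative model of one non-generational ZGC cycle.
 * Pause names match ZGC's log labels; concurrent phases are paraphrased.
 */
public class ZgcCycle {

    enum Kind { STW, CONCURRENT }

    static final class Phase {
        final String name;
        final Kind kind;
        final String costScalesWith;

        Phase(String name, Kind kind, String costScalesWith) {
            this.name = name;
            this.kind = kind;
            this.costScalesWith = costScalesWith;
        }
    }

    static final List<Phase> CYCLE = List.of(
            new Phase("Pause Mark Start (initial mark)", Kind.STW, "number of GC roots"),
            new Phase("Concurrent mark (also remaps pointers left stale by the previous cycle)",
                    Kind.CONCURRENT, "live object graph"),
            new Phase("Pause Mark End (final mark)", Kind.STW, "bounded, about 1 ms"),
            new Phase("Concurrent selection of the relocation set", Kind.CONCURRENT, "heap pages"),
            new Phase("Pause Relocate Start (initial relocation)", Kind.STW, "number of GC roots"),
            new Phase("Concurrent relocation", Kind.CONCURRENT, "live objects being moved"));

    /** Counts the stop-the-world phases in one cycle. */
    static long countStw() {
        return CYCLE.stream().filter(p -> p.kind == Kind.STW).count();
    }

    public static void main(String[] args) {
        for (Phase p : CYCLE) {
            System.out.printf("%-11s %-75s cost ~ %s%n", p.kind, p.name, p.costScalesWith);
        }
        System.out.println("STW phases per cycle: " + countStw());
    }
}
```

Folding remapping into the next cycle's concurrent mark is why no separate remap pause appears in the list.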

Key Technologies:

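Colored pointers store GC metadata in otherwise unused high bits of each 64‑bit reference instead of in the object header. In the JDK 11 layout on x86‑64, bits 0 to 41 hold the object address (a 4 TB address space) and bits 42 to 45 hold four flags: Marked0, Marked1, Remapped, and Finalizable. Marking and relocation state therefore travels with the pointer, avoiding extra header writes.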

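Read barriers (ZGC calls them load barriers) run whenever an object reference is loaded from the heap, not on every heap read. If the pointer's color shows that it is stale, for example because the object has been relocated, the barrier's slow path looks up the new address and updates (self‑heals) the reference stored in the field, so later loads of that field take the fast path.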

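Example of where a load barrier is inserted (Java; obj is assumed to have a reference field fieldA and an int field fieldB):

Object o = obj.fieldA;   // loads a reference from the heap: barrier needed
// <load barrier inserted here by the JIT>
Object p = o;            // copies a local reference: no barrier needed
o.hashCode();            // calls through a local reference: no barrier needed
int i = obj.fieldB;      // loads a primitive, not a reference: no barrier needed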
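To make the two techniques concrete, here is a toy Java model of a colored pointer and the load barrier's remapping slow path. It assumes the JDK 11 x86‑64 layout (42‑bit object offset, metadata flags in bits 42 to 45) and uses a hypothetical HashMap as the forwarding table. Real ZGC implements this in JIT‑generated machine code and also has a marking slow path, which is omitted here.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of ZGC colored pointers and the load-barrier remap slow path. */
public class ColoredPointers {

    // Assumed JDK 11 x86-64 layout: bits 0-41 object offset, bits 42-45 metadata.
    static final long OFFSET_MASK = (1L << 42) - 1;
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // The "good" color for the current phase (ZGC flips it as phases change).
    long goodColor = REMAPPED;

    // Hypothetical forwarding table: old offset -> new offset of relocated objects.
    final Map<Long, Long> forwarding = new HashMap<>();

    static long offset(long ptr) {
        return ptr & OFFSET_MASK;
    }

    static long color(long offset, long colorBits) {
        return (offset & OFFSET_MASK) | colorBits;
    }

    boolean isGood(long ptr) {
        return (ptr & METADATA_MASK) == goodColor;
    }

    /** Loads a reference from field[index], healing it if its color is stale. */
    long loadBarrier(long[] field, int index) {
        long ptr = field[index];
        if (ptr == 0 || isGood(ptr)) {
            return ptr;                                   // fast path: null or good color
        }
        // Slow path: find where the object lives now (unchanged if it never moved) ...
        long newOffset = forwarding.getOrDefault(offset(ptr), offset(ptr));
        long healed = color(newOffset, goodColor);
        field[index] = healed;                            // ... and self-heal the field
        return healed;
    }

    public static void main(String[] args) {
        ColoredPointers gc = new ColoredPointers();
        long[] field = { color(0x1000L, MARKED0) };       // pointer with a stale color
        gc.forwarding.put(0x1000L, 0x8000L);              // pretend the object was relocated

        long p = gc.loadBarrier(field, 0);                // takes the slow path
        // prints "offset=0x8000 remapped=true"
        System.out.printf("offset=0x%x remapped=%b%n", offset(p), (p & REMAPPED) != 0);
        // the field was healed, so this load takes the fast path; prints "true"
        System.out.println(gc.loadBarrier(field, 0) == p);
    }
}
```

Because the check is a single mask test on the pointer itself, the common case costs very little, and each stale field is repaired only once.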

Tuning Practice: ZGC requires careful parameter configuration. A typical production configuration looks like this:

-Xms10G -Xmx10G
-XX:ReservedCodeCacheSize=256m -XX:InitialCodeCacheSize=256m
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC
-XX:ConcGCThreads=2 -XX:ParallelGCThreads=6
-XX:ZCollectionInterval=120 -XX:ZAllocationSpikeTolerance=5
-XX:+UnlockDiagnosticVMOptions -XX:-ZProactive
-Xlog:safepoint,classhisto*=trace,age*,gc*=info:file=/opt/logs/gc-%t.log:time,tid,tags:filecount=5,filesize=50m
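After a rollout it is worth confirming that the JVM actually picked up these flags. Below is a minimal sketch using HotSpot's HotSpotDiagnosticMXBean (from com.sun.management, so it is HotSpot‑specific). The flag list mirrors the configuration above, and any flag the running JVM does not recognize is reported as unavailable instead of causing a failure.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Prints the effective values of the GC flags used in the configuration above. */
public class PrintGcFlags {

    // Flags from the configuration above (names without the -XX: prefix).
    static final List<String> FLAGS = List.of(
            "UseZGC", "ConcGCThreads", "ParallelGCThreads",
            "ZCollectionInterval", "ZAllocationSpikeTolerance", "ZProactive",
            "ReservedCodeCacheSize", "InitialCodeCacheSize");

    /** Returns "value (origin)" per flag, or "<not available>" if this JVM lacks it. */
    static Map<String, String> effectiveFlags() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : FLAGS) {
            try {
                VMOption option = hotspot.getVMOption(name);
                // Origin shows whether the value came from the command line or a default.
                result.put(name, option.getValue() + " (" + option.getOrigin() + ")");
            } catch (IllegalArgumentException e) {
                result.put(name, "<not available>");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        effectiveFlags().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

An origin of VM_CREATION means the value was set when the JVM was launched, for example on the command line; DEFAULT means the flag silently kept its built‑in value.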

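Key tuning points: the heap is fixed at 10 GB (-Xms equals -Xmx) and must leave enough headroom for objects allocated while a concurrent cycle is running; -XX:ConcGCThreads sets the number of threads for the concurrent phases, and -XX:ParallelGCThreads the number used inside STW pauses; -XX:ZCollectionInterval forces a cycle at a fixed interval, in seconds; -XX:ZAllocationSpikeTolerance scales the predicted allocation rate so that cycles start earlier; and -XX:-ZProactive, a diagnostic flag that requires -XX:+UnlockDiagnosticVMOptions, turns off proactive cycles. -XX:+UnlockExperimentalVMOptions is needed only on JDK 11 to 14, because ZGC became a production feature in JDK 15. Adjusting these parameters mitigates allocation stalls and improves GC latency; long pauses caused by a large GC root set, by contrast, usually require fixes in the application, as the cases below show.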
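The two trigger‑related flags are easier to reason about with a model of when ZGC starts a cycle. The sketch below loosely follows the timer rule and the allocation‑rate rule in JDK 11's ZGC sources (ZDirector). The predicted peak allocation rate is the average multiplied by ZAllocationSpikeTolerance plus about 3.3 standard deviations, and a cycle starts once the time until the heap runs out, minus a worst‑case GC duration and one sampling interval, drops to zero. Statistics are passed in as plain numbers, all inputs are hypothetical, and other rules (warm‑up, proactive) are omitted.

```java
/**
 * Simplified model of two ZGC trigger rules (timer and allocation rate),
 * loosely following JDK 11's ZDirector. Illustrative only.
 */
public final class ZTriggerModel {

    /** About 3.3 sigma: roughly a 1-in-1000 chance a sample falls outside. */
    static final double ONE_IN_1000 = 3.290527;

    /** Timer rule: -XX:ZCollectionInterval in seconds; 0 disables it. */
    static boolean timerRule(double collectionIntervalSec, double secondsSinceLastGc) {
        return collectionIntervalSec > 0 && secondsSinceLastGc >= collectionIntervalSec;
    }

    /**
     * Allocation-rate rule: start a cycle if, at the predicted peak allocation
     * rate, the heap would run out before a worst-case GC cycle could finish.
     */
    static boolean allocationRateRule(double freeBytes,
                                      double avgAllocRate, double sdAllocRate, // bytes/s
                                      double spikeTolerance,  // -XX:ZAllocationSpikeTolerance
                                      double avgGcSec, double sdGcSec,
                                      double sampleIntervalSec) {
        double maxAllocRate = avgAllocRate * spikeTolerance + sdAllocRate * ONE_IN_1000;
        double timeUntilOom = freeBytes / (maxAllocRate + 1.0); // +1 B/s avoids division by zero
        double maxGcSec = avgGcSec + sdGcSec * ONE_IN_1000;
        double timeUntilGc = timeUntilOom - maxGcSec - sampleIntervalSec;
        return timeUntilGc <= 0;
    }

    public static void main(String[] args) {
        double gb = 1024.0 * 1024 * 1024;
        double free = 4 * gb;         // hypothetical free heap
        double avgRate = 0.5 * gb;    // hypothetical average allocation rate (bytes/s)
        double avgGc = 2.0;           // hypothetical average GC cycle duration (s)
        double sampleInterval = 0.1;  // assumed allocation-rate sampling interval (s)

        // prints "false": 4 GB / 1 GB/s = 4 s is still more than 2 s + 0.1 s
        System.out.println("tolerance 2 -> start GC: "
                + allocationRateRule(free, avgRate, 0, 2, avgGc, 0, sampleInterval));
        // prints "true": 4 GB / 2.5 GB/s = 1.6 s is less than 2.1 s
        System.out.println("tolerance 5 -> start GC: "
                + allocationRateRule(free, avgRate, 0, 5, avgGc, 0, sampleInterval));
        // prints "true": 130 s have passed with a 120 s interval
        System.out.println("timer 120 s, 130 s since last GC -> start GC: " + timerRule(120, 130));
    }
}
```

Raising the tolerance makes ZGC start cycles earlier for the same measured allocation rate, which is why it helps against allocation stalls during sudden spikes; the timer rule guarantees a cycle even when the rate estimate lags behind.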

Real‑World Cases:

During a flash‑sale traffic spike, allocation stalls caused pauses lasting several seconds; the fix was to enable the fixed‑interval trigger (-XX:ZCollectionInterval) and raise the allocation‑spike tolerance (-XX:ZAllocationSpikeTolerance).

In a load‑test scenario, frequent GC pauses were reduced by increasing -XX:ConcGCThreads, which gives the concurrent phases more threads so that collection keeps pace with allocation.

Excessive ClassLoader instances generated by the Aviator expression engine inflated the GC root set and lengthened the STW phases; upgrading Aviator eliminated the problem.

A CodeCache that kept growing with JIT‑compiled Aviator expressions also increased pause times; the issue was solved by pruning unused expressions.
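For the Aviator cases, two signals are worth tracking over time: the number of loaded classes, which climbs if every compiled expression brings its own class and ClassLoader, and CodeCache usage. Below is a minimal sketch using the standard java.lang.management API. The pool‑name filter assumes HotSpot naming (the segmented "CodeHeap" pools, or a single code cache pool) and may need adjusting on other JVMs.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Samples loaded-class count and code-cache usage via standard MXBeans. */
public class RootSetSignals {

    /** Classes currently loaded; a steady climb hints at per-expression class generation. */
    static int loadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    /** Used bytes summed over HotSpot's code-cache pools (names start with "Code"). */
    static long codeCacheUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && pool.getName().startsWith("Code")) {
                MemoryUsage usage = pool.getUsage();   // null if the pool is no longer valid
                if (usage != null) {
                    used += usage.getUsed();
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        System.out.printf("classes loaded now: %d, loaded since start: %d, unloaded: %d%n",
                cl.getLoadedClassCount(), cl.getTotalLoadedClassCount(), cl.getUnloadedClassCount());
        System.out.printf("code cache used: %d KB%n", codeCacheUsedBytes() / 1024);
    }
}
```

Plotting both values next to ZGC pause times makes it easier to tell whether longer pauses track root‑set growth rather than heap pressure.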

Upgrade Effects: After the Zeus rule platform migrated to ZGC, TP999 latency improved by 12‑142 ms (an 18‑74 % reduction) in scenarios with latency under 200 ms, while throughput decreased slightly in some batch‑oriented clusters because ZGC is a single‑generation (non‑generational) collector.

Conclusion: ZGC achieves sub‑10 ms pauses by making most GC work concurrent and by using colored pointers and read barriers. Proper tuning and addressing application‑specific root‑set issues enable ZGC to dramatically improve availability for low‑latency services.


Appendix:

Guidelines for evaluating the benefits, costs, and risks of upgrading to JDK 11 with ZGC, covering compatibility checks, build and packaging steps, deployment methods (new VMs, scripts, containers), monitoring (GC time and frequency via CAT), and performance analysis tools (Scalpel, JProfiler).
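GC time and frequency metrics like these can be derived from GarbageCollectorMXBean counters by taking deltas between samples. Below is a minimal, hypothetical stand‑in for a CAT integration (CAT itself is not used here). What each bean reports under ZGC, whole cycles or pauses, depends on the JDK version, so check the bean names your JVM exposes.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Samples per-collector GC count and time deltas between calls. */
public class GcSampler {

    // Last cumulative {count, timeMillis} seen per collector bean.
    private final Map<String, long[]> last = new HashMap<>();

    /** Returns {collections, milliseconds} per collector since the previous call. */
    Map<String, long[]> sample() {
        Map<String, long[]> delta = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the value is undefined; treat that as 0.
            long count = Math.max(0, gc.getCollectionCount());
            long time = Math.max(0, gc.getCollectionTime());
            long[] prev = last.getOrDefault(gc.getName(), new long[] {0, 0});
            delta.put(gc.getName(), new long[] {count - prev[0], time - prev[1]});
            last.put(gc.getName(), new long[] {count, time});
        }
        return delta;
    }

    public static void main(String[] args) {
        GcSampler sampler = new GcSampler();
        sampler.sample();   // establish a baseline
        System.gc();        // request a collection so the next delta is usually non-zero
        sampler.sample().forEach((name, d) ->
                System.out.printf("%s: %d collections, %d ms%n", name, d[0], d[1]));
    }
}
```

In a service, sample() would be called on a fixed schedule, for example once a minute, with the deltas reported to whatever monitoring system is in use.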

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
