
How Tencent Kona JDK 11 ZGC Delivers Millisecond‑Level GC Pauses for Real‑Time Services

Tencent Kona JDK 11 ships a production‑ready ZGC implementation that keeps Java garbage‑collection stop‑the‑world pauses under 10 ms, even on very large heaps, while maintaining acceptable throughput. This article covers its design, tuning, and real‑world deployments.

Tencent Cloud Middleware

Background

Java’s ecosystem has grown for over two decades, covering everything from embedded devices to large data centers. While many workloads care about overall throughput, latency‑sensitive applications—such as UI rendering at 60 Hz, real‑time bidding in advertising, and high‑frequency trading—require strict limits on garbage‑collection (GC) pauses, often under 15 ms.

Traditional GC algorithms like CMS and G1 exhibit pause times that grow with heap size, reaching minutes for full GC on multi‑hundred‑gigabyte heaps, making them unsuitable for high‑SLA services.

Why a New GC Was Needed

With server memory capacities expanding to tens or hundreds of gigabytes (even terabytes), the stop‑the‑world (STW) pauses of classic tracing GC become a major bottleneck. Applications that require 99.99 % of requests to finish within 100 ms cannot tolerate the pause durations of CMS or G1.

In response, JDK 11 introduced ZGC (the Z Garbage Collector), an experimental, scalable, concurrent collector designed to keep STW pauses below 10 ms regardless of heap size.

Tencent Kona JDK 11 and ZGC

Tencent’s Big Data JVM team built Tencent Kona JDK 11, a downstream distribution of OpenJDK 11, and brought its ZGC to production‑ready quality. Since the GA release on 30 April 2021, the ZGC implementation has been open‑sourced and deployed in many internal Tencent services, with latency improvements of two to three orders of magnitude.

ZGC Design Goals

Keep GC pause times under 10 ms.

Limit throughput loss compared to G1 to no more than 15 %.

Support heaps from 8 MiB to 16 TiB without pause time growing with heap size.

These goals are achieved by moving most GC work out of STW into concurrent phases and by redesigning data structures such as GC roots, runtime metadata, and object relocation.

ZGC Algorithm Implementation

ZGC follows a Mark‑Compact model but splits work into six phases, only three of which involve STW:

Pause Mark Start – lightweight global initialization.

Concurrent Mark & Remap – concurrent scanning of GC roots and updating object references.

Pause Mark End – synchronize and finish concurrent marking.

Concurrent Prepare – handle weak references and select regions for compaction.

Pause Relocate Start – global sync before object movement.

Concurrent Relocate – move objects while Java threads run, protected by a read barrier.
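The six phases above can be modeled as a small enum that records which ones stop the world. This is only an illustrative sketch: the class and constant names are invented here and do not correspond to identifiers in the ZGC source.

```java
import java.util.Arrays;

public class ZgcPhases {
    // Hypothetical model of one ZGC cycle as described in the list above.
    enum Phase {
        PAUSE_MARK_START(true),
        CONCURRENT_MARK_REMAP(false),
        PAUSE_MARK_END(true),
        CONCURRENT_PREPARE(false),
        PAUSE_RELOCATE_START(true),
        CONCURRENT_RELOCATE(false);

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    // Counts the phases during which Java threads are paused.
    static long stwPhaseCount() {
        return Arrays.stream(Phase.values()).filter(p -> p.stopTheWorld).count();
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + (p.stopTheWorld ? " (STW)" : " (concurrent)"));
        }
        System.out.println("STW phases: " + stwPhaseCount());
    }
}
```

The point of the model is that pauses alternate with concurrent work, so no single pause has to cover marking or relocating the whole heap.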

Key innovations include:

Concurrent scanning of GC roots, requiring a redesign of the root data structures.

Runtime metadata handling for class, method, JIT code, and weak references.

Colored pointers: four metadata bits, stored in otherwise unused upper bits of each 64‑bit pointer, encode object state (Marked0, Marked1, Remapped, Finalizable) and enable a lightweight read barrier.

Because only three short STW phases remain, ZGC can keep pause times in the millisecond range even on multi‑terabyte heaps.
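To make the colored‑pointer idea concrete, the sketch below packs an address and one color bit into a Java long and shows the shape of a read‑barrier check. It assumes a layout of 42 address bits followed by the four metadata bits, which is an assumption for illustration; the class, method names, and the simplified slow path are hypothetical and not taken from the Kona or OpenJDK sources.

```java
public class ColoredPointerSketch {
    // Assumed layout: low 42 bits hold the heap offset, the next 4 bits hold the color.
    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0 = 1L << ADDRESS_BITS;
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // Combines a heap offset with exactly one color bit.
    static long color(long offset, long colorBit) {
        return (offset & ADDRESS_MASK) | colorBit;
    }

    // Strips the metadata bits and returns the heap offset.
    static long offset(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // A pointer is "good" when its color matches the color the current GC phase expects.
    static boolean isGood(long pointer, long goodColor) {
        return (pointer & METADATA_MASK) == goodColor;
    }

    // Read-barrier sketch: the fast path is a single mask-and-compare.
    // The slow path here only recolors the pointer; a real collector would
    // also mark the object or follow a forwarding entry to its new address.
    static long loadBarrier(long pointer, long goodColor) {
        if (pointer == 0 || isGood(pointer, goodColor)) {
            return pointer;
        }
        return color(offset(pointer), goodColor);
    }

    public static void main(String[] args) {
        long stale = color(0x1234L, MARKED0);
        long healed = loadBarrier(stale, REMAPPED);
        System.out.println("stale  = 0x" + Long.toHexString(stale));
        System.out.println("healed = 0x" + Long.toHexString(healed));
    }
}
```

Because the common case is one bit test on a value the thread has already loaded, the barrier stays cheap while still letting the collector intercept every stale reference.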

Overhead of ZGC

While ZGC dramatically reduces pause time, it introduces several overheads on the mutator threads:

Read‑barrier cost on every object reference read.

Entry‑barrier overhead for JIT‑compiled methods.

Frame‑barrier overhead for concurrent stack scanning (StackWaterMark).

Additional lock structures in the runtime.

CPU contention between concurrent GC threads and application threads.

These overheads are generally modest; benchmark data shows a 5‑20 % throughput impact on large heaps and a 10 % slowdown on small heaps.

Usage and Tuning

Typical scenarios that benefit most from ZGC are:

Very large heaps (hundreds of GB) where Full GC would cause minute‑scale pauses.

High‑SLA services requiring sub‑100 ms tail latency regardless of heap size.

Enabling ZGC is simple: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

Advanced tuning parameters include:

Heap size (-Xmx) – ensure the heap is large enough to avoid allocation stalls, where application threads block until the GC frees memory.

GC trigger thresholds (ZAllocationSpikeTolerance, ZCollectionInterval) – control when GC starts.

GC thread counts (ParallelGCThreads, ConcGCThreads) – adjust STW and concurrent thread parallelism.
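After changing these flags it helps to confirm what the running JVM actually picked up. The sketch below uses the standard java.lang.management API to list the JVM input arguments and the registered collectors. The assumption that ZGC's collector bean names begin with "ZGC" is mine and may differ between JDK builds; the class and helper names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class ZgcCheck {
    // True if the JVM argument list explicitly enables ZGC.
    static boolean requestsZgc(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseZGC");
    }

    // True if any collector name starts with "ZGC" (assumed naming convention).
    static boolean reportsZgc(List<String> collectorNames) {
        return collectorNames.stream().anyMatch(name -> name.startsWith("ZGC"));
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("ZGC requested on command line: " + requestsZgc(jvmArgs));

        // Print every registered collector with its counters.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, reported time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

The time reported by the bean is whatever the collector chooses to account for, so it should not be read as pause time without checking the GC log.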

System‑level adjustments may be required for very large heaps:

Increase the size of /dev/shm, which backs the memory file used by ZGC (for example, edit its entry in /etc/fstab, then run mount -o remount /dev/shm).

Raise the per‑process limit on memory mappings (/proc/sys/vm/max_map_count) to accommodate three mappings per ZPage.
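A rough way to size max_map_count follows from the "three mappings per ZPage" rule above. The sketch assumes 2 MiB as the smallest page size and adds 20 % headroom; both figures are assumptions for illustration rather than values stated in this article, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

public class MaxMapCountEstimate {
    static final long SMALL_PAGE_BYTES = 2L * 1024 * 1024; // assumed smallest ZPage size
    static final long MAPPINGS_PER_PAGE = 3;               // three mappings per ZPage

    // Estimated mappings needed: pages * 3, plus 20 % headroom (assumed margin).
    static long requiredMaxMapCount(long maxHeapBytes) {
        long pages = maxHeapBytes / SMALL_PAGE_BYTES;
        return pages * MAPPINGS_PER_PAGE * 12 / 10;
    }

    // Reads the current kernel limit; empty on non-Linux systems or on read errors.
    static OptionalLong currentMaxMapCount() {
        Path path = Paths.get("/proc/sys/vm/max_map_count");
        try {
            return OptionalLong.of(Long.parseLong(Files.readString(path).trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long required = requiredMaxMapCount(maxHeap);
        System.out.println("Estimated mappings needed: " + required);

        OptionalLong current = currentMaxMapCount();
        if (current.isPresent()) {
            System.out.println("Current max_map_count: " + current.getAsLong());
            System.out.println(current.getAsLong() >= required ? "Limit looks sufficient" : "Limit may be too low");
        } else {
            System.out.println("Could not read /proc/sys/vm/max_map_count");
        }
    }
}
```

Under these assumptions a 512 GiB heap needs roughly 940,000 mappings, which is why the limit has to be raised explicitly on machines with very large heaps.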

Production Experience at Tencent

ZGC has been deployed in several large‑scale Tencent services:

Hermes (real‑time analytics): Switching from G1 to ZGC raised SQL latency compliance from 98.1 % to 99.5 %, clearing the 99 % target, and reduced GC‑induced latency to under 20 ms.

VPC configuration service: On a 512 GB machine, ZGC eliminated the >10 s tail latencies seen with G1, increased storage capacity by 12.5 %, and kept read/write latency under 50 ms.

WAF (Netty‑based HTTP firewall): After moving from G1 to ZGC, the P9999 (99.99th‑percentile) request latency stabilized below 80 ms, meeting strict SLA requirements.
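Tail figures such as P9999 are computed from raw request latencies. The sketch below uses the nearest‑rank method with integer arithmetic, which is one common definition and not necessarily the one used by the services above; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class TailLatency {
    // Nearest-rank percentile. The percentile is given in units of 1/10000,
    // so 9900 means P99 and 9999 means P9999.
    static long percentile(long[] samplesMs, int perTenThousand) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (perTenThousand < 1 || perTenThousand > 10_000) {
            throw new IllegalArgumentException("percentile out of range");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // rank = ceil(n * p / 10000), computed without floating point.
        int rank = (int) (((long) sorted.length * perTenThousand + 9_999) / 10_000);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 11, 14, 480, 13, 16, 12, 11, 15};
        System.out.println("P50   = " + percentile(samples, 5_000) + " ms");
        System.out.println("P9999 = " + percentile(samples, 9_999) + " ms");
    }
}
```

A single long GC pause barely moves the median but dominates P9999, which is why tail percentiles are the metric that improves most when pauses shrink.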

Community Contributions

The team reported and fixed several ZGC‑related bugs in the OpenJDK community:

Integration with VectorAPI required adding a load barrier to generated code (merged into JDK 16).

Mark‑Stack overflow caused by excessive stack fragmentation and duplicate entries (fixed and back‑ported to JDK 17).

“Fake‑deadlock” during Concurrent Mark due to log‑file lock contention (fixed and merged into JDK 17).

Open‑Source Release

Tencent Kona JDK 8 and 11 are publicly available:

GitHub repositories:

https://github.com/Tencent/TencentKona-8

https://github.com/Tencent/TencentKona-11

These releases include the production‑ready ZGC implementation and can be used to evaluate low‑latency GC in any Java 11 environment.
