Why Upgrade to JDK21? A Deep Dive into Generational ZGC and Its Performance Impact

This article explains the motivations for upgrading to JDK21, introduces Generational ZGC, details integration steps and monitoring setup, presents performance test results comparing ZGC and Generational ZGC, and outlines future plans for Java services at Zhaozhuan.

Sohu Tech Products

1. Motivation for JDK Upgrade

The platform's backend services were running on JDK 1.8 (1.8.0_191). Under peak traffic (e.g., 618, Double-11) the product-list service creates massive numbers of short-lived objects, causing frequent young GC (YGC) and full GC (FGC) pauses, long GC times, and reduced availability. Scaling vertically means a larger heap, which the existing collector handles poorly, while scaling horizontally raises cost and connection counts. The Z Garbage Collector (ZGC), a concurrent low-latency collector with millisecond-level pauses, matches these requirements, which prompted the upgrade.

2. Why JDK 21

Oracle's long-term support releases include JDK 11, 17, and 21. ZGC has been available since JDK 11 (experimental at first, production-ready from JDK 15), but on JDK 17 it exhibited Allocation Stall events when load reached four times the normal peak, leading to thread pauses and service degradation. JDK 21 introduces Generational ZGC (JEP 439), which can achieve the same throughput with only ~70 % of the heap memory, keeps pause times under 1 ms, and greatly reduces Allocation Stalls. JDK 21 was therefore chosen as the migration target.

3. Generational ZGC Overview

3.1 What Is Generational ZGC?

Generational ZGC is a variant of ZGC that assumes most objects are short‑lived. It splits the heap into a Young Generation and an Old Generation, improving throughput and reducing Allocation Stall frequency while preserving sub‑millisecond pause times.

3.2 Collection Process

The heap is logically divided into two regions. New objects are allocated in the Young Generation; objects that survive multiple young‑generation collections are promoted to the Old Generation. The collection of a generation consists of alternating pause points and concurrent phases:

Pause 1 – start of marking (synchronous).

Concurrent 1 – concurrent marking and object remapping via load barriers.

Pause 2 – end of marking.

Concurrent 2 – preparation for region evacuation, reference processing, class unloading.

Pause 3 – before object movement.

Concurrent 3 – object relocation to create contiguous memory.

Minor Collections handle only the Young Generation; Major Collections handle the entire heap. Both run concurrently with the application.
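The promotion rule described above can be illustrated with a toy model. This is a minimal sketch and not how ZGC is implemented: the class ToyGenerationalHeap, the fixed promotion age, and the explicit minorCollect call are all hypothetical, whereas the real collector decides tenuring adaptively and runs concurrently with the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: illustrates young-to-old promotion, not ZGC internals.
final class ToyGenerationalHeap {
    private final int promotionAge;                              // hypothetical fixed threshold
    private final Map<String, Integer> young = new HashMap<>();  // object id -> collections survived
    private final List<String> old = new ArrayList<>();

    ToyGenerationalHeap(int promotionAge) {
        this.promotionAge = promotionAge;
    }

    // New objects always start in the young generation.
    void allocate(String id) {
        young.put(id, 0);
    }

    // A minor collection: drop unreachable young objects, age the survivors,
    // and promote those that have survived promotionAge collections.
    void minorCollect(Set<String> reachable) {
        Map<String, Integer> survivors = new HashMap<>();
        for (Map.Entry<String, Integer> e : young.entrySet()) {
            if (!reachable.contains(e.getKey())) {
                continue; // garbage: reclaimed without touching the old generation
            }
            int age = e.getValue() + 1;
            if (age >= promotionAge) {
                old.add(e.getKey());
            } else {
                survivors.put(e.getKey(), age);
            }
        }
        young.clear();
        young.putAll(survivors);
    }

    int youngCount() {
        return young.size();
    }

    List<String> oldObjects() {
        return List.copyOf(old);
    }
}
```

With a promotion age of 2, a long-lived object such as a cache entry moves to the old generation after two minor collections, while a per-request object that becomes unreachable is reclaimed in the first one. This is the property that lets minor collections stay cheap when most objects die young.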

3.3 Tuning

Generational ZGC is largely self-tuning. The only user-adjustable parameter of real significance is the maximum heap size (-Xmx); larger heaps generally improve performance. Most traditional GC flags (e.g., -Xmn, -XX:TenuringThreshold, -XX:InitiatingHeapOccupancyPercent, -XX:ConcGCThreads) generally do not need to be set.
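Because the maximum heap size is the main lever, it can help to confirm what the running JVM actually received. The following is a minimal sketch using only standard JDK APIs; the class name HeapSettingsReport and the helper findArg are hypothetical names chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class HeapSettingsReport {
    // Returns the first JVM argument starting with the given prefix, or null if absent.
    static String findArg(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the application's main method).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        System.out.println("Effective max heap: " + maxHeapMiB + " MiB");
        System.out.println("-Xmx argument: " + findArg(jvmArgs, "-Xmx"));
        // A leftover -Xmn from an older configuration is a candidate for removal.
        System.out.println("-Xmn argument: " + findArg(jvmArgs, "-Xmn"));
    }
}
```

Running this once at startup, or from a diagnostic endpoint, makes stale flags carried over from a JDK 8 configuration easy to spot.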

3.4 Design Highlights

Colored pointers: Pointers embed liveness metadata, eliminating separate mark tables.

Load barrier: Injected by the JIT to strip metadata and update relocated references.

Store barrier: Injected by the JIT to add metadata, maintain the remembered set, and mark objects as live.

Optimized barriers, double‑buffered remembered sets, and dense heap regions further reduce overhead.
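The idea of packing metadata into a pointer, and of barriers that add or strip it, can be shown with plain bit operations. The sketch below is a toy model under stated assumptions: the number of color bits, their position, and the flag names MARKED and REMAPPED are hypothetical and do not reflect ZGC's actual pointer layout.

```java
// Toy model only: bit positions and flag names are hypothetical, not ZGC's real layout.
final class ToyColoredPointer {
    private static final int COLOR_BITS = 4;
    private static final long COLOR_MASK = (1L << COLOR_BITS) - 1;

    static final long MARKED = 0b0001;
    static final long REMAPPED = 0b0010;

    // "Store barrier" side: attach metadata bits to a raw address.
    static long color(long address, long colorBits) {
        return (address << COLOR_BITS) | (colorBits & COLOR_MASK);
    }

    // "Load barrier" side: strip the metadata to recover the raw address.
    static long address(long coloredPointer) {
        return coloredPointer >>> COLOR_BITS;
    }

    // Tests whether a metadata flag is set on the pointer.
    static boolean has(long coloredPointer, long flag) {
        return (coloredPointer & flag) != 0;
    }
}
```

Because the state travels with the reference itself, checking whether an object has been marked or remapped needs no lookup in a side table, which is the point of the design.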

4. Integration and Monitoring

4.1 Enabling Generational ZGC

After upgrading to JDK 21, enable Generational ZGC with the JVM options:

-XX:MetaspaceSize=640m -XX:MaxMetaspaceSize=640m 
-Xms12g -Xmx12g 
-XX:+UseZGC -XX:+ZGenerational 
-Xlog:safepoint,classhisto*=trace,age*,gc*=info:file=gc-%t.log:time,tid,tags:filecount=5,filesize=50m

Note that Spring Boot 2.7.17+ (or 2.7.18 for better compatibility) and IDEA 2023.3.2+ are required for JDK 21.
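A quick way to confirm that the options took effect is to inspect the collector MXBeans at startup. This is a minimal sketch using standard JDK APIs; the class name GcStartupCheck is hypothetical, and the check relies on the assumption that ZGC's collector beans carry "ZGC" in their names, which should be verified against the output on the target JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcStartupCheck {
    // Names of the collectors registered in the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Assumption: ZGC's collector beans carry "ZGC" in their names.
    static boolean looksLikeZgc(List<String> names) {
        return names.stream().anyMatch(name -> name.contains("ZGC"));
    }

    public static void main(String[] args) {
        System.out.println("Java feature version: " + Runtime.version().feature());
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        if (!looksLikeZgc(names)) {
            System.err.println("WARNING: ZGC does not appear to be active; check the JVM options.");
        }
    }
}
```

The printed bean names are also the ones the notification listener in the next section will receive events from, so the output doubles as a reference when labelling metrics.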

4.2 Monitoring Generational ZGC

Key GC metrics include pause time, pause frequency, and the cause of GC (especially Allocation Stall). A NotificationListener can be registered to the GarbageCollectorMXBean to capture these events. Example listener registration:

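import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationFilter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Passes only GC notifications through to the listener.
public class InfoShowGCNotificationFilter implements NotificationFilter {
    @Override
    public boolean isNotificationEnabled(Notification notification) {
        return GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(notification.getType());
    }
}

// Receives one notification for each completed GC pause or cycle.
public class InfoShowGCNotificationListener implements NotificationListener {
    @Override
    public void handleNotification(Notification notification, Object handback) {
        GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo.from((CompositeData) notification.getUserData());
        GcInfo gcInfo = info.getGcInfo();
        if ("end of GC pause".equals(info.getGcAction())) {
            long pauseMillis = gcInfo.getDuration(); // elapsed time in milliseconds
            // record pauseMillis, e.g., to Prometheus
        }
        String cause = info.getGcCause(); // used to spot Allocation Stall events
        // additional processing for cause, memory usage, etc.
    }
}

// Registration (simplified; run once at application startup)
for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
    // Only beans that can emit notifications are registered.
    if (gcBean instanceof NotificationEmitter emitter) {
        emitter.addNotificationListener(new InfoShowGCNotificationListener(), new InfoShowGCNotificationFilter(), gcBean);
    }
}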

Prometheus collectors can be defined to expose metrics such as ZGC_GC_PAUSE_TIME, ZGC_GC_CAUSE, ZGC_HEAP_USED, etc.
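As one illustration of such a collector, the sketch below counts GC events per cause and renders them in the Prometheus text exposition format. It is a hand-rolled stand-in rather than the Prometheus client library, and the class name GcCauseCounter is hypothetical. It assumes the cause string for stalls is "Allocation Stall" (check this against the GC logs) and that cause strings contain no quotes or backslashes, since label values are not escaped here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;

// Hand-rolled stand-in for a Prometheus counter with a "cause" label.
final class GcCauseCounter {
    // Assumption: the cause string reported for stalls; verify against GC logs.
    static final String ALLOCATION_STALL = "Allocation Stall";

    // Sorted map keeps the rendered output in a stable order.
    private final Map<String, LongAdder> counts = new ConcurrentSkipListMap<>();

    // Call from the notification listener with info.getGcCause().
    void record(String gcCause) {
        counts.computeIfAbsent(gcCause, key -> new LongAdder()).increment();
    }

    long count(String gcCause) {
        LongAdder adder = counts.get(gcCause);
        return adder == null ? 0 : adder.sum();
    }

    // Renders all counts in the Prometheus text exposition format.
    String render() {
        StringBuilder sb = new StringBuilder("# TYPE ZGC_GC_CAUSE counter\n");
        for (Map.Entry<String, LongAdder> entry : counts.entrySet()) {
            sb.append("ZGC_GC_CAUSE{cause=\"")
              .append(entry.getKey())
              .append("\"} ")
              .append(entry.getValue().sum())
              .append('\n');
        }
        return sb.toString();
    }
}
```

An alert on the rate of the Allocation Stall series is a natural use, since stalls are the events that block application threads. In a real service, the Prometheus client library's own counter type would normally replace this class.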

5. Performance Evaluation

5.1 Test Environment

JDK 21 version 21.0.2_13 was used. Each test group consisted of three service instances. The tested endpoints were core product‑list APIs (App home page, main search, C2C recommendation). Three load levels were applied: 2×, 4×, and 8× the normal peak QPS, each for 10 minutes.

5.2 Results

Across all load levels, Generational ZGC consistently reduced Allocation Stall occurrences and GC pause times compared with non-generational ZGC. Under the 8× peak load, the key observations (non-generational ZGC versus Generational ZGC) were:

CPU average usage increased by ~20 %.

Maximum heap usage remained ~98 %.

GC pause time per event stayed below 1 ms; pause‑induced QPS loss was negligible (2–3 QPS).

Allocation Stall count dropped from 638 to 94 (≈85 % reduction).

Overall QPS improved from 737 to 842 (≈14 % increase).

Average response time (TPAvg) decreased from 1300 ms to 788 ms.

90th‑percentile latency (TP90) fell from 1963 ms to 1660 ms.

99th‑percentile latency (TP99) fell from 4473 ms to 1967 ms.

Error rate dropped from 40.88 % to 12.91 % (≈28 percentage‑point reduction).
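The relative changes quoted in this list can be recomputed from the raw before and after figures. The sketch below does only that arithmetic on the numbers reported above; the class name ResultArithmetic and its method names are hypothetical.

```java
final class ResultArithmetic {
    // Relative change from 'before' to 'after', in percent (negative means a decrease).
    static double relativeChangePct(double before, double after) {
        return (after - before) / before * 100.0;
    }

    // Absolute change between two percentages, in percentage points.
    static double percentagePointChange(double beforePct, double afterPct) {
        return afterPct - beforePct;
    }

    public static void main(String[] args) {
        System.out.printf("Allocation Stalls: %+.1f %%%n", relativeChangePct(638, 94));
        System.out.printf("QPS:               %+.1f %%%n", relativeChangePct(737, 842));
        System.out.printf("TPAvg:             %+.1f %%%n", relativeChangePct(1300, 788));
        System.out.printf("TP90:              %+.1f %%%n", relativeChangePct(1963, 1660));
        System.out.printf("TP99:              %+.1f %%%n", relativeChangePct(4473, 1967));
        System.out.printf("Error rate:        %+.2f percentage points%n",
                percentagePointChange(40.88, 12.91));
    }
}
```

Relative change and percentage-point change are kept as separate methods because the error rate is already a percentage, so its drop is best stated in points rather than as a ratio.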

5.3 Conclusion

Generational ZGC delivers sub-millisecond pauses, dramatically fewer Allocation Stalls, higher throughput, and lower latency under heavy load, at the cost of roughly 20 % higher average CPU usage. It is therefore suitable for production deployment of the platform's services.

6. Summary

Upgrading to JDK 21 and enabling Generational ZGC provides a scalable, low‑latency garbage collection solution that improves resource utilization and service stability for high‑traffic Java back‑ends.
