Upgrading to JDK 21 and Adopting Generational ZGC: Motivation, Design, Implementation, Monitoring, and Performance Evaluation
This article explains why the backend services were upgraded from JDK 8 to JDK 21, introduces the generational ZGC garbage collector, details its architecture, tuning parameters, integration steps, monitoring setup, and presents performance test results that demonstrate reduced allocation stalls, lower latency, higher throughput, and near‑zero GC pauses.
The backend services of the Zhuanzhuan platform ran on JDK 8 (1.8.0_191). Newer JDK releases bring language features, security improvements, and lower overhead. The more pressing motivation was that the services suffered high allocation‑stall rates during traffic spikes (e.g., the 618 and Double‑11 shopping festivals), because the old GC could not reclaim memory as fast as the application allocated it.
JDK 21 was chosen because it is a long‑term support release that includes the Z Garbage Collector (ZGC) and, starting with JEP 439, a generational version of ZGC. Compared with JDK 17, JDK 21’s generational ZGC can achieve the same throughput with only 70 % of the heap memory and pause times under 1 ms, dramatically reducing allocation stalls.
Generational ZGC Overview
Generational ZGC is a low‑latency, scalable GC that supports terabyte‑scale heaps. It divides the heap into a young generation and an old generation, each collected independently. Objects are allocated in the young generation; after surviving several young‑generation collections they are promoted to the old generation.
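The promotion rule can be pictured with a toy model. The sketch below is not how ZGC tracks object age internally. The class name PromotionSketch, the TENURING_THRESHOLD of 3, and the liveness predicate are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of age-based promotion; illustrative only, not ZGC's real bookkeeping. */
final class PromotionSketch {
    /** Hypothetical threshold: young collections an object must survive before promotion. */
    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        int age; // number of young collections survived so far
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    /** New objects always start in the young generation. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    /** A minor collection: drop dead young objects, age survivors, promote old-enough ones. */
    void minorCollect(Predicate<Obj> isLive) {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!isLive.test(o)) {
                it.remove();                 // unreachable: reclaimed
            } else if (++o.age >= TENURING_THRESHOLD) {
                it.remove();                 // survived enough cycles: promoted
                old.add(o);
            }
        }
    }

    public static void main(String[] args) {
        PromotionSketch heap = new PromotionSketch();
        Obj cache = heap.allocate("long-lived cache entry");
        heap.allocate("short-lived request buffer");
        for (int i = 0; i < TENURING_THRESHOLD; i++) {
            heap.minorCollect(o -> o == cache); // only the cache entry stays reachable
        }
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());
    }
}
```

In this model the request buffer is reclaimed by the first minor collection, while the cache entry moves to the old list after its third, so the program prints young=0 old=1.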
The heap layout is illustrated by diagrams showing separate memory regions for young and old generations, which may be non‑contiguous in physical memory.
Each collection cycle alternates short stop‑the‑world pauses with concurrent work: a brief pause to start marking, concurrent marking, a brief pause to end marking, concurrent preparation for relocation, a brief pause to start relocation, and finally concurrent object relocation. Minor Collections target only the young generation, while Major Collections reclaim the entire heap.
Key tuning parameter: the maximum heap size (-Xmx). Most other ZGC flags are unnecessary for generational ZGC; only -XX:+UseZGC and -XX:+ZGenerational are required when running on JDK 21.
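With so few flags involved, a quick runtime check can confirm which collector the JVM actually selected and how large the heap may grow. The following is a minimal sketch using the standard java.lang.management API. The class name ActiveCollectorCheck is hypothetical, and the assumption that ZGC's collector beans have names starting with "ZGC" should be verified by printing the names on the target JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectorCheck {
    /** Names of the collector MXBeans the running JVM exposes. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** Assumption: every ZGC collector bean has a name beginning with "ZGC". */
    static boolean looksLikeZgc(List<String> names) {
        return !names.isEmpty() && names.stream().allMatch(n -> n.startsWith("ZGC"));
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Collectors:     " + names);
        System.out.println("Max heap (MiB): " + maxHeapMiB);
        System.out.println("Looks like ZGC: " + looksLikeZgc(names));
    }
}
```

Running it once with and once without the two ZGC flags makes the difference in reported collectors visible.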
Design Highlights
Generational ZGC uses colored pointers, load barriers, and store barriers to maintain a consistent object graph without multi‑mapped memory. Colored pointers embed metadata (e.g., liveness) directly in the pointer value.
Optimizations include fast/slow paths for barriers, minimized load‑barrier work, remembered‑set barriers, SATB marking barriers, fused store‑barrier checks, double‑buffered remembered sets, relocation without extra heap memory, dense heap regions, and support for large objects in the young generation.
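The interplay of colored pointers and a barrier with fast and slow paths can be sketched with plain 64‑bit arithmetic. This is a toy model only: the 4‑bit color field, its position in the low bits, the goodColor variable, and the class name ColoredPointerSketch are invented for illustration and do not reflect ZGC's actual pointer layout or barrier code.

```java
/** Toy colored pointer: metadata bits packed next to an address in one 64-bit word. */
final class ColoredPointerSketch {
    // Hypothetical layout: the low 4 bits carry the color, the rest carry the address.
    static final int COLOR_BITS = 4;
    static final long COLOR_MASK = (1L << COLOR_BITS) - 1;

    /** The color currently considered "good"; the collector changes it between phases. */
    static long goodColor = 0b0001;

    /** Counts slow-path entries so the example can show how rarely it is taken. */
    static int slowPathHits = 0;

    static long pack(long address, long color) {
        return (address << COLOR_BITS) | (color & COLOR_MASK);
    }

    static long address(long pointer) { return pointer >>> COLOR_BITS; }

    static long color(long pointer) { return pointer & COLOR_MASK; }

    /** Load barrier: one mask-and-compare on the fast path, repair work on the slow path. */
    static long loadBarrier(long[] field) {
        long pointer = field[0];
        if (color(pointer) == goodColor) {
            return pointer;                  // fast path: pointer is already up to date
        }
        slowPathHits++;
        // Slow path: a real collector would mark or remap the object here.
        long healed = pack(address(pointer), goodColor);
        field[0] = healed;                   // self-heal so the next load takes the fast path
        return healed;
    }

    public static void main(String[] args) {
        long[] field = { pack(0x1000, 0b0001) };
        loadBarrier(field);                  // fast path
        goodColor = 0b0010;                  // collector enters a new phase
        loadBarrier(field);                  // slow path, heals the field
        loadBarrier(field);                  // fast path again
        System.out.println("address=0x" + Long.toHexString(address(field[0]))
                + " slowPathHits=" + slowPathHits);
    }
}
```

The program prints address=0x1000 slowPathHits=1: the stale pointer is repaired once, and every later load of the same field costs only the compare.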
Integration and Monitoring
To adopt generational ZGC, the team upgraded to JDK 21 and added JVM options:
-XX:MetaspaceSize=640m -XX:MaxMetaspaceSize=640m -Xms12g -Xmx12g -XX:+UseZGC -XX:+ZGenerational -Xlog:safepoint,classhisto*=trace,age*,gc*=info:file=gc-%t.log:time,tid,tags:filecount=5,filesize=50m
They also addressed compatibility issues such as deprecated APIs, Spring Boot version requirements (≥ 2.7.17), IDEA version (≥ 2023.3.2), and Lombok updates.
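A deployment script can silently drop or reorder options, so it may be worth checking at startup that the JVM options listed above actually reached the process. The following is a minimal sketch, assuming the options are passed on the command line. The class JvmFlagCheck and its helper names are hypothetical, while RuntimeMXBean.getInputArguments() is the standard API for reading them.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmFlagCheck {
    /** True if the exact flag appears among the JVM input arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    /** Text following a prefix such as "-Xmx", or null if no such option is present. */
    static String optionValue(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String xms = optionValue(jvmArgs, "-Xms");
        String xmx = optionValue(jvmArgs, "-Xmx");
        System.out.println("UseZGC set:        " + hasFlag(jvmArgs, "-XX:+UseZGC"));
        System.out.println("ZGenerational set: " + hasFlag(jvmArgs, "-XX:+ZGenerational"));
        // -Xms equal to -Xmx gives a fixed-size heap, as in the options above.
        System.out.println("Fixed-size heap:   " + (xms != null && xms.equals(xmx)));
    }
}
```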
Monitoring was built by implementing a NotificationListener that registers with the JVM’s GC MXBeans, filters GC notifications, and extracts metrics such as pause time, pause count, GC cause, heap usage before/after GC, and per‑generation memory usage. These metrics are exported to Prometheus using counters and gauges (e.g., ZGC_GC_CAUSE, ZGC_HEAP_USED, ZGC_GC_PAUSE_TIME).
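The listener approach described above might look roughly like the sketch below. It is not the team's implementation: the class GcMetricsListener and its in‑memory maps are hypothetical stand‑ins for real Prometheus counters and gauges, and no Prometheus client library is used so that the example stays self‑contained. The notification classes come from the JDK's com.sun.management package.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

/** Collects GC metrics in memory; a stand-in for real Prometheus counters and gauges. */
final class GcMetricsListener implements NotificationListener {
    final Map<String, LongAdder> countByCause = new ConcurrentHashMap<>();      // counter per GC cause
    final Map<String, LongAdder> millisByCollector = new ConcurrentHashMap<>(); // summed reported durations
    final Map<String, Long> usedBytesByPool = new ConcurrentHashMap<>();        // gauge per memory pool

    /** Registers this listener on every collector MXBean; returns how many accepted it. */
    int install() {
        int registered = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (bean instanceof NotificationEmitter) {
                ((NotificationEmitter) bean).addNotificationListener(this, null, null);
                registered++;
            }
        }
        return registered;
    }

    @Override
    public void handleNotification(Notification notification, Object handback) {
        // Filter: ignore everything that is not a GC notification.
        if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                .equals(notification.getType())) {
            return;
        }
        GarbageCollectionNotificationInfo info =
                GarbageCollectionNotificationInfo.from((CompositeData) notification.getUserData());
        GcInfo gc = info.getGcInfo();
        Map<String, Long> usedAfter = new HashMap<>();
        gc.getMemoryUsageAfterGc().forEach((pool, usage) -> usedAfter.put(pool, usage.getUsed()));
        record(info.getGcName(), info.getGcCause(), gc.getDuration(), usedAfter);
    }

    /** Updates the metrics; kept separate so it can be exercised without a real GC event. */
    void record(String collector, String cause, long durationMillis, Map<String, Long> usedAfter) {
        countByCause.computeIfAbsent(cause, k -> new LongAdder()).increment();
        millisByCollector.computeIfAbsent(collector, k -> new LongAdder()).add(durationMillis);
        usedBytesByPool.putAll(usedAfter);
    }

    public static void main(String[] args) throws InterruptedException {
        GcMetricsListener metrics = new GcMetricsListener();
        System.out.println("Listening on " + metrics.install() + " collector bean(s)");
        System.gc();            // request a collection; the JVM may ignore the request
        Thread.sleep(500);      // notifications are delivered asynchronously
        System.out.println("Causes:     " + metrics.countByCause);
        System.out.println("Durations:  " + metrics.millisByCollector);
        System.out.println("Pool usage: " + metrics.usedBytesByPool);
    }
}
```

Keying durations by collector name keeps each bean's figures separate. Which beans a given JVM exposes, and whether a bean's duration represents a pause or a whole concurrent cycle, is worth confirming from the printed names before building dashboards on them.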
Performance Testing
The team conducted load tests comparing JDK 21 ZGC, JDK 21 generational ZGC, and JDK 17 ZGC under 2×, 4×, and 8× peak traffic. Each test ran three 10‑minute rounds with three instances per configuration, targeting core product‑list APIs.
Results for the 8× peak scenario showed:
CPU usage increased by ~20 %.
Maximum memory usage remained ~98 %.
GC pause time was virtually zero (≤ 1 ms per pause, only 2–3 QPS affected).
Allocation‑stall occurrences dropped by 85 % (638 → 94).
Overall QPS improved by about 14 % (737 → 842).
Average latency (TPAvg) fell from 1300 ms to 788 ms.
90th‑percentile latency (TP90) fell from 1963 ms to 1660 ms.
99th‑percentile latency (TP99) fell from 4473 ms to 1967 ms.
Error rate decreased from 40.88 % to 12.91 % (a 28‑point drop).
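The relative changes behind the figures above can be recomputed directly from the before and after values. The snippet is only an arithmetic check on the numbers reported in this section, and the class name LoadTestDeltas is hypothetical.

```java
import java.util.Locale;

final class LoadTestDeltas {
    /** Relative change from before to after, in percent (negative means a decrease). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        Locale l = Locale.ROOT; // fixed locale so the decimal separator is always "."
        System.out.printf(l, "Allocation stalls: %+.1f%%%n", percentChange(638, 94));    // -85.3%
        System.out.printf(l, "QPS:               %+.1f%%%n", percentChange(737, 842));   // +14.2%
        System.out.printf(l, "TPAvg:             %+.1f%%%n", percentChange(1300, 788));  // -39.4%
        System.out.printf(l, "TP90:              %+.1f%%%n", percentChange(1963, 1660)); // -15.4%
        System.out.printf(l, "TP99:              %+.1f%%%n", percentChange(4473, 1967)); // -56.0%
        // Error rate is already a percentage, so its change is given in percentage points.
        System.out.printf(l, "Error rate:        %+.2f points%n", 12.91 - 40.88);        // -27.97
    }
}
```

The largest relative gain is at the tail: TP99 falls by more than half, while TP90 improves by roughly 15 %.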
These findings confirm that generational ZGC improves resource utilization, reduces allocation stalls, increases throughput, and virtually eliminates GC‑induced pauses.
Future Work
All core services of the platform have been migrated to JDK 21, and the team plans to leverage additional JDK 21 features such as virtual threads and structured concurrency for further performance gains.
Acknowledgments
The authors thank the architecture, engineering efficiency, and operations teams for their support during the JDK 21 migration.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.