NetEase Interview: From Default GC to ZGC—How I Got Slammed Three Times
A NetEase interview probes deep JVM garbage‑collection knowledge, exposing misconceptions about default GC, demanding precise G1/ZGC tuning, and challenging candidates with performance‑critical scenarios that reveal hidden latency and memory‑usage pitfalls.
A candidate faced a NetEase interview that started with a basic question about which objects are reclaimed by the JVM and which GC algorithms exist. The interviewer quickly corrected the candidate’s oversimplified answer, emphasizing that "unreachable" does not equal "unreferenced" and that only objects with a completely broken strong‑reference chain are truly collectible.
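The reachability point can be demonstrated directly: a WeakReference lets you observe when the collector decides the strong-reference chain is broken. A minimal Java sketch (class and method names are illustrative, and System.gc() is only a hint, so the loop applies repeated requests):

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {
    // Returns true once the referent is collected after its last strong
    // reference is dropped; illustrative helper, not production code.
    static boolean collectedAfterUnlink() throws InterruptedException {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        strong = null; // break the last strong reference
        for (int i = 0; i < 100 && weak.get() != null; i++) {
            System.gc();      // a hint, not a guarantee
            Thread.sleep(10);
        }
        return weak.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        // While the strong chain is intact, the referent survives GC.
        System.out.println("reachable object survived GC: " + (weak.get() != null));
        System.out.println("collected after unlink: " + collectedAfterUnlink());
    }
}
```

Note that the object is only collectible in the second case, after the strong reference is nulled out; being weakly "referenced" does not keep it alive.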
The interview then moved on to comparing Mark‑Sweep and Copying collectors, asking how to optimize a service that experiences three Full GCs per day. The candidate suggested switching to G1 with -XX:MaxGCPauseMillis=50, but the interviewer countered with questions about G1’s remembered‑set memory overhead and its worst‑case 200 ms pauses, exposing a knowledge gap.
Next, the interviewer challenged the candidate on ZGC, asking whether its pauses truly vanish, and how a roughly 12 ns read‑barrier cost on every reference load plus a 40 ms TLB‑miss spike affect tail latency. The candidate could not answer, and the interview ended.
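The TLB‑miss spike mentioned here is why ZGC deployments commonly pre‑reserve huge pages. A hedged sketch of the setup (page counts and paths are illustrative for a 4 GiB heap on Linux x86‑64 with 2 MiB pages; run as root):

```
# Reserve 2 MiB huge pages: 4 GiB / 2 MiB = 2048 pages, plus headroom.
echo 2200 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# Start the JVM so ZGC maps the heap onto the reserved huge pages.
java -XX:+UseZGC -Xmx4g -XX:+UseLargePages -jar app.jar
```

With 2 MiB pages, a 4 GiB heap needs ~2048 TLB entries instead of ~1,048,576 at the 4 KiB default, which is what removes the TLB‑miss spikes the interviewer asked about.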
The article then provides a systematic, layered approach to JVM GC tuning:
Layer 1 – Static generational (Parallel/CMS): Simple configuration for monolithic back‑ends, but suffers from long STW pauses (Full GC up to 1.8 s) and high GC overhead (≥18.5%). Example flags:
-Xms2g -Xmx2g
-XX:+UseParallelGC
-XX:ParallelGCThreads=8
-XX:MaxMetaspaceSize=512m
-XX:+DisableExplicitGC
Layer 2 – Adaptive generational (G1/ZGC): G1 acts like an adaptive traffic light, adjusting pause targets via -XX:MaxGCPauseMillis and -XX:InitiatingHeapOccupancyPercent. ZGC uses a region‑based heap, incremental reclamation, and colored pointers to achieve sub‑10 ms pauses, but requires large pages to avoid TLB‑miss spikes. Example flags for G1:
-XX:+UseG1GC
-Xmx4g
-XX:MaxGCPauseMillis=50
-XX:G1HeapRegionSize=4M
-XX:InitiatingHeapOccupancyPercent=45
-XX:+UseStringDeduplication
Example flags for ZGC:
-XX:+UnlockExperimentalVMOptions
-XX:+UseZGC
-Xmx4g
-XX:+UseLargePages
-XX:ZUncommitDelay=300
-XX:ZCollectionInterval=5
Layer 3 – Full‑stack autonomous (ZGC + eBPF + GraalVM): Combine ZGC’s low‑latency pauses with GraalVM JIT or Native Image for runtime speed, and use eBPF to trace ZGC relocation and TLB‑flush events for root‑cause analysis. A sample eBPF script monitors ZRelocate::relocate_object and tlb_flush to pinpoint latency spikes.
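The tracing described above can be sketched with bpftrace. This is an assumption‑laden outline, not the article's actual script: the libjvm.so path and the mangled ZRelocate symbol vary by JDK build (resolve the real one with `nm -D libjvm.so | grep ZRelocate`), while the `tlb:tlb_flush` tracepoint is a standard Linux kernel event:

```
#!/usr/bin/env bpftrace
// Count ZGC object relocations via a uprobe on the JVM's relocate routine.
// Path and symbol pattern are placeholders for your JDK install.
uprobe:/usr/lib/jvm/current/lib/server/libjvm.so:*ZRelocate*relocate_object*
{
    @relocations = count();
}

// Count kernel TLB flushes by reason to correlate with pause spikes.
tracepoint:tlb:tlb_flush
{
    @tlb_flushes[args->reason] = count();
}

// Dump and reset counters every 5 seconds.
interval:s:5
{
    print(@relocations);
    print(@tlb_flushes);
    clear(@relocations);
    clear(@tlb_flushes);
}
```

A relocation burst that lines up with a `tlb_flush` burst in the 5‑second windows is the signature of the large‑page misconfiguration discussed earlier.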
Performance data illustrate the impact of each layer: with static generational GC, Young GC averages 25 ms and Full GC 1.8 s, causing a 23‑hour runtime for a billion‑request job; G1 mixed GC averages 42 ms, while ZGC’s max pause stays under 7.2 ms when large pages are enabled. In high‑throughput low‑latency services (e.g., financial gateways), ZGC + GraalVM reduces GC overhead to 0.016 % and achieves sub‑15 ms P99 latency. For serverless workloads, GraalVM Native Image cuts cold‑start from 3.2 s to 23 ms, saving 60‑86 % memory for small‑heap services.
The article concludes with a concise answer template for interviewers: assess the service’s SLA (P99 latency, Full GC frequency, pause spikes) before naming a GC strategy, and tailor the configuration to the workload tier—ZGC for ultra‑low latency, G1 for balanced micro‑services, Parallel GC for batch jobs.
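Before naming a GC strategy, the baseline numbers in that template can be pulled from the JVM itself via the standard java.lang.management API. A minimal probe (class name and the allocation loop are illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSlaProbe {
    // Snapshot of [total collection count, total collection time in ms]
    // summed across all registered collectors.
    static long[] snapshot() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means unsupported
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] { count, timeMs };
    }

    public static void main(String[] args) {
        long[] before = snapshot();
        long sink = 0;
        // Generate garbage so the counters move.
        for (int i = 0; i < 1_000_000; i++) {
            sink += new byte[1024].length;
        }
        System.gc();
        long[] after = snapshot();
        System.out.println("GC count delta: " + (after[0] - before[0]));
        System.out.println("GC time delta (ms): " + (after[1] - before[1]));
        if (sink < 0) System.out.println(sink); // keep the allocations live
    }
}
```

Sampling these deltas in production (or the equivalent fields from GC logs) gives the Full GC frequency and cumulative pause budget needed to pick between the Parallel, G1, and ZGC tiers.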
Tech Freedom Circle
Crazy Maker Circle (Tech Freedom Architecture Circle): a community of technology enthusiasts, experts, and high‑performance fans. Many senior engineers, architects, and hobbyists here have already reached tech freedom; another wave of go‑getters is hustling hard toward it.