Unlock JVM Performance: How Tiered Compilation and JIT Optimizations Work
This article explores the JVM's tiered compilation mechanism, detailing how the interpreter, C1 and C2 compilers, code cache segmentation, and various JIT optimizations such as inlining, loop unrolling, escape analysis, and safepoint handling affect startup speed and runtime performance, with practical parameter tuning tips.
Overview
The article examines Java Virtual Machine (JVM) tiered compilation and its impact on program performance. It explains how Java source is first compiled to bytecode (.class) and then executed either by the interpreter or Just‑In‑Time (JIT) compilers, depending on hotspot detection.
Code Cache and Segments
The JVM allocates a code cache for compiled machine code. By default the initial size is 2496 KB and the reserved maximum is 240 MB; both are configurable via -XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize. Since Java 9 the cache is divided into three regions:
non‑method segment (JVM internal code such as the interpreter and compiler buffers) – -XX:NonNMethodCodeHeapSize
profiled‑code segment (lightly optimized, profiled C1 code with a short lifetime) – -XX:ProfiledCodeHeapSize
non‑profiled segment (fully optimized code without profiling, mainly from C2, with a potentially long lifetime) – -XX:NonProfiledCodeHeapSize
Dividing the cache reduces fragmentation and improves efficiency.
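To see how the code cache is laid out on a running JVM, the standard java.lang.management API can list the relevant memory pools. The sketch below is illustrative only; the class name CodeCachePools is made up, and matching pool names that start with "Code" is an assumption about how HotSpot names these pools ("CodeCache" when unsegmented, "CodeHeap '...'" when segmented).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class CodeCachePools {
    /** Names of the non-heap memory pools that appear to hold compiled code. */
    static List<String> codePoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: HotSpot names these pools "CodeCache" or "CodeHeap '...'".
            if (pool.getType() == MemoryType.NON_HEAP && pool.getName().startsWith("Code")) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> codePools = codePoolNames();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null && codePools.contains(pool.getName())) {
                // getMax() returns -1 when the pool has no defined maximum.
                System.out.println(pool.getName()
                        + ": used=" + usage.getUsed() + " bytes, max=" + usage.getMax() + " bytes");
            }
        }
    }
}
```

Running it with different -XX:ReservedCodeCacheSize values is one way to check how the reserved size is split between the segments on a given JDK.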
Tiered Compilation Levels
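Level 0: interpreter, pure interpretation.
Level 1: C1 without profiling.
Level 2: C1 with limited profiling.
Level 3: C1 with full profiling.
Level 4: C2, the optimizing server compiler (or Graal, which can take its place through the JVMCI interface introduced in JDK 9).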
Typical execution starts at level 0, moves to level 3, and finally to level 4. Certain hot methods may skip levels based on size, profiling cost, or compiler availability.
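The level transitions described above can be observed on a small program. The following is a minimal sketch, not a benchmark; the class TierDemo and its methods are invented for illustration. It runs a tiny hot method and then asks the standard CompilationMXBean how much time the JIT spent compiling.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class TierDemo {
    /** Small, frequently called method: a typical candidate for C1 and later C2. */
    static int mix(int h, int v) {
        return 31 * h + v;
    }

    /** Calls mix() once per iteration so that it becomes hot. */
    static int run(int iterations) {
        int h = 17;
        for (int i = 0; i < iterations; i++) {
            h = mix(h, i);
        }
        return h;
    }

    public static void main(String[] args) {
        int result = run(50_000_000);
        System.out.println("result = " + result);

        // The bean is null on a JVM without a compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + ": " + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```

Started with -XX:+PrintCompilation, HotSpot logs each compilation together with its tier, so the same method can be seen at level 3 and later at level 4. Adding -XX:TieredStopAtLevel=1 restricts the JVM to C1 for comparison.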
Performance Comparison
Using Spring PetClinic as a benchmark, C2 compiled 863 methods in 19.6 s, while C1 compiled 5,254 methods in 2.1 s. C1 compiles far faster, but the code it produces runs slower. Throughput measurements on a servlet application show that a longer warm‑up (more profiling) leads to higher throughput, and that C2‑optimized code eventually outperforms C1.
Compilation Optimizations
Method Inlining
Calling a getter or setter costs more than a direct field access, because each call sets up a stack frame. The JIT can inline such methods. Given:
    public class Point {
        private int x, y;
        public int getX() { return x; }
        public void setX(int i) { x = i; }
    }
a call site such as:
    Point p = getPoint();
    p.setX(p.getX() * 2);
is effectively compiled into:
    Point p = getPoint();
    p.x = p.x * 2;
Disabling inlining with -XX:-Inline can degrade performance by over 50%.
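The effect of inlining accessors is easy to try out. The sketch below is an illustration under assumptions: InlineDemo, its nested Point, and bump() are hypothetical names, and the timing it prints is only a rough indication, not a proper benchmark.

```java
class InlineDemo {
    static final class Point {
        private int x;
        Point(int x) { this.x = x; }
        int getX() { return x; }
        void setX(int i) { x = i; }
    }

    /** Calls the accessors in a hot loop; with inlining this reduces to plain field arithmetic. */
    static int bump(Point p, int rounds) {
        for (int i = 0; i < rounds; i++) {
            p.setX(p.getX() + 1);
        }
        return p.getX();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int x = bump(new Point(0), 200_000_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("x = " + x + ", elapsed " + ms + " ms");
    }
}
```

Comparing a default run against one started with -XX:-Inline gives a feel for the cost of the calls. -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining reports which call sites the JIT actually inlined.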
Loop Unrolling
To reduce loop overhead (fewer exit tests and back‑branches, and so fewer opportunities for branch misprediction), the JIT may unroll loops. Example:
    for (int i = 0; i < MAX; i++) { data[i] = random.nextLong(); }
may become (shown here for a MAX that is a multiple of 5):
    for (int i = 0; i < MAX; i += 5) {
        data[i] = random.nextLong();
        data[i+1] = random.nextLong();
        data[i+2] = random.nextLong();
        data[i+3] = random.nextLong();
        data[i+4] = random.nextLong();
    }
Using a long loop counter can prevent this optimization, because C2 has traditionally recognized only int‑indexed counted loops; newer JDK releases have improved the handling of long counters.
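An unrolled loop also has to cope with trip counts that are not a multiple of the unroll factor. The hand-written sketch below shows one common way to do that with a cleanup loop. It is an analogue written in Java source, not what C2 emits, and the class and method names are illustrative.

```java
class UnrollDemo {
    /** Straightforward loop: one exit test and one back-branch per element. */
    static long sumSimple(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** Hand-unrolled by 4, with a cleanup loop for the leftover elements. */
    static long sumUnrolled(long[] data) {
        long sum = 0;
        int i = 0;
        int limit = data.length - 3; // i < limit guarantees that i + 3 < data.length
        for (; i < limit; i += 4) {
            sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        }
        for (; i < data.length; i++) { // at most 3 remaining elements
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_003];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println("same result: " + (sumSimple(data) == sumUnrolled(data)));
    }
}
```

In practice the simple form is the one to write; the JIT picks the unroll factor itself, and manual unrolling can get in its way.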
Safepoint Insertion
Safepoints allow the JVM to pause all threads for GC or other stop‑the‑world operations. For counted loops with int counters, the JIT removes the safepoint poll on the loop's back‑edge to improve speed, which means a single long‑running loop can delay every other thread waiting for the safepoint. The option -XX:+UseCountedLoopSafepoints keeps the polls in counted loops, while -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000 logs threads that take longer than 1000 ms to reach a safepoint.
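The delay described above can be provoked with a small experiment. The following is a rough sketch under assumptions: SafepointDemo and spin() are invented names, System.gc() is used only because it normally requires a stop‑the‑world pause on HotSpot, and the measured time varies widely with JDK version and flags.

```java
class SafepointDemo {
    /** Counted loop with an int counter and no calls inside the body. */
    static long spin(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (i ^ (acc >>> 3)) & 0xFF;
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long total = 0;
            for (int round = 0; round < 2; round++) {
                total += spin(Integer.MAX_VALUE);
            }
            System.out.println("worker done: " + total);
        });
        worker.start();

        Thread.sleep(1000);          // give the JIT time to compile spin()
        long start = System.nanoTime();
        System.gc();                 // all threads must reach a safepoint before this can proceed
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("System.gc() returned after " + ms + " ms");
        worker.join();
    }
}
```

Running it with and without -XX:+UseCountedLoopSafepoints, and with -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000, shows whether the worker loop holds up the pause on a particular JVM.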
Loop Strip Mining (Java 10)
Strip mining partitions a long loop into chunks and places a safepoint poll after each chunk of a bounded number of iterations (LoopStripMiningIter), balancing loop throughput against time‑to‑safepoint latency.
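A hand-written Java analogue makes the chunking concrete. This is only a sketch of the idea, not the compiler's implementation: CHUNK is a hypothetical constant standing in for LoopStripMiningIter, and the counter merely marks the places where compiled code would poll for a safepoint.

```java
class StripMiningDemo {
    /** Hypothetical chunk length, playing the role of LoopStripMiningIter. */
    static final int CHUNK = 1000;

    /** Returns {sum of data, number of chunk boundaries reached}. */
    static long[] sumInStrips(int[] data) {
        long total = 0;
        long polls = 0;
        int i = 0;
        while (i < data.length) {
            // End of the current strip; data.length - i is positive here, so no overflow.
            int next = i + Math.min(CHUNK, data.length - i);
            for (; i < next; i++) {      // inner loop: tight, no poll inside
                total += data[i];
            }
            polls++;                     // compiled code would poll for a safepoint here
        }
        return new long[] {total, polls};
    }

    public static void main(String[] args) {
        int[] data = new int[2500];
        java.util.Arrays.fill(data, 1);
        long[] result = sumInStrips(data);
        System.out.println("sum=" + result[0] + ", poll points=" + result[1]);
    }
}
```

The inner loop stays free of polls and can still be optimized aggressively, while the outer loop bounds how long a thread can run before it notices a safepoint request.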
In pseudocode, the transformed loop has this shape:
    do {
        int next = Math.min(stop, i + LoopStripMiningIter * stride);
        do {
            /* body */
            i += stride;
        } while (i < next);
        safepoint();
    } while (i < stop);
Range‑Check Elimination
The JIT splits loops into pre‑loop, main loop, and post‑loop phases, eliminating array bounds checks in the main loop when it can prove safety.
    for (int i = start; i < limit; i++) { array[i] = 0; }
becomes three loops, of which the middle one runs without bounds checks.
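The three-loop structure can be sketched by hand. This is an illustration under assumptions: the class and method names are made up, and Objects.checkIndex (Java 9+) stands in for the implicit bounds check. At the Java source level every array store is still checked, so the code shows only the shape the compiler aims for, not a way to skip checks manually.

```java
import java.util.Objects;

class RangeCheckDemo {
    /** Zeroes array[start..limit) using an explicit pre-loop, main loop and post-loop. */
    static void clearSplit(int[] array, int start, int limit) {
        int mainStart = Math.max(start, 0);
        int mainEnd = Math.min(limit, array.length);
        int i = start;

        // Pre-loop: iterations before the provably safe range keep their check.
        for (; i < limit && i < mainStart; i++) {
            Objects.checkIndex(i, array.length);
            array[i] = 0;
        }
        // Main loop: every i lies in [0, array.length), so the check is redundant here.
        for (; i < mainEnd; i++) {
            array[i] = 0;
        }
        // Post-loop: iterations past the safe range keep their check.
        for (; i < limit; i++) {
            Objects.checkIndex(i, array.length);
            array[i] = 0;
        }
    }

    public static void main(String[] args) {
        int[] a = {7, 7, 7, 7, 7, 7};
        clearSplit(a, 1, 4);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

For in-range arguments only the main loop runs; an out-of-range start or limit still fails in the pre-loop or post-loop, matching the behavior of the original single loop.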
Loop Unswitching
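Conditionals that are loop‑invariant are moved outside the loop, so they are evaluated once instead of on every iteration. A loop such as:
    for (int i = 0; i < N; i++) { if (x) a[i] = 0; else b[i] = 0; }
becomes:
    if (x) {
        for (int i = 0; i < N; i++) a[i] = 0;
    } else {
        for (int i = 0; i < N; i++) b[i] = 0;
    }
Escape Analysis and Scalar Replacement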
Escape analysis determines whether an object escapes the current method or thread. Non‑escaping objects can be scalar‑replaced: the object is never allocated on the heap, and its fields are kept as local values in registers or on the stack.
    public void foo() { MyObject o = new MyObject(); o.x = 1; }
optimizes to:
    int x = 1;
Escape analysis also enables lock elision for objects that never leave the current thread. Disabling lock elision with -XX:-EliminateLocks shows dramatic performance differences for synchronized code.
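Scalar replacement is easiest to observe on a loop that creates many short-lived objects. The sketch below is illustrative only; EscapeDemo and Vec are hypothetical names, and whether the allocations are actually removed depends on the JIT inlining plus() on the JVM in use.

```java
class EscapeDemo {
    static final class Vec {
        final int x;
        final int y;
        Vec(int x, int y) { this.x = x; this.y = y; }
        Vec plus(Vec o) { return new Vec(x + o.x, y + o.y); }
    }

    /** Every Vec created here stays local to the method, so none of them escapes. */
    static long sum(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Vec v = new Vec(i, 1).plus(new Vec(1, i));
            total += v.x + v.y;
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = sum(100_000_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result = " + result + ", elapsed " + ms + " ms");
    }
}
```

Running it once with default settings and once with -XX:-DoEscapeAnalysis (or -XX:-EliminateAllocations), with -Xlog:gc enabled on JDK 9 or later, typically makes the difference in allocation and GC activity visible.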
Peephole Optimization
Local instruction patterns are simplified, e.g., replacing repeated loads with a dup instruction, or using shift‑add tricks for multiplication.
    y = x * 3   // becomes y = (x << 1) + x
Dead Code Elimination
Unreachable or unused code is removed, shrinking the method body.
    int dead() { int a = 10; int z = 50; int c = z * 5; a = 20; a = a * 10; return c; }
optimizes to:
    int dead() { int z = 50; int c = z * 5; return c; }
References
Java JIT compiler analysis and practice – Meituan Tech
Deep dive into JVM JIT – Liangliang Lee
Startup, containers & Tiered Compilation – JPBempel
Chapter 4. Working with the JIT Compiler – O'Reilly
Tiered Compilation in JVM – Baeldung
Loop Unrolling – Oracle Java Magazine
Optimize loops with long variables – Red Hat
JVM safepoint analysis – Jianshu
Escape analysis examples – Java Advent
Scalar Replacement – Shipilev
Lock Elision – Shipilev
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.