Unlock JVM Performance: How Tiered Compilation and JIT Optimizations Work

This article explores the JVM's tiered compilation mechanism, detailing how the interpreter, C1 and C2 compilers, code cache segmentation, and various JIT optimizations such as inlining, loop unrolling, escape analysis, and safepoint handling affect startup speed and runtime performance, with practical parameter tuning tips.

Alibaba Cloud Developer

Jul 18, 2024

Overview

This article examines tiered compilation in the Java Virtual Machine (JVM) and its impact on program performance. Java source is first compiled to bytecode (.class files). The JVM then either runs that bytecode in the interpreter or hands it to the Just‑In‑Time (JIT) compilers, depending on which methods hotspot detection identifies as frequently executed.

Code Cache and Segments

The JVM allocates a Code Cache for compiled machine code. With tiered compilation enabled, the initial size is 2496 KB and the reserved maximum is 240 MB. Both are configurable via -XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize. Since Java 9 the cache is divided into three segments:

non‑method segment (JVM internal code such as the interpreter and stubs) – -XX:NonNMethodCodeHeapSize

profiled‑code segment (C1 compiled code that collects profiles, usually short‑lived) – -XX:ProfiledCodeHeapSize

non‑profiled segment (fully optimized code without profiling, mainly C2 output, usually long‑lived) – -XX:NonProfiledCodeHeapSize

Dividing the cache reduces fragmentation and improves efficiency.
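To see how the segments are sized on a particular JVM, the standard java.lang.management API can list the memory pools that back the Code Cache. The following is a minimal sketch. The class name CodeCacheReport is hypothetical, and the pool-name filter is an assumption about how HotSpot labels these pools (other JVMs may use different names).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class CodeCacheReport {
    // Builds one summary line per memory pool that looks like part of the code cache.
    static List<String> describeCodePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: segmented pools contain "CodeHeap", an unsegmented cache contains "CodeCache" or "Code Cache".
            boolean isCodePool = name.contains("CodeHeap") || name.replace(" ", "").contains("CodeCache");
            MemoryUsage usage = pool.getUsage();
            if (isCodePool && usage != null) {
                // getMax() returns -1 when the maximum is undefined.
                lines.add(String.format("%s: used=%d bytes, max=%d bytes", name, usage.getUsed(), usage.getMax()));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCodePools().forEach(System.out::println);
    }
}
```

Running it once with default settings and once with a different -XX:ReservedCodeCacheSize value should show the reported maximums change accordingly.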

Tiered Compilation Levels

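Level 0: Interpreter – pure interpretation.

Level 1: C1 without profiling.

Level 2: C1 with limited profiling.

Level 3: C1 with full profiling.

Level 4: C2 – optimized server compiler (or Graal, which can take C2's place through the JVMCI interface introduced in JDK 9).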

Typical execution starts at level 0, moves to level 3, and finally to level 4. Certain hot methods may skip levels based on size, profiling cost, or compiler availability.
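The level transitions can be observed on a small program. The sketch below is illustrative only: the class TieredWarmup and its methods are hypothetical names, and the iteration counts are arbitrary values chosen to make the methods hot. It also reads the JIT's accumulated compile time through the standard CompilationMXBean, which may be unavailable on a JVM without a compiler.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class TieredWarmup {
    // Small and frequently called: a typical candidate for promotion through the tiers.
    static int square(int x) {
        return x * x;
    }

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long result = 0;
        // Repeated calls make sumOfSquares and square hot enough to be compiled.
        for (int round = 0; round < 20_000; round++) {
            result += sumOfSquares(1_000);
        }
        System.out.println("result = " + result);

        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " spent " + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```

Started with -XX:+PrintCompilation, the JVM logs each compilation together with its tier, so the same method can be seen appearing first at a C1 level and later at level 4. Adding -XX:TieredStopAtLevel=1 restricts the run to C1 for comparison.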

Performance Comparison

Using Spring PetClinic as a benchmark, C2 compiled 863 methods in 19.6 s, while C1 compiled 5,254 methods in 2.1 s. C1 therefore compiles far faster, but the code it produces runs slower. Throughput measurements on a servlet application show that a longer warm‑up (more profiling) leads to higher throughput, and that C2‑optimized code eventually outperforms C1 code.
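The warm‑up effect can be made visible by timing identical batches of work one after another. This is a rough sketch rather than a rigorous benchmark (a harness such as JMH is the better tool for real measurements). WarmupTimer, work and the batch sizes are hypothetical choices, and the timings will vary by machine and JVM.

```java
class WarmupTimer {
    // Written after each batch so the JIT cannot discard the work as unused.
    static volatile long sink;

    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i ^ (i >>> 3)) % 7;
        }
        return acc;
    }

    // Runs identical batches back to back and returns the elapsed nanoseconds of each.
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] nanos = new long[batches];
        for (int b = 0; b < batches; b++) {
            long t0 = System.nanoTime();
            long acc = 0;
            for (int c = 0; c < callsPerBatch; c++) {
                acc += work(1_000);
            }
            sink = acc;
            nanos[b] = System.nanoTime() - t0;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 20_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.printf("batch %d: %d us%n", b, nanos[b] / 1_000);
        }
    }
}
```

On a typical HotSpot run the first batches tend to be the slowest, since they execute interpreted or C1 code while profiles are still being gathered.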

Compilation Optimizations

Method Inlining

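Calls to small methods such as getters and setters cost far more than the work they do, because each call passes arguments, sets up a stack frame, and transfers control. The JIT can inline such methods. Given:

public class Point { private int x, y; public int getX() { return x; } public void setX(int i) { x = i; } }

a call site such as: Point p = getPoint(); p.setX(p.getX() * 2);

is compiled as if it had been written: Point p = getPoint(); p.x = p.x * 2;

Disabling inlining with -XX:-Inline can degrade performance by over 50 %.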
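A runnable version of the Point example helps when checking inlining decisions on a real JVM. In this sketch InlineDemo, doubleX and run are hypothetical names added for illustration, and the reset inside the loop only keeps the value from overflowing.

```java
class InlineDemo {
    static final class Point {
        private int x, y;

        int getX() { return x; }

        void setX(int i) { x = i; }
    }

    // Call site the JIT may reduce to plain field accesses: p.x = p.x * 2
    static void doubleX(Point p) {
        p.setX(p.getX() * 2);
    }

    static int run(int iterations) {
        Point p = new Point();
        p.setX(1);
        for (int i = 0; i < iterations; i++) {
            doubleX(p);
            if (p.getX() > 1_000_000) {
                p.setX(1); // reset to avoid overflow in long runs
            }
        }
        return p.getX();
    }

    public static void main(String[] args) {
        System.out.println(run(50_000_000));
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining prints the inlining decisions for getX, setX and doubleX. Timing the same run with and without -XX:-Inline gives a feel for the cost of disabling the optimization.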

Loop Unrolling

To reduce the per‑iteration cost of the loop test and back‑edge branch, the JIT may unroll loops so that each pass does the work of several iterations. Example:

for (int i = 0; i < MAX; i++) { data[i] = random.nextLong(); }

may become:

for (int i = 0; i < MAX; i += 5) { data[i] = random.nextLong(); data[i+1] = random.nextLong(); data[i+2] = random.nextLong(); data[i+3] = random.nextLong(); data[i+4] = random.nextLong(); }

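The unrolled form shown here is only correct when MAX is a multiple of 5. The JIT handles any leftover iterations in a separate loop. Using a long loop counter has traditionally prevented this optimization, because HotSpot implemented its counted‑loop optimizations for int counters only. Newer JDK releases have added support for some loops with long counters.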
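The transformation can be written out by hand to show where the leftover iterations go. This sketch unrolls by a factor of 4, which is an arbitrary choice for illustration (the JIT picks its own factor), and the class UnrollDemo is a hypothetical name.

```java
class UnrollDemo {
    static long sumSimple(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Hand-unrolled by 4, followed by a loop for the remaining 0 to 3 elements.
    static long sumUnrolled(long[] data) {
        long sum = 0;
        int i = 0;
        int mainLimit = data.length - (data.length % 4);
        for (; i < mainLimit; i += 4) {
            sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        }
        for (; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(sumSimple(data) + " " + sumUnrolled(data));
    }
}
```

Hand unrolling like this is rarely worthwhile in application code, since the JIT already does it for counted loops. The point is only to show the shape of the generated loop.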

Safepoint Insertion

Safepoints allow the JVM to pause all threads for GC or other stop‑the‑world operations. Compiled code normally polls for a pending safepoint on each loop back‑edge. For counted loops with int counters, the JIT removes these back‑edge polls to improve speed, which means a single long‑running loop can keep every other thread waiting until it finishes. The option -XX:+UseCountedLoopSafepoints keeps the polls in counted loops, while -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000 logs threads that take longer than 1000 ms to reach a safepoint.
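The two loop shapes involved look almost identical in source code. The sketch below places them side by side. CountedLoopShapes is a hypothetical name, and whether a given JVM version actually removes or keeps the poll in each loop is an assumption to verify with the flags above rather than something the code itself demonstrates.

```java
class CountedLoopShapes {
    // int counter, constant stride, loop-invariant bound: a counted loop whose poll may be removed.
    static long sumIntCounter(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // long counter: traditionally not treated as a counted loop, so the poll stays.
    static long sumLongCounter(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumIntCounter(1_000_000) + " " + sumLongCounter(1_000_000L));
    }
}
```

Both methods compute the same result. Only the type of the induction variable differs, and that is what decides which safepoint treatment the loop receives.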

Loop Strip Mining (Java 10)

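Strip mining splits a long counted loop into an outer loop and an inner loop. The inner loop runs without safepoint polls for at most LoopStripMiningIter iterations (set with -XX:LoopStripMiningIter), and the outer loop polls once per chunk. This balances loop throughput against time‑to‑safepoint latency.

int i = start; while (i < stop) { int next = Math.min(stop, i + LoopStripMiningIter * stride); do { /* body */ i += stride; } while (i < next); safepoint(); }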
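The same structure can be mimicked in plain Java to count how many polls a given chunk size produces. In this sketch StripMiningSketch and sumStripMined are hypothetical names, the stride is fixed at 1, and a simple counter stands in for the real safepoint poll, which is a JVM-internal operation that Java code cannot issue directly.

```java
class StripMiningSketch {
    // Returns {sum, polls}: the array total and the number of simulated safepoint polls.
    static long[] sumStripMined(int[] data, int stripIter) {
        long sum = 0;
        long polls = 0;
        int i = 0;
        while (i < data.length) {
            // Outer loop: one chunk of at most stripIter iterations.
            int next = Math.min(data.length, i + stripIter);
            do {
                // Inner loop: no poll inside the chunk.
                sum += data[i];
                i++;
            } while (i < next);
            polls++; // stand-in for the safepoint poll after each chunk
        }
        return new long[] {sum, polls};
    }

    public static void main(String[] args) {
        int[] data = new int[10];
        java.util.Arrays.fill(data, 1);
        long[] result = sumStripMined(data, 4);
        System.out.println("sum=" + result[0] + ", polls=" + result[1]);
    }
}
```

With 10 elements and a chunk size of 4, the loop runs chunks of 4, 4 and 2 iterations and polls three times instead of ten.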

Range‑Check Elimination

The JIT splits loops into pre‑loop, main loop, and post‑loop phases, eliminating array bounds checks in the main loop when it can prove safety.

for (int i = start; i < limit; i++) { array[i] = 0; }

becomes three loops where the middle one runs without checks.
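The three‑loop split can be sketched by hand. The version below is an illustration under stated assumptions: RangeCheckSplit and zeroRange are hypothetical names, the explicit Objects.checkIndex calls stand in for the implicit bounds checks, and Java itself still checks every array access in this source form. Only the JIT can really drop the check in the middle loop.

```java
import java.util.Objects;

class RangeCheckSplit {
    // Zeroes array[start..limit) using a checked pre-loop, a check-free main loop and a checked post-loop.
    static void zeroRange(int[] array, int start, int limit) {
        int safeLo = Math.max(start, 0);
        int safeHi = Math.min(limit, array.length);
        int i = start;
        // Pre-loop: indices below the proven-safe range keep their check.
        for (; i < Math.min(safeLo, limit); i++) {
            Objects.checkIndex(i, array.length);
            array[i] = 0;
        }
        // Main loop: 0 <= i < array.length holds for every iteration, so no check is needed.
        for (; i < safeHi; i++) {
            array[i] = 0;
        }
        // Post-loop: indices past the safe range keep their check (and will throw).
        for (; i < limit; i++) {
            Objects.checkIndex(i, array.length);
            array[i] = 0;
        }
    }

    public static void main(String[] args) {
        int[] a = {7, 7, 7, 7, 7};
        zeroRange(a, 1, 4);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

For in‑range arguments all the work happens in the main loop. An out‑of‑range limit still fails, but only once the post‑loop is reached, which preserves the behaviour of the original loop.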

Loop Unswitching

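Conditionals that are loop‑invariant are moved outside the loop, at the cost of duplicating the loop body, so the condition is tested once instead of on every iteration. A loop such as:

for (int i=0;i<N;i++) { if (x) a[i]=0; else b[i]=0; }

becomes:

if (x) { for (int i=0;i<N;i++) a[i]=0; } else { for (int i=0;i<N;i++) b[i]=0; }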
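A quick way to confirm that the transformation preserves behaviour is to run both forms on the same inputs. UnswitchDemo, before and after are hypothetical names used only in this sketch.

```java
class UnswitchDemo {
    // Original form: the loop-invariant condition x is tested on every iteration.
    static void before(boolean x, int[] a, int[] b, int n) {
        for (int i = 0; i < n; i++) {
            if (x) {
                a[i] = 0;
            } else {
                b[i] = 0;
            }
        }
    }

    // Unswitched form: x is tested once and each branch gets its own loop.
    static void after(boolean x, int[] a, int[] b, int n) {
        if (x) {
            for (int i = 0; i < n; i++) {
                a[i] = 0;
            }
        } else {
            for (int i = 0; i < n; i++) {
                b[i] = 0;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 1};
        int[] b = {1, 1, 1};
        after(true, a, b, 3);
        System.out.println(java.util.Arrays.toString(a) + " " + java.util.Arrays.toString(b));
    }
}
```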

Escape Analysis and Scalar Replacement

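Escape analysis determines whether an object escapes the current method or thread. A non‑escaping object can be scalar‑replaced: the object is never allocated on the heap, and its fields become local values held in registers or stack slots.

public void foo() { MyObject o = new MyObject(); o.x = 1; }

optimizes to: int x = 1;

Escape analysis also enables lock elision: a lock on an object that never escapes its thread can be removed. Disabling lock elision with -XX:-EliminateLocks shows dramatic performance differences for synchronized code. Escape analysis itself is controlled by -XX:+DoEscapeAnalysis, and scalar replacement by -XX:+EliminateAllocations.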
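The sketch below shows one candidate for scalar replacement and one for lock elision. EscapeDemo, Vec and the method names are hypothetical. Whether the JIT actually applies either optimization depends on inlining and on the JVM version, so treat the comments as expectations rather than guarantees.

```java
class EscapeDemo {
    static final class Vec {
        final int x;
        final int y;

        Vec(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // v never leaves this method, so the JIT may replace it with two scalar values.
    static int lengthSquared(int x, int y) {
        Vec v = new Vec(x, y);
        return v.x * v.x + v.y * v.y;
    }

    // The lock object is visible to one thread only, so the JIT may remove the synchronization.
    static int lockedIncrement(int value) {
        Object lock = new Object();
        synchronized (lock) {
            return value + 1;
        }
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 10_000_000; i++) {
            total += lengthSquared(i & 1023, 3) + lockedIncrement(i & 7);
        }
        System.out.println(total);
    }
}
```

Comparing allocation rates or run times with -XX:-DoEscapeAnalysis, -XX:-EliminateAllocations and -XX:-EliminateLocks isolates the effect of each optimization.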

Peephole Optimization

Local instruction patterns are simplified, e.g., replacing repeated loads with a dup instruction, or using shift‑add tricks for multiplication.

y = x * 3  // becomes y = (x << 1) + x
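The rewrite is valid for every int value, including those where the multiplication overflows, because both forms wrap around in the same way under two's‑complement arithmetic. A small check, with hypothetical names StrengthReduction, timesThree and shiftAdd:

```java
class StrengthReduction {
    static int timesThree(int x) {
        return x * 3;
    }

    // Equivalent form built from a shift and an add.
    static int shiftAdd(int x) {
        return (x << 1) + x;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 1_000_000_000, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            System.out.println(x + ": " + (timesThree(x) == shiftAdd(x)));
        }
    }
}
```

Writing the shift by hand in source code is unnecessary, since the JIT selects the cheaper instruction sequence for the target CPU on its own.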

Dead Code Elimination

Unreachable or unused code is removed, shrinking the method body.

int dead() { int a=10; int z=50; int c=z*5; a=20; a=a*10; return c; }

optimizes to:

int dead() { int z=50; int c=z*5; return c; }
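Both versions can be compiled and compared directly. In this sketch DeadCodeDemo, pruned and folded are hypothetical names. The folded variant is an additional assumption about what constant folding could do next and is not part of the original example.

```java
class DeadCodeDemo {
    // Original: the assignments to a never influence the returned value.
    static int dead() {
        int a = 10;
        int z = 50;
        int c = z * 5;
        a = 20;
        a = a * 10;
        return c;
    }

    // After dead code elimination: only the computation feeding the result remains.
    static int pruned() {
        int z = 50;
        int c = z * 5;
        return c;
    }

    // Assumed further step: constant folding reduces the body to a constant.
    static int folded() {
        return 250;
    }

    public static void main(String[] args) {
        System.out.println(dead() + " " + pruned() + " " + folded());
    }
}
```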
