Mastering Java Microbenchmarking with JMH: 12 Common Pitfalls and How to Avoid Them

This article introduces Java Microbenchmark Harness (JMH), explains why precise benchmarking matters, and walks through twelve typical testing pitfalls—such as dead‑code elimination, constant folding, loop misuse, fork isolation, method inlining, false sharing, branch prediction, and multithreading—showing how JMH helps developers obtain reliable performance measurements.

Programmer DD

Preface

JMH (http://openjdk.java.net/projects/code-tools/jmh/), the Java Microbenchmark Harness, was first released in 2013. It was developed by the same engineers who work on the JIT compiler at Oracle. Special mention goes to Aleksey Shipilev, JMH's author and evangelist, and to his excellent blog posts.

The author spent a weekend reading Aleksey's blog, especially the JMH-related articles, and watching his public lecture "The Lesser of Two Evils", and summarizes the takeaways in this article. Many of the pictures come from that video.

Before Reading This Article

This article does not devote space to JMH syntax. If you have used JMH, great; if not, no worries. It discusses common testing traps from a Java developer's perspective, analyzes how they relate to the OS and to Java internals, and uses JMH to help avoid them.

Reading the article requires some OS knowledge and basic JIT concepts; unfamiliar points are linked to Wikipedia and recommended blogs.

The author acknowledges limited ability and welcomes comments on errors or omissions.

Getting Started with JMH

Test Precision

[Figure: Test Precision]

The figure above shows the time scales at which different kinds of tests operate; JMH can reach microsecond-level precision. Tests at each scale face different challenges.

Millisecond‑level tests are not difficult.

Microsecond‑level tests are challenging but achievable; JMH does it.

Nanosecond‑level tests cannot be measured accurately yet.

Picosecond‑level… Holy Shit.

Terms used in the diagram:

Linpack: a basic benchmark that measures floating-point performance. SPEC: the Standard Performance Evaluation Corporation, an industry body that publishes standardized benchmarks. Pipelining: bus communication latency.

Benchmark Classification

Tests can be classified along many dimensions: integration tests, unit tests, API tests, stress tests… A benchmark usually refers to a performance test. Many open-source frameworks ship a benchmark package to quantify their performance.

Benchmarks can be further divided into Micro benchmark, Kernels, Synthetic benchmark, Application benchmarks, etc. The subject of this article belongs to Micro benchmark.

Detailed classification can be found here (link).

[Figure: Benchmark in Motan]

Why Benchmark is Needed

If you cannot measure it, you cannot improve it. --Lord Kelvin

Benchmarks provide data support for applications, allowing objective comparison of methods. Accuracy and diversity of benchmarks are crucial. JMH’s author Aleksey is also a member of SPEC.

What JMH Looks Like

@Benchmark
public void measure() {
    // this method was intentionally left blank.
}

Writing a JMH benchmark is about as simple as writing a unit test.

Running it produces:

Benchmark                     Mode  Cnt          Score          Error  Units
JMHSample_HelloWorld.measure  thrpt    5  3126699413.430 ± 179167212.838  ops/s

Why JMH Testing is Needed

You might wonder why not test with the following simple code?

long start = System.currentTimeMillis();
measure();
System.out.println(System.currentTimeMillis() - start);

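This is the core question of the article. A one-shot timing like this mostly measures interpreted or partially compiled code, relies on a clock with millisecond granularity, and may let the JIT optimize the measured call away. JMH exists to help avoid these hidden traps.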
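To see how much ceremony even a rough manual measurement needs, here is a minimal plain-Java sketch that adds what the snippet above lacks: untimed warm-up, System.nanoTime, several timed rounds, and a sink for the result. HandRolledTimer and its method names are hypothetical, and the sketch remains exposed to the loop and inlining traps described later, so it is an illustration rather than a substitute for JMH.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

final class HandRolledTimer {
    /** Results are stored here so the measured call is not trivially dead code. */
    static volatile double sink;

    /** Runs untimed warm-up, then returns the median ns/op over the timed rounds. */
    static double medianNanosPerOp(DoubleSupplier work, int warmupOps, int rounds, int opsPerRound) {
        for (int i = 0; i < warmupOps; i++) {
            sink = work.getAsDouble();
        }
        double[] samples = new double[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < opsPerRound; i++) {
                sink = work.getAsDouble();
            }
            samples[r] = (System.nanoTime() - start) / (double) opsPerRound;
        }
        return median(samples);
    }

    /** Median of the samples (the upper middle element for even-length input). */
    static double median(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        double[] input = { Math.PI }; // read through an array so the argument is not a literal
        double nsPerOp = medianNanosPerOp(() -> Math.log(input[0]), 100_000, 5, 1_000_000);
        System.out.println("rough ns/op: " + nsPerOp);
    }
}
```

Even with these additions the number is only indicative, which is the motivation for handing the whole procedure to a harness.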

Using JMH to Solve 12 Testing Pitfalls

Trap 1: Dead Code Elimination

[Figure: Dead Code Elimination]

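Suppose we want to measure Math.log. A benchmark method that computes the value but never returns or uses it scores the same as an empty baseline method, because the JIT removes the computation as dead code. Returning the result, or passing it to JMH's Blackhole, prevents this.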

@Benchmark
public void measureRight(Blackhole bh) {
    bh.consume(Math.log(Math.PI));
}
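The idea behind consuming a result can be sketched without JMH. SimpleSink below is a hypothetical, much-simplified stand-in and not how Blackhole is implemented; it only illustrates that a value written to a volatile field of a reachable object is generally something the JIT has to compute, while a discarded value is not.

```java
final class SimpleSink {
    /** Shared instance, reachable from a static field so it cannot be optimized away as a local. */
    static final SimpleSink SHARED = new SimpleSink();

    private volatile double last;

    /** Stores the value with a volatile write, making the computation observable. */
    void consume(double value) {
        last = value;
    }

    double last() {
        return last;
    }

    /** Result is discarded: the JIT is free to remove the whole call. */
    static void measureWrong(double x) {
        Math.log(x);
    }

    /** Result is consumed: the call has an observable effect. */
    static void measureRight(double x) {
        SHARED.consume(Math.log(x));
    }

    public static void main(String[] args) {
        measureWrong(Math.PI);
        measureRight(Math.PI);
        System.out.println(SHARED.last());
    }
}
```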

Trap 2: Constant Folding and Propagation

Constant folding evaluates constant expressions at compile time, so a benchmark whose input is a constant may end up measuring nothing more than the return of a precomputed value. The following example shows how it changes the results.

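// Non-final instance field: its value is not known in advance, so it is read at run time.
private double x = Math.PI;
// Final field with a constant initializer: its value can be folded.
private final double wrongX = Math.PI;

@Benchmark
public double baseline() { return Math.PI; }

@Benchmark
public double measureWrong_1() { return Math.log(Math.PI); }

@Benchmark
public double measureWrong_2() { return Math.log(wrongX); }

@Benchmark
public double measureRight() { return Math.log(x); }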

Only the last method correctly measures Math.log performance.
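The difference between a final and a non-final variable can be observed directly at the javac level. The sketch below is an analogy using strings, not the JIT folding that affects Math.log; the class and method names are hypothetical. A final variable initialized with a constant is a compile-time constant, so javac folds the concatenation into a single interned literal, while the non-final version is concatenated at run time.

```java
final class ConstantFoldingSketch {
    /** Final local with a constant initializer: javac folds a + "y" into the literal "xy". */
    static boolean finalIsFolded() {
        final String a = "x";
        return (a + "y") == "xy"; // identity comparison is intentional here
    }

    /** Non-final local: the concatenation runs at run time and yields a fresh String. */
    static boolean nonFinalIsFolded() {
        String b = "x";
        return (b + "y") == "xy"; // identity comparison is intentional here
    }

    public static void main(String[] args) {
        System.out.println(finalIsFolded());    // prints true
        System.out.println(nonFinalIsFolded()); // prints false
    }
}
```

This is why the benchmark input is kept in a non-final field: the optimizer then cannot assume it already knows the value.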

Trap 3: Never Write Loops in Benchmarks

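Loop unrolling, JIT compilation, and on-stack replacement (OSR) all change how a loop performs, so a hand-written timing loop rarely measures what it appears to measure. The following hand-rolled benchmark shows the pattern to avoid.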

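public class BadMicrobenchmark {
    private static final int ITERATIONS = 10_000_000;

    // Placeholder for the operation under test.
    static void reps() { }

    public static void main(String[] args) {
        long startTime = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            reps();
        }
        long endTime = System.nanoTime();
        // Divide by the iteration count; the total alone is not ns/op.
        // The figure is still unreliable: the JIT may unroll or remove the loop.
        System.out.println("ns/op : " + (endTime - startTime) / (double) ITERATIONS);
    }
}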

With JMH you write only the body of a single operation, and the harness supplies the measurement loop.

Trap 4: Using Fork to Isolate Benchmarks

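Running several benchmarks in the same JVM lets them interfere with each other, because the profile the JIT collected during one benchmark carries over into the next. @Fork(0) runs the benchmark inside the harness JVM and produces inconsistent results, while a positive fork count runs each benchmark in a freshly started JVM.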

@Benchmark
@Fork(0)
public int measure_1_c1() { return measure(c1); }

@Benchmark
@Fork(1)
public int measure_4_forked_c1() { return measure(c1); }

Results demonstrate the need for proper forking.
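The snippet above calls measure and c1 without defining them. A minimal sketch of what such helpers might look like follows, modelled loosely on the shape of JMH's forking sample; ForkingSketch and the exact bodies are assumptions. The point is that measure reaches inc() through an interface: while only one implementation has been seen, the JIT can treat the call site as monomorphic, and once a second implementation runs in the same JVM that assumption no longer holds.

```java
final class ForkingSketch {
    interface Counter {
        int inc();
    }

    static final class Counter1 implements Counter {
        private int x;
        public int inc() { return x++; }
    }

    static final class Counter2 implements Counter {
        private int x;
        public int inc() { return x++; }
    }

    /** Calls inc() through the interface; the JIT profiles which receiver types appear here. */
    static int measure(Counter c) {
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            sum += c.inc();
        }
        return sum;
    }

    public static void main(String[] args) {
        Counter c1 = new Counter1();
        Counter c2 = new Counter2();
        System.out.println(measure(c1)); // prints 45 (0 + 1 + ... + 9)
        System.out.println(measure(c2)); // prints 45
    }
}
```

The two implementations are identical on purpose: any difference in measured speed then comes from the accumulated type profile, not from the code.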

Trap 5: Method Inlining

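The JIT may inline hot methods, which changes what a benchmark actually measures. Run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to see the inlining decisions, and use @CompilerControl to force or forbid inlining of a method.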

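public void target_blank() { }

@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void target_dontInline() { }

@CompilerControl(CompilerControl.Mode.INLINE)
public void target_inline() { }

// Each benchmark calls one target, so the score reflects that target's inlining mode.
@Benchmark
public void blank() { target_blank(); }

@Benchmark
public void dontinline() { target_dontInline(); }

@Benchmark
public void inline() { target_inline(); }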

Measurements show significant differences between inlined and non‑inlined methods.

Trap 6: False Sharing and Cache Lines

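Cache-line effects can distort results: when two threads write to variables that share a cache line, each write invalidates the other core's copy of that line (false sharing). JMH pads @State objects automatically to mitigate this; fields inside a state object still need manual padding or the JDK's @Contended annotation.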
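The effect can be provoked with plain Java. In the sketch below (FalseSharingSketch and hammer are hypothetical names), two threads each increment their own slot of an AtomicLongArray. Array elements are contiguous, so adjacent slots very likely share a cache line while slots 256 bytes apart do not on common hardware. The timing is itself a naive measurement, so treat the printed numbers as illustrative only.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class FalseSharingSketch {
    /**
     * Two threads each increment their own slot the given number of times.
     * Returns { elapsedNanos, countA, countB }.
     */
    static long[] hammer(int slotA, int slotB, long iterations) {
        AtomicLongArray slots = new AtomicLongArray(64);
        Thread a = new Thread(() -> {
            for (long i = 0; i < iterations; i++) slots.incrementAndGet(slotA);
        });
        Thread b = new Thread(() -> {
            for (long i = 0; i < iterations; i++) slots.incrementAndGet(slotB);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new long[] { System.nanoTime() - start, slots.get(slotA), slots.get(slotB) };
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        // Slots 0 and 1 are 8 bytes apart: very likely the same cache line.
        System.out.println("adjacent : " + hammer(0, 1, n)[0] / 1_000_000 + " ms");
        // Slots 0 and 32 are 256 bytes apart: different lines on common hardware.
        System.out.println("separated: " + hammer(0, 32, n)[0] / 1_000_000 + " ms");
    }
}
```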

Trap 7: Branch Prediction

Processing a sorted array is faster than processing the same data unsorted, even though the work is identical, because the branch outcome becomes predictable.

@Benchmark
@OperationsPerInvocation(COUNT)
public void sorted(Blackhole bh1, Blackhole bh2) {
    for (byte v : sorted) {
        if (v > 0) bh1.consume(v);
        else bh2.consume(v);
    }
}

Results: sorted ~2.7 ns/op, unsorted ~8.1 ns/op.
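The benchmark above reads from fields it does not show. A minimal sketch of how such input might be prepared follows; the class name BranchData, the array size, and the seed are assumptions. The helper countPositive exists only to check that both arrays hold the same values, so that any timing difference comes from ordering alone.

```java
import java.util.Arrays;
import java.util.Random;

final class BranchData {
    static final int COUNT = 1024 * 1024;

    final byte[] unsorted = new byte[COUNT];
    final byte[] sorted;

    /** Fills one array with pseudo-random bytes and keeps a sorted copy of the same values. */
    BranchData(long seed) {
        new Random(seed).nextBytes(unsorted);
        sorted = unsorted.clone();
        Arrays.sort(sorted);
    }

    /** Counts how often the v > 0 branch from the benchmark would be taken. */
    static int countPositive(byte[] data) {
        int n = 0;
        for (byte v : data) {
            if (v > 0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        BranchData d = new BranchData(42L);
        System.out.println(countPositive(d.sorted) == countPositive(d.unsorted)); // prints true
    }
}
```

In the sorted copy all non-positive values come first, so the branch flips exactly once; in the unsorted array it flips unpredictably.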

Trap 8: Multithreaded Testing

Power management and OS scheduling affect multithreaded benchmark scaling. Disabling power management and using proper forking improve consistency.

Conclusion

The article explains the importance of JMH and enumerates many pitfalls that can invalidate naive benchmarks. Understanding these issues helps Java developers obtain reliable performance measurements.

Written by Programmer DD, a tinkering programmer and author of "Spring Cloud Microservices in Action".
