Mastering Java Microbenchmarking with JMH: 12 Common Pitfalls and How to Avoid Them
This article introduces Java Microbenchmark Harness (JMH), explains why precise benchmarking matters, and walks through twelve typical testing pitfalls—such as dead‑code elimination, constant folding, loop misuse, fork isolation, method inlining, false sharing, branch prediction, and multithreading—showing how JMH helps developers obtain reliable performance measurements.
Preface
JMH (http://openjdk.java.net/projects/code-tools/jmh/) is the Java Microbenchmark Harness, first released in 2013. It was developed by the same engineers who implement the JIT compiler at Oracle. Special mention goes to Aleksey Shipilev, the author and evangelist of JMH, and his excellent blog posts.
The author spent a weekend reading Aleksey's blog, especially the JMH-related articles, and watching his public talk "The Lesser of Two Evils", and summarizes what was learned in this article. Many of the figures come from that talk.
Before Reading This Article
This article does not devote space to JMH syntax. If you have used JMH before, great; if not, that is fine too. It discusses common testing traps from a Java developer's perspective, analyzes how they relate to the OS and to Java internals, and uses JMH to help avoid them.
Reading the article requires some OS knowledge and basic JIT concepts; unfamiliar points are linked to Wikipedia and recommended blogs.
The author acknowledges limited ability and welcomes comments on errors or omissions.
Getting Started with JMH
Test Precision
The figure above shows the time scales of different kinds of tests; JMH can achieve microsecond-level precision. Tests at different scales face different challenges.
Millisecond‑level tests are not difficult.
Microsecond‑level tests are challenging but achievable; JMH does it.
Nanosecond‑level tests cannot be measured accurately yet.
Picosecond‑level… Holy Shit.
Terms used in the diagram:
Linpack: a basic benchmark that measures floating-point performance.
SPEC: the Standard Performance Evaluation Corporation, an industry body for standardized performance evaluation.
Pipelining: bus communication latency.
Benchmark Classification
Tests can be classified along many dimensions: integration tests, unit tests, API tests, stress tests, and so on. A benchmark usually refers to a performance test. Many open-source frameworks ship benchmark packages to quantify their performance.
Benchmarks can be further divided into Micro benchmark, Kernels, Synthetic benchmark, Application benchmarks, etc. The subject of this article belongs to Micro benchmark.
Detailed classification can be found here (link).
Why Benchmark is Needed
If you cannot measure it, you cannot improve it. --Lord Kelvin
Benchmarks provide data support for applications, allowing objective comparison of methods. Accuracy and diversity of benchmarks are crucial. JMH’s author Aleksey is also a member of SPEC.
What JMH Looks Like
@Benchmark
public void measure() {
    // this method was intentionally left blank.
}
Using JMH is as simple as writing a unit test.
Its result:
Benchmark                      Mode  Cnt           Score           Error  Units
JMHSample_HelloWorld.measure  thrpt    5  3126699413.430 ± 179167212.838  ops/s
Why JMH Testing is Needed
You might wonder why not test with the following simple code?
long start = System.currentTimeMillis();
measure();
System.out.println(System.currentTimeMillis() - start);
This is the core problem of the article: many naive testing approaches suffer from hidden traps that JMH helps to avoid.
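One way to see the problem is to time the same call several times within one run. The following is a minimal sketch in plain Java, not part of JMH; `work()` is a hypothetical stand-in for `measure()`, and the round count is arbitrary. On most JVMs the early rounds tend to be slower because the code has not been JIT-compiled yet, so a single measurement mostly reflects warm-up. Exact numbers vary by machine and JVM.

```java
final class NaiveTimingDemo {
    // Hypothetical workload standing in for the method under test.
    static double work() {
        double acc = 0;
        for (int i = 1; i <= 100_000; i++) {
            acc += Math.log(i);
        }
        return acc;
    }

    // Times `rounds` consecutive calls of work(); returns one duration per round, in ns.
    static long[] timeRounds(int rounds) {
        long[] nanos = new long[rounds];
        double sink = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += work();
            nanos[r] = System.nanoTime() - start;
        }
        // Use the accumulated result so the calls cannot be discarded as dead code.
        if (Double.isNaN(sink)) {
            throw new IllegalStateException("unexpected NaN");
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] + " ns");
        }
    }
}
```

Warm-up iterations, repeated measurement, and error estimation are the parts JMH handles, which is why its report shows a score with an error margin instead of a single number.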
Using JMH to Solve 12 Testing Pitfalls
Trap 1: Dead Code Elimination
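When measuring Math.log, a benchmark that computes the value but never uses it scores the same as an empty baseline method, because the JIT removes the unused computation as dead code. Returning the result from the benchmark method, or passing it to JMH's Blackhole API, prevents this.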
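The idea behind consuming a result can be sketched in plain Java. This is an illustration only: the `sink` field is a hypothetical stand-in, and JMH's actual Blackhole is more sophisticated than a single volatile write.

```java
final class SinkDemo {
    // Hypothetical stand-in for a Blackhole: a volatile field the JIT must actually write.
    static volatile double sink;

    // The result is discarded, so the JIT is free to remove the whole computation.
    static void discardsResult(double x) {
        Math.log(x);
    }

    // The result escapes through the volatile field, so it has to be computed.
    static void consumesResult(double x) {
        sink = Math.log(x);
    }

    public static void main(String[] args) {
        discardsResult(Math.PI);
        consumesResult(Math.PI);
        System.out.println("sink = " + sink);
    }
}
```

In JMH, the Blackhole parameter plays the role of this sink, as in the benchmark that follows.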
@Benchmark
public void measureRight(Blackhole bh) {
    bh.consume(Math.log(Math.PI));
}
Trap 2: Constant Folding and Propagation
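Constant folding evaluates constant expressions at compile time. If the compiler can prove that the input of a computation is a constant, it may replace the whole computation with a precomputed result, and the benchmark then measures nothing. The example code shows how constant folding changes results.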
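// Not final: the value has to be read at run time.
private double x = Math.PI;
// final with a constant initializer: the value can be folded.
private final double wrongX = Math.PI;
@Benchmark
public double baseline() { return Math.PI; }
@Benchmark
public double measureWrong_1() { return Math.log(Math.PI); }
@Benchmark
public double measureWrong_2() { return Math.log(wrongX); }
@Benchmark
public double measureRight() { return Math.log(x); }
Only the last method correctly measures Math.log performance.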
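The difference between a final field with a constant initializer and a plain field can be made visible without a benchmark. The sketch below is an assumption-laden illustration using Strings rather than doubles, because the Java language guarantees that a String constant expression is folded by javac into a single interned literal, while a run-time concatenation creates a new object. It shows compile-time folding only; the JIT applies similar reasoning later at run time. The names `CONST` and `nonConst` are hypothetical.

```java
final class ConstantFoldDemo {
    // final + constant initializer: a "constant variable" whose value javac substitutes.
    static final String CONST = "PI";
    // Not final: the value must be read at run time.
    static String nonConst = "PI";

    static boolean foldedAtCompileTime() {
        // CONST + "!" is a constant expression, so javac emits the literal "PI!".
        // The identity comparison is intentional: both sides are the same interned object.
        return (CONST + "!") == "PI!";
    }

    static boolean computedAtRunTime() {
        // nonConst + "!" is concatenated at run time into a newly created String.
        return (nonConst + "!") == "PI!";
    }

    public static void main(String[] args) {
        System.out.println("final field folded:     " + foldedAtCompileTime());
        System.out.println("non-final field folded: " + computedAtRunTime());
    }
}
```

The first method returns true and the second returns false, which mirrors why a benchmark reading a non-final field is the one that still has to do the work.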
Trap 3: Never Write Loops in Benchmarks
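The JIT optimizes loops aggressively (loop unrolling, on-stack replacement (OSR), hoisting work out of the loop), so a hand-written timing loop rarely measures the cost of a single call. Let JMH drive the iterations instead of writing the loop yourself.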
public class BadMicrobenchmark {
    public static void main(String[] args) {
        long startTime = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) {
            reps(); // the method under test (not shown)
        }
        long endTime = System.nanoTime();
        // Divide by the iteration count: the raw difference is the total time, not ns/op.
        System.out.println("ns/op : " + (endTime - startTime) / 10_000_000.0);
    }
}
JMH provides accurate measurements without such loops.
Trap 4: Using Fork to Isolate Benchmarks
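Running multiple benchmarks in the same JVM lets them interfere with each other, for example through the profile data the JIT has already collected. @Fork(0) runs benchmarks inside the harness JVM, where results depend on what ran before, while a fork count greater than 0 runs each benchmark in a freshly started JVM.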
@Benchmark
@Fork(0)
public int measure_1_c1() { return measure(c1); }
@Benchmark
@Fork(1)
public int measure_4_forked_c1() { return measure(c1); }
Results demonstrate the need for proper forking.
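The snippet above refers to `measure` and `c1` without defining them. A minimal plain-Java sketch of what such a setup could look like is given here; the names `Counter`, `Counter1`, and `Counter2` are assumptions made for illustration. The point is the call site `c.inc()`: while the JVM has only ever seen one receiver type there, the JIT can compile it as a direct call, but once a second type has passed through in the same JVM, the compiled code needs extra type checks or a virtual call. A benchmark that runs later in that JVM therefore measures different machine code than it would in a fresh one.

```java
final class ForkingDemo {
    interface Counter {
        int inc();
    }

    static final class Counter1 implements Counter {
        private int x;
        @Override public int inc() { return x++; }
    }

    static final class Counter2 implements Counter {
        private int x;
        @Override public int inc() { return x++; }
    }

    // The call site c.inc() is what the JIT profiles across all callers in this JVM.
    static int measure(Counter c) {
        int s = 0;
        for (int i = 0; i < 10; i++) {
            s += c.inc();
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(measure(new Counter1())); // 0 + 1 + ... + 9 = 45
        System.out.println(measure(new Counter2())); // same arithmetic, second receiver type
    }
}
```

Both implementations do identical work, so any difference between benchmarks over them comes from the JVM's accumulated state, which is what forking resets.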
Trap 5: Method Inlining
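The JIT may inline hot methods, which changes what a benchmark actually measures. Use -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to see the inlining decisions, and @CompilerControl to force or forbid inlining of a specific method.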
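// Target methods are called from the benchmarks; they are not benchmarks themselves.
public void target_blank() { }
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void target_dontInline() { }
@CompilerControl(CompilerControl.Mode.INLINE)
public void target_inline() { }
@Benchmark
public void blank() { target_blank(); }
@Benchmark
public void dontinline() { target_dontInline(); }
@Benchmark
public void inline() { target_inline(); }
Measurements show significant differences between inlined and non-inlined methods.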
Trap 6: False Sharing and Cache Lines
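Cache-line effects can distort results: when two threads write to different variables that happen to share a cache line, each write invalidates the other core's copy of that line (false sharing). JMH pads @State objects automatically, but that padding does not extend to the fields inside a state object; those can be separated with manual padding or with the JDK's @Contended annotation, which needs -XX:-RestrictContended for application classes.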
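The effect can be provoked with a small plain-Java sketch. It is hand-timed and therefore subject to the very pitfalls this article describes, so treat the printed numbers as an illustration rather than a measurement. The slot indices assume 8-byte elements and a cache line of 64 or 128 bytes; the iteration count is arbitrary.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class FalseSharingDemo {
    // Two threads each increment their own slot `iterations` times; returns elapsed ns.
    static long run(AtomicLongArray cells, int slotA, int slotB, int iterations) {
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) cells.incrementAndGet(slotA);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) cells.incrementAndGet(slotB);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 5_000_000;
        // Slots 0 and 1 are 8 bytes apart and very likely share a cache line.
        System.out.println("adjacent: " + run(new AtomicLongArray(32), 0, 1, iterations) + " ns");
        // Slots 0 and 16 are 128 bytes apart, so they sit on different lines
        // under the assumed line sizes.
        System.out.println("spread:   " + run(new AtomicLongArray(32), 0, 16, iterations) + " ns");
    }
}
```

The two threads never touch the same slot, so any slowdown in the adjacent case comes from the hardware keeping the shared line coherent, not from the program's logic.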
Trap 7: Branch Prediction
Processing a sorted array is faster than processing the same data unsorted, because the branch inside the loop becomes predictable for the CPU.
@Benchmark
@OperationsPerInvocation(COUNT)
public void sorted(Blackhole bh1, Blackhole bh2) {
    for (byte v : sorted) {
        if (v > 0) bh1.consume(v);
        else bh2.consume(v);
    }
}
Results: sorted ~2.7 ns/op, unsorted ~8.1 ns/op.
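The benchmark above depends on two arrays that hold the same values in different order. A minimal sketch of how such data could be prepared in plain Java is shown here; the seed, the array size, and the helper names are assumptions, not taken from the original benchmark. Counting the positive values in both arrays confirms that the loop takes each side of the branch equally often in both cases, so only the order of the outcomes differs.

```java
import java.util.Arrays;
import java.util.Random;

final class BranchDataDemo {
    // Fills an array with pseudo-random bytes; the fixed seed keeps runs repeatable.
    static byte[] randomBytes(int count, long seed) {
        byte[] data = new byte[count];
        new Random(seed).nextBytes(data);
        return data;
    }

    // Returns a sorted copy and leaves the input untouched.
    static byte[] sortedCopy(byte[] data) {
        byte[] copy = data.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Same condition as the benchmark body: v > 0 goes one way, everything else the other.
    static int countPositive(byte[] data) {
        int positives = 0;
        for (byte v : data) {
            if (v > 0) positives++;
        }
        return positives;
    }

    public static void main(String[] args) {
        byte[] unsorted = randomBytes(1024 * 1024, 1234L);
        byte[] sorted = sortedCopy(unsorted);
        System.out.println("positives (unsorted): " + countPositive(unsorted));
        System.out.println("positives (sorted):   " + countPositive(sorted));
    }
}
```

In the sorted array all non-positive values come first, so the branch outcome flips once; in the unsorted array it flips unpredictably, which is what the CPU's branch predictor struggles with.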
Trap 8: Multithreaded Testing
Power management and OS scheduling affect multithreaded benchmark scaling. Disabling power management and using proper forking improve consistency.
Conclusion
The article explains the importance of JMH and enumerates many pitfalls that can invalidate naive benchmarks. Understanding these issues helps Java developers obtain reliable performance measurements.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
