
Boost Java Microservice Performance with GraalVM Native Image: A Step-by-Step Guide

This tutorial demonstrates how to create a Micronaut microservice, compile it into a GraalVM Native Image, and apply advanced optimizations such as G1 GC, profile‑guided compilation, and UPX packing to achieve faster startup, lower memory usage, and higher throughput in cloud environments.

Java Architecture Diary

Jul 9, 2021


1. Introduction

GraalVM Native Image can be an attractive platform for Java cloud applications. As I wrote in "GraalVM: Native images in containers", native images pre‑compile your application (AOT), eliminating the need for runtime compilation, so the app starts almost instantly and uses less memory, saving resources used by the JIT compiler and class metadata.

Beyond fast startup, developers use native images for cloud‑friendliness and code obfuscation to improve security.

Figure 1 often appears when discussing performance and the different ways GraalVM can run Java applications; it shows many axes labeled with what people mean by “better performance”.

Sometimes better performance means higher throughput (how many clients a service instance can handle); sometimes it means lower latency for a single response, lower memory usage, faster startup, or smaller deployment size, which can matter for cold‑start scenarios.

With a few simple tricks and some of the more advanced GraalVM Native Image features, you can get all of these advantages for your application. This article shows how.

2. Create an Application

Assume you have a simple example app: a Micronaut microservice that responds to HTTP queries and computes prime numbers. It uses the Java Stream API and creates temporary objects, which generates GC pressure. It also checks candidate factors inefficiently: every number up to the square root is tested, including even numbers greater than 2.

If you have the Micronaut CLI installed, you can create the app as follows.

mn create-app org.shelajev.primes
cd primes
cat <<'EOF' > src/main/java/org/shelajev/PrimesController.java
package org.shelajev;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.*;
import java.util.stream.*;
import java.util.*;
@Controller("/primes")
public class PrimesController {
    private Random r = new Random();

    @Get("/random/{upperbound}")
    public List<Long> random(int upperbound) {
        int to = 2 + r.nextInt(upperbound - 2);
        int from = 1 + r.nextInt(to - 1);
        return primes(from, to);
    }
    public static boolean isPrime(long n) {
        return LongStream.rangeClosed(2, (long) Math.sqrt(n))
            .allMatch(i -> n % i != 0);
    }
    public static List<Long> primes(long min, long max) {
        return LongStream.range(min, max)
            .filter(PrimesController::isPrime)
            .boxed()
            .collect(Collectors.toList());
    }
}
EOF
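
To see what the controller computes without starting Micronaut, the two static methods can be exercised from a plain main method. The sketch below copies isPrime and primes from the listing above. The class name PrimesSketch and the isPrimeOddOnly variant are hypothetical additions, not part of the sample app; the variant is included only to illustrate the inefficiency mentioned earlier (testing even divisors greater than 2). Note that, as written, isPrime(1) returns true, because the range of divisors is empty.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class PrimesSketch {

    // Copied from the article's controller: trial division by every
    // integer from 2 up to sqrt(n), even divisors included.
    public static boolean isPrime(long n) {
        return LongStream.rangeClosed(2, (long) Math.sqrt(n))
            .allMatch(i -> n % i != 0);
    }

    // Hypothetical variant (not in the article): handle 2 separately,
    // then try odd divisors only. It also rejects values below 2.
    public static boolean isPrimeOddOnly(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long i = 3; i * i <= n; i += 2) {
            if (n % i == 0) return false;
        }
        return true;
    }

    // Copied from the article's controller: primes in [min, max).
    public static List<Long> primes(long min, long max) {
        return LongStream.range(min, max)
            .filter(PrimesSketch::isPrime)
            .boxed()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 5, 7, 11, 13, 17, 19]
        System.out.println(primes(1, 20));

        // For every n >= 2 in this range, both checks agree.
        boolean same = LongStream.range(2, 10_000)
            .allMatch(n -> isPrime(n) == isPrimeOddOnly(n));
        System.out.println(same); // prints true
    }
}
```

The benchmark figures later in the article were measured with the original, deliberately wasteful check, so the variant is shown for contrast only.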

Now you have the sample app. You can run it or immediately build a native executable.

./gradlew build
./gradlew nativeImage

Then run the application.

java -jar build/libs/primes-0.1-all.jar
./build/native-image/application

To test it, open the endpoint in a browser or use curl. The following request returns the list of primes found in a random range below 100.

curl http://localhost:8080/primes/random/100

For the later stages, download and install hey, a simple HTTP load generator, and place it on your $PATH. The commands below fetch the Linux amd64 binary; obtain the appropriate binary if your OS differs.

wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
chmod u+x hey_linux_amd64
sudo mv hey_linux_amd64 /usr/local/bin/hey
hey --version

Verify it works:

hey -z 15s http://localhost:8080/primes/random/100

The output includes a latency distribution and a summary such as:

Summary:
  Total:    15.0021 secs
  Slowest:  0.1064 secs
  Fastest:  0.0001 secs
  Average:  0.0015 secs
  Requests/sec: 33703.8539
  Total data:   20062978 bytes
  Size/request: 20 bytes

The most important metric here is the Requests/sec line, which shows throughput. By default, a native image sets its maximum heap size (-Xmx) to 80% of the available memory; for this test you may want to cap the heap explicitly, for example at 512 MB, rather than let it grow that far.
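
One way to confirm which heap limit is in effect is to print what the runtime reports. The following is a minimal sketch; the class HeapCheck is hypothetical and not part of the sample app. On a JVM, Runtime.maxMemory() reflects the -Xmx setting, and the assumption here is that a native executable reports its limit the same way.

```java
public class HeapCheck {

    // Maximum heap the runtime will attempt to use, in mebibytes.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run on a JVM with java -Xmx512m HeapCheck.java, this should print a value at or slightly below 512, depending on the collector in use.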

3. Better Memory Management

Runtime memory usage is a key metric, and Native Image lowers it compared with running on a generic JDK.

The saving is mostly a fixed, one-time amount: the native executable already contains all compiled code and analyzed classes, so there is no need for class metadata or JIT infrastructure at run time.

However, the amount of data your application holds in memory is similar, because object layout in the JVM and native image is alike. If your app keeps several gigabytes of data, the native image will use a comparable amount, minus the 200‑300 MB saved by not having JIT and metadata.

Native Image includes a runtime that assumes managed memory and performs garbage collection. The runtime implementation comes from the GraalVM project.

The garbage collector accepts familiar JDK-style options, such as -Xmx for the maximum heap size and -Xmn for the young generation size. You can also pass -XX:+PrintGC and -XX:+VerboseGC to get detailed GC logs.

If you prefer a different collector, you can build the native image with the multithreaded G1 GC, which is a performance‑oriented feature included in GraalVM Enterprise. Enable it by passing --gc=G1 to the native‑image process, e.g., in build.gradle:

nativeImage {
  args("--gc=G1")
}

Rebuild the native image after adding the argument.

4. Better Overall Throughput

Throughput is affected by workload characteristics, code quality, data volume, and latency. A better runtime or compiler can significantly speed execution.

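GraalVM Enterprise ships with a more powerful compiler that also supports profile-guided optimization (PGO): you build an instrumented image, run it under a representative load to record a profile, and then feed that profile into the AOT compilation. This brings native image throughput closer to that of a warmed-up JIT.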

To collect a PGO profile, enable --pgo-instrument in build.gradle and build the image normally:

nativeImage {
  args("--gc=G1")
  args("--pgo-instrument")
}

Run the instrumented binary and drive it with the load generator. When the application shuts down, it writes the collected profile to a default.iprof file in its working directory.

Then rebuild the final image using the profile:

nativeImage {
  args("--gc=G1")
  args("--pgo=../../default.iprof")
}

The resulting binary (named app-ee-pgo) can be compared with other builds.

5. Smaller Binaries

Native executables can be large, but there are ways to shrink them.

Without any size optimizations, the example binaries are:

$ ls -lah app*
-rwxrwxr-x. 1 opc opc 58M May 6 20:41 app-ce
-rwxrwxr-x. 1 opc opc 73M May 6 21:14 app-ee
-rwxrwxr-x. 1 opc opc 99M May 6 21:25 app-ee-g1
-rwxrwxr-x. 1 opc opc 80M May 6 21:47 app-ee-pgo

The executable consists of two main parts: the compiled code and the “image heap” created during class initialization.

The code part contains all classes and methods reachable by static analysis or explicit configuration.

The image heap stores the initialized state so that the native image can start instantly.

You can inspect class contributions with the -H:+DashboardAll option and refactor accordingly.

Compressing the binary with UPX (e.g., upx -7 -k app-ee-pgo) reduces size from ~80 MB to ~23 MB while preserving performance.

6. How Far Can Native Image Take You?

The article demonstrated several optimizations: adaptive G1 GC, profile‑guided compilation, and UPX packing, resulting in a microservice that starts in ~20 ms, occupies ~20 MB, and outperforms OpenJDK on the first 1 M requests.

Running the three 15‑second tests with a 512 MB heap limit yields:

Default native image (app‑ee): 49 791 req/s

With G1 GC (app‑ee‑g1): 51 691 req/s

With G1 + PGO (app‑ee‑pgo): 73 392 req/s

Compared with the same application on OpenJDK 11, the best native image is about 16 % faster.
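
The relative gains between the three native builds follow directly from the figures above. The sketch below only redoes that arithmetic; the class and method names are hypothetical, and the inputs are the req/s values quoted in this section.

```java
import java.util.Locale;

public class ThroughputGain {

    // Relative gain of candidate over baseline, in percent.
    public static double gainPercent(double baseline, double candidate) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double base = 49_791; // app-ee, default native image
        double g1 = 51_691;   // app-ee-g1, G1 GC
        double pgo = 73_392;  // app-ee-pgo, G1 + PGO

        // prints "G1 vs default: 3.8%"
        System.out.printf(Locale.ROOT, "G1 vs default: %.1f%%%n", gainPercent(base, g1));
        // prints "PGO vs default: 47.4%"
        System.out.printf(Locale.ROOT, "PGO vs default: %.1f%%%n", gainPercent(base, pgo));
        // prints "PGO vs G1: 42.0%"
        System.out.printf(Locale.ROOT, "PGO vs G1: %.1f%%%n", gainPercent(g1, pgo));
    }
}
```

In these measurements, most of the improvement comes from the profile rather than from the switch to G1.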

Overall, native images can match JIT‑based performance while offering faster startup and smaller footprints, making them suitable for constrained environments or microservices.

7. Conclusion

This guide presented various ways to improve native image performance without changing application code: using G1 GC, enabling profile‑guided optimization, and packing with UPX. The resulting microservice starts in ~20 ms, occupies ~20 MB, and delivers higher throughput than the equivalent OpenJDK build.

GraalVM Native Image is an exciting technology for Java workloads in cloud environments, and the techniques shown help you use it more effectively.

Translator’s Note

Hi, I’m Spring‑bro (Mica) and thanks to Zhang Yadong (JustAuth) for helping translate. We have translated several GraalVM and Spring Native articles.
