
Boost Java Microservice Performance with GraalVM Native Image: A Step-by-Step Guide

This tutorial demonstrates how to create a Micronaut microservice, compile it into a GraalVM Native Image, and apply advanced optimizations such as G1 GC, profile‑guided compilation, and UPX packing to achieve faster startup, lower memory usage, and higher throughput in cloud environments.

Java Architecture Diary

1. Introduction

GraalVM Native Image can be an attractive platform for Java cloud applications. As I wrote in "GraalVM: Native images in containers", native images compile your application ahead of time (AOT), so no compilation happens at run time. The application starts almost instantly and uses less memory, saving the resources that the JIT compiler and class metadata would otherwise consume.

Beyond fast startup, developers choose native images because they fit well in cloud deployments and because they obfuscate code, which can improve security.

Figure 1, which often appears in discussions of performance and of the different ways GraalVM can run Java applications, has several axes, each labeled with one thing people can mean by “better performance”.

Sometimes better performance means higher throughput (how many clients a service instance can handle); sometimes it means lower latency for a single response, lower memory usage, faster startup, or smaller deployment size, which can matter for cold‑start scenarios.

This article shows how, with a few simple tricks and some advanced GraalVM Native Image features, you can get all of these advantages for your application.

2. Create an Application

Assume you have a simple example app: a Micronaut microservice that responds to HTTP queries by computing prime numbers. It uses the Java Stream API, creates temporary objects that put pressure on the garbage collector, and checks factors inefficiently, testing even numbers greater than 2 as well.
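To make "inefficient" concrete, the sketch below compares the stream-based check that the controller uses with a leaner loop. It is a hypothetical standalone class: PrimeCheckComparison, isPrimeStream, and isPrimeLoop are illustrative names, not part of the app. The tutorial deliberately keeps the stream version, because the goal is to speed the service up without touching its code.

```java
import java.util.stream.LongStream;

// Hypothetical standalone class for comparison; not part of the tutorial's app.
public class PrimeCheckComparison {

    // Same approach as the tutorial's controller: a new LongStream and a capturing
    // lambda on every call, and every divisor from 2 to sqrt(n) is tried, even ones.
    public static boolean isPrimeStream(long n) {
        if (n < 2) {
            return false;
        }
        return LongStream.rangeClosed(2, (long) Math.sqrt(n))
                .allMatch(i -> n % i != 0);
    }

    // Allocation-free trial division: handle 2 separately, then test only odd divisors.
    public static boolean isPrimeLoop(long n) {
        if (n < 2) {
            return false;
        }
        if (n % 2 == 0) {
            return n == 2;
        }
        for (long d = 3; d * d <= n; d += 2) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Check that both versions agree on a range of inputs.
        for (long n = 0; n <= 100_000; n++) {
            if (isPrimeStream(n) != isPrimeLoop(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("isPrimeStream and isPrimeLoop agree for 0..100000");
    }
}
```

The loop version avoids the per-call stream and lambda allocations and skips even divisors, which is exactly the waste the sample controller keeps on purpose.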

If you have the Micronaut CLI installed, you can create the app as follows.

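<code>mn create-app org.shelajev.primes
cd primes
cat <<'EOF' > src/main/java/org/shelajev/PrimesController.java
package org.shelajev;

import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;

import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

@Controller("/primes")
public class PrimesController {
    private final Random r = new Random();

    // GET /primes/random/{upperbound}: primes in a random range [from, to) below upperbound.
    // upperbound must be greater than 2, otherwise nextInt() throws IllegalArgumentException.
    @Get("/random/{upperbound}")
    public List<Long> random(int upperbound) {
        int to = 2 + r.nextInt(upperbound - 2);  // 2 <= to <= upperbound - 1
        int from = 1 + r.nextInt(to - 1);        // 1 <= from <= to - 1
        return primes(from, to);
    }

    // Deliberately naive trial division: tries every divisor up to sqrt(n), even ones,
    // and builds a new stream plus a capturing lambda per call, which creates GC pressure.
    public static boolean isPrime(long n) {
        if (n < 2) {
            return false; // 0 and 1 are not prime; the empty range below would return true
        }
        return LongStream.rangeClosed(2, (long) Math.sqrt(n))
            .allMatch(i -> n % i != 0);
    }

    // Boxes every prime into a Long, adding more short-lived objects.
    public static List<Long> primes(long min, long max) {
        return LongStream.range(min, max)
            .filter(PrimesController::isPrime)
            .boxed()
            .collect(Collectors.toList());
    }
}
EOF</code>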

Now you have the sample app. You can run it or immediately build a native executable.

<code>./gradlew build
./gradlew nativeImage</code>

Then run the application, either on the JVM or as the native executable (one at a time, since both listen on port 8080).

<code>java -jar build/libs/primes-0.1-all.jar
./build/native-image/application</code>

To test, open the endpoint in a browser or use the curl command below. Each request returns the list of primes in a randomly chosen range below 100; the list can be empty.

<code>curl http://localhost:8080/primes/random/100</code>
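The endpoint returns a list, not a single number. The following hypothetical standalone class (RandomRangeDemo and its methods are illustrative names) replays the controller's range selection with a fixed seed, so you can see what the service computes: to falls in [2, upperbound - 1], from in [1, to - 1], and the response holds the primes in the half-open range [from, to).

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical standalone replay of PrimesController.random(); no Micronaut needed.
public class RandomRangeDemo {

    // Same arithmetic as the controller; upperbound must be greater than 2.
    // Returns {from, to} with 1 <= from < to <= upperbound - 1.
    public static int[] pickRange(Random r, int upperbound) {
        int to = 2 + r.nextInt(upperbound - 2);
        int from = 1 + r.nextInt(to - 1);
        return new int[] {from, to};
    }

    public static boolean isPrime(long n) {
        return n >= 2 && LongStream.rangeClosed(2, (long) Math.sqrt(n)).allMatch(i -> n % i != 0);
    }

    // Primes in the half-open range [min, max), as in PrimesController.primes().
    public static List<Long> primes(long min, long max) {
        return LongStream.range(min, max)
                .filter(RandomRangeDemo::isPrime)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed so the output is repeatable
        for (int i = 0; i < 5; i++) {
            int[] range = pickRange(r, 100);
            System.out.println("[" + range[0] + ", " + range[1] + ") -> " + primes(range[0], range[1]));
        }
    }
}
```

A narrow range such as [24, 25) contains no primes, which is why an empty list is a valid response.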

For later stages, download and install hey, a simple HTTP load generator, and place it in your $PATH (or obtain the appropriate binary for your OS).

<code>wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
chmod u+x hey_linux_amd64
sudo mv hey_linux_amd64 /usr/local/bin/hey
command -v hey   # should print /usr/local/bin/hey</code>

Verify it works:

<code>hey -z 15s http://localhost:8080/primes/random/100</code>

The output includes a latency distribution and a summary such as:

<code>Summary:
  Total:    15.0021 secs
  Slowest:  0.1064 secs
  Fastest:  0.0001 secs
  Average:  0.0015 secs
  Requests/sec: 33703.8539
  Total data:   20062978 bytes
  Size/request: 20 bytes</code>

The most important metric is the Requests/sec line, which shows throughput. By default, a native image sets the maximum heap (-Xmx) to 80% of available memory; for these tests, it is better to cap the heap at 512 MB (for example, ./build/native-image/application -Xmx512m) than to let it grow up to that default.
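To confirm which heap limit a run actually uses, a tiny hypothetical program (HeapLimitCheck is an illustrative name) can print Runtime.maxMemory(). After compiling it with javac, run it on the JVM with java -Xmx512m HeapLimitCheck; built with native-image, the executable (named after the class in lowercase by default) accepts the same -Xmx512m argument at run time. Expect a value near 512, possibly a little lower depending on the collector.

```java
// Hypothetical diagnostic; prints the heap limit the runtime is actually using.
public class HeapLimitCheck {

    // Runtime.maxMemory() reports the maximum heap in bytes
    // (Long.MAX_VALUE if the runtime has no inherent limit).
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```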

3. Better Memory Management

Runtime memory usage is a key metric, and Native Image reduces it compared with running on a regular JDK.

The savings are mostly a fixed, one-time reduction: because the native executable already contains all compiled code and the analyzed classes, it does not need class metadata or JIT infrastructure at run time.

However, the amount of data your application holds in memory is similar, because object layout in the JVM and native image is alike. If your app keeps several gigabytes of data, the native image will use a comparable amount, minus the 200‑300 MB saved by not having JIT and metadata.

A native executable includes a runtime, implemented in the GraalVM project, that manages memory and performs garbage collection.

The garbage collector accepts several of the same options as the JDK, such as -Xmx for the maximum heap size and -Xmn for the young generation size. You can also enable -XX:+PrintGC and -XX:+VerboseGC for detailed GC logs.
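To see these options in action, you need something that allocates. The following hypothetical workload (GcPressureDemo is an illustrative name) repeats the sample controller's boxing pattern, so each round leaves behind a short-lived list and its Long objects. After building it with native-image, you could run, for example, ./gcpressuredemo -Xmx64m -Xmn16m -XX:+PrintGC and watch the collections being logged.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical allocation-heavy workload for trying the GC options above.
public class GcPressureDemo {

    private static final int ROUNDS = 20_000;

    // Same pattern as the tutorial's controller: streams, lambdas, and a boxed Long per prime.
    public static List<Long> primes(long min, long max) {
        return LongStream.range(min, max)
                .filter(n -> n >= 2
                        && LongStream.rangeClosed(2, (long) Math.sqrt(n)).allMatch(i -> n % i != 0))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < ROUNDS; i++) {
            // Each round's list and its Long objects become garbage right away.
            total += primes(1, 1_000).size();
        }
        System.out.println("rounds: " + ROUNDS + ", primes counted: " + total);
    }
}
```

The round count is a constant rather than a command-line argument, so the GC flags on the command line cannot be mistaken for program input.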

By default, native images use the Serial GC. If you prefer a different collector, you can build the native image with the multithreaded G1 GC, a performance-oriented feature included in GraalVM Enterprise. Enable it by passing --gc=G1 to the native-image builder, for example in build.gradle:

<code>nativeImage {
  args("--gc=G1")
}</code>

Rebuild the native image after adding the argument.
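After rebuilding, it can be worth checking which collector is actually running. A hypothetical helper (ActiveGcReport is an illustrative name) lists the collectors through the standard java.lang.management API, along with their collection counts and times. On HotSpot with G1, the names start with "G1" (for example, "G1 Young Generation"). Support for these management beans in native executables varies between GraalVM versions, so treat the output there as a hint rather than proof.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check of which collectors the running process reports.
public class ActiveGcReport {

    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is in milliseconds; both values can be -1 if unsupported.
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```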

4. Better Overall Throughput

Throughput depends on the workload, the quality of the code, the volume of data, and latency. A better runtime or compiler can speed up execution significantly.

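GraalVM Enterprise ships with a more powerful compiler and supports profile-guided optimization (PGO): you first run an instrumented build of the application to collect a profile, and the compiler then uses that profile during AOT compilation. This brings native image throughput closer to that of a warmed-up JIT.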

To collect a PGO profile, add --pgo-instrument to the native-image arguments in build.gradle and build the image as usual:

<code>nativeImage {
  args("--gc=G1")
  args("--pgo-instrument")
}</code>

Run the load generator (hey) against the instrumented binary, then stop it; the instrumented binary writes the collected profile to a default.iprof file when it exits.
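The tutorial uses hey for this training run. If you would rather drive the instrumented binary from Java, the following hypothetical sketch (PgoTrainingDriver and its command-line arguments are illustrative) sends sequential requests to /primes/random/{upperbound} with varying bounds, using java.net.http.HttpClient from Java 11+. Because the profile guides the final optimizations, the training traffic should resemble real traffic; varying the bound is one simple way to avoid profiling a single input.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical training driver: an alternative to hey for the profiling run (Java 11+).
public class PgoTrainingDriver {

    // Builds a request URI for the /primes/random/{upperbound} route of the sample app.
    public static URI primesUri(String baseUrl, int upperbound) {
        return URI.create(baseUrl + "/primes/random/" + upperbound);
    }

    // Requests per second, the same throughput metric hey reports.
    public static double requestsPerSecond(long requests, Duration elapsed) {
        return requests / (elapsed.toNanos() / 1e9);
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        int requests = args.length > 1 ? Integer.parseInt(args[1]) : 10_000;
        HttpClient client = HttpClient.newHttpClient();

        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            // Vary the bound (3..100) so the profile is not dominated by one input size;
            // the controller needs upperbound > 2.
            int upperbound = 3 + ThreadLocalRandom.current().nextInt(98);
            HttpRequest request = HttpRequest.newBuilder(primesUri(baseUrl, upperbound)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException("unexpected HTTP status " + response.statusCode());
            }
        }
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        System.out.printf("%d requests, %.1f req/s%n", requests, requestsPerSecond(requests, elapsed));
    }
}
```

For example, java PgoTrainingDriver http://localhost:8080 20000 after compiling it. Sequential requests will not reach hey's throughput, but for a training run, exercising the hot paths likely matters more than raw speed.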

Then rebuild the final image using the profile:

<code>nativeImage {
  args("--gc=G1")
  args("--pgo=../../default.iprof")
}</code>

The resulting binary, named app-ee-pgo in the listings below, can then be compared with the other builds.

5. Smaller Binaries

Native executables can be large, but you can shrink them.

Without any size optimizations, the example binaries are:

<code>$ ls -lah app*
-rwxrwxr-x. 1 opc opc 58M May 6 20:41 app-ce
-rwxrwxr-x. 1 opc opc 73M May 6 21:14 app-ee
-rwxrwxr-x. 1 opc opc 99M May 6 21:25 app-ee-g1
-rwxrwxr-x. 1 opc opc 80M May 6 21:47 app-ee-pgo
</code>

The executable consists of two main parts: the compiled code and the “image heap” created during class initialization.

The code part contains all classes and methods reachable by static analysis or explicit configuration.

The image heap stores the initialized state so that the native image can start instantly.
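As an illustration of "initialized state", consider this hypothetical class (PrimeTable is an illustrative name) whose static initializer fills a lookup table with a sieve. If the class were initialized at build time (for example with native-image --initialize-at-build-time=PrimeTable), the finished array would become part of the image heap, trading a slightly larger binary for less startup work. In recent GraalVM versions, application classes are initialized at run time unless configured otherwise.

```java
// Hypothetical lookup table illustrating state that could live in the image heap.
public final class PrimeTable {

    public static final int LIMIT = 10_000;

    // Filled by the static initializer. If this class is initialized at build time,
    // the finished array is stored in the image heap; if it is initialized at
    // run time, the sieve runs again on every startup.
    private static final boolean[] COMPOSITE = sieve(LIMIT);

    private PrimeTable() {
    }

    // Sieve of Eratosthenes: marks every composite number up to limit.
    private static boolean[] sieve(int limit) {
        boolean[] composite = new boolean[limit + 1];
        composite[0] = true;
        composite[1] = true;
        for (int i = 2; i * i <= limit; i++) {
            if (!composite[i]) {
                for (int j = i * i; j <= limit; j += i) {
                    composite[j] = true;
                }
            }
        }
        return composite;
    }

    public static boolean isPrime(int n) {
        if (n < 0 || n > LIMIT) {
            throw new IllegalArgumentException("outside table range: " + n);
        }
        return !COMPOSITE[n];
    }

    public static int countPrimes() {
        int count = 0;
        for (int n = 0; n <= LIMIT; n++) {
            if (isPrime(n)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("primes up to " + LIMIT + ": " + countPrimes());
    }
}
```

Running main prints "primes up to 10000: 1229".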

You can inspect how much each class contributes to the binary with the -H:+DashboardAll option and refactor accordingly.

Compressing the binary with UPX (e.g., upx -7 -k app-ee-pgo) reduces its size from about 80 MB to about 23 MB while preserving performance.

6. How Far Can Native Image Take You?

The article demonstrated several optimizations: adaptive G1 GC, profile‑guided compilation, and UPX packing, resulting in a microservice that starts in ~20 ms, occupies ~20 MB, and outperforms OpenJDK on the first 1 M requests.

Running the three 15‑second tests with a 512 MB heap limit yields:

Default native image (app-ee): 49,791 req/s

With G1 GC (app-ee-g1): 51,691 req/s

With G1 + PGO (app-ee-pgo): 73,392 req/s

Compared with the same application on OpenJDK 11, the best native image is about 16% faster.
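The relative gains between the three builds follow directly from the reported req/s figures. This small sketch (ThroughputComparison and percentGain are illustrative names) computes them as (candidate / baseline - 1) × 100.

```java
// Relative throughput of the native builds, using the req/s figures reported above.
public class ThroughputComparison {

    // Percentage gain of candidate over baseline: (candidate / baseline - 1) * 100.
    public static double percentGain(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double appEe = 49_791;     // default native image
        double appEeG1 = 51_691;   // with G1 GC
        double appEePgo = 73_392;  // with G1 + PGO
        System.out.printf("G1 vs default:       %+.1f%%%n", percentGain(appEeG1, appEe));
        System.out.printf("G1 + PGO vs default: %+.1f%%%n", percentGain(appEePgo, appEe));
        System.out.printf("G1 + PGO vs G1:      %+.1f%%%n", percentGain(appEePgo, appEeG1));
    }
}
```

It prints gains of about +3.8%, +47.4%, and +42.0%: in this benchmark, switching to G1 contributes a few percent, while PGO accounts for most of the improvement.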

Overall, native images can match JIT‑based performance while offering faster startup and smaller footprints, making them suitable for constrained environments or microservices.

7. Conclusion

This guide presented various ways to improve native image performance without changing application code: using G1 GC, enabling profile‑guided optimization, and packing with UPX. The resulting microservice starts in ~20 ms, occupies ~20 MB, and delivers higher throughput than the equivalent OpenJDK build.

GraalVM Native Image is an exciting technology for Java workloads in cloud environments, and the techniques shown help you use it more effectively.

Translator’s Note

Hi, I’m Spring-bro (Mica). Thanks to Zhang Yadong (JustAuth) for helping with the translation; together we have translated several GraalVM and Spring Native articles.
