
How We Fully Squeezed Docker Performance: A Complete Record of Optimizing 40+ Spring Boot Services

After migrating more than 40 Spring Boot microservices to Docker and Kubernetes, the authors ran into slow startups, OOM kills, unexplained latency, and pod restarts. This article walks through their step-by-step analysis and the concrete Dockerfile, JVM, CPU, GC, and Kubernetes settings that made the services fast, stable, and observable in production.

LuTiao Programming

Dec 16, 2025


When more than 40 Spring Boot services were moved from bare‑metal servers to Docker + Kubernetes, a series of puzzling problems appeared: noticeably slower startup, occasional OOMKilled, JVM memory usage far exceeding container limits, random high latency despite idle CPU, and pods repeatedly restarting without clear log clues.

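The root cause is that the JVM's default sizing assumes it owns the whole machine. JVMs that predate container support (before JDK 10 and 8u191) ignore cgroup limits and size themselves from host memory: by default the maximum heap is a quarter of physical RAM, so about 16 GB on a 64 GB node. Once the process grows past the container's memory limit, the kernel's OOM killer terminates it with SIGKILL, and Kubernetes reports the container as OOMKilled with exit code 137 (128 + 9).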

Start with a Correct Dockerfile

The first optimization step is the image build process, not JVM flags. A multistage Dockerfile shrank the final image from ~900 MB to ~150 MB by leaving Maven and the source files behind in the build stage. The smaller image is faster to push and pull in CI/CD and exposes a smaller attack surface; the authors also report better start-up time and memory usage.

# -------- Build stage --------
FROM maven:3.9.4-eclipse-temurin-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn -q -e -DskipTests dependency:go-offline
COPY src ./src
RUN mvn clean package -DskipTests

# -------- Run stage --------
FROM eclipse-temurin:17-jre-alpine
ENV JAVA_OPTS="\
-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75 \
-XX:ActiveProcessorCount=1 \
-XX:+UseG1GC"
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
# exec replaces the shell so the JVM runs as PID 1 and receives SIGTERM from Kubernetes
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar /app/app.jar"]

Copying pom.xml first enables Docker’s layer cache to reuse dependencies, cutting build time from two minutes to about 10–15 seconds in the authors’ pipeline.

Make the JVM Container‑Aware

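On JDK 10+ (and 8u191+) container support is enabled by default, so -XX:+UseContainerSupport only makes the intent explicit. The flag that changes behaviour is -XX:MaxRAMPercentage, which raises the maximum heap from the default 25 % of the container's memory limit to 75 %: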

-XX:+UseContainerSupport
-XX:MaxRAMPercentage=75

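With these settings the heap is sized from the container's memory limit rather than the host's, which makes memory consumption predictable and stabilises GC behaviour. The remaining 25 % has to cover metaspace, thread stacks, the code cache, and direct buffers, so a service with heavy off-heap usage may need a lower percentage to avoid OOMKilled.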
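The effect of the percentage can be checked with simple arithmetic and then compared with what the JVM actually resolved. The sketch below is illustrative only: the class name HeapBudget and the 512 MiB limit are assumptions, and Runtime.maxMemory() can report slightly less than the configured ceiling depending on the collector.

```java
import java.util.Locale;

/** Sketch: estimate the heap ceiling for a container limit and compare it with the running JVM. */
class HeapBudget {
    static final long MIB = 1024L * 1024L;

    /** Heap ceiling implied by a memory limit and a MaxRAMPercentage value (plain arithmetic). */
    static long maxHeapBytes(long memoryLimitBytes, double maxRamPercentage) {
        return (long) (memoryLimitBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long limit = 512 * MIB; // hypothetical container memory limit

        System.out.printf(Locale.ROOT, "default 25%% -> %d MiB%n", maxHeapBytes(limit, 25) / MIB);
        System.out.printf(Locale.ROOT, "with    75%% -> %d MiB%n", maxHeapBytes(limit, 75) / MIB);

        // What this JVM actually resolved; run inside the container to confirm the flags took effect.
        long actual = Runtime.getRuntime().maxMemory();
        System.out.printf(Locale.ROOT, "Runtime.maxMemory() = %d MiB%n", actual / MIB);
    }
}
```

Running the same class inside the container, with and without JAVA_OPTS, is a quick way to confirm that the heap ceiling follows the container limit rather than the node's memory.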

CPU Throttling – The Hidden Killer

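When a pod is limited to 0.2 CPU, a JVM that does not see that limit still sizes itself for every core on the node and spawns dozens of GC, JIT, and ForkJoin threads. Running in parallel, those threads burn through the CFS quota (20 ms of CPU time per 100 ms period at 0.2 CPU) almost immediately, and the kernel throttles the container until the next period. The result is slower start-up, longer GC pauses, and random latency spikes while CPU usage looks idle.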

The correct approach is to tell the JVM the actual CPU count:

-XX:ActiveProcessorCount=1

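This sizes the GC, JIT compiler, and ForkJoin pools for one CPU, aligning JVM thread creation with the container's CPU view. The value should track the pod's CPU limit rounded up to a whole number, so a service with a 2-CPU limit would use 2.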
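To make the throttling arithmetic concrete, here is a minimal sketch. It assumes the default CFS period of 100 ms and that all runnable threads really do run in parallel on separate host cores; the class and method names are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;

/** Sketch: how quickly parallel JVM threads exhaust a small CPU quota. */
class CpuQuota {
    static final long CFS_PERIOD_MICROS = 100_000; // assumed default CFS period (100 ms)

    /** CPU time in microseconds the container may consume per period for a limit in millicores. */
    static long quotaMicrosPerPeriod(int limitMillicores) {
        return CFS_PERIOD_MICROS * limitMillicores / 1000;
    }

    /** Wall-clock microseconds until the quota is gone when the threads run fully in parallel. */
    static long microsUntilThrottled(int limitMillicores, int runnableThreads) {
        return quotaMicrosPerPeriod(limitMillicores) / runnableThreads;
    }

    /** Value for -XX:ActiveProcessorCount: the CPU limit rounded up to a whole CPU, at least 1. */
    static int activeProcessorCount(int limitMillicores) {
        return Math.max(1, (limitMillicores + 999) / 1000);
    }

    public static void main(String[] args) {
        int limit = 200; // 0.2 CPU, as in the example above
        System.out.println("quota per period (us):      " + quotaMicrosPerPeriod(limit));
        System.out.println("16 threads throttled after: " + microsUntilThrottled(limit, 16) + " us");
        System.out.println("ActiveProcessorCount:       " + activeProcessorCount(limit));

        // What this JVM believes it has; compare with the pod's limit when run in the container.
        System.out.println("availableProcessors():      " + Runtime.getRuntime().availableProcessors());
        System.out.println("commonPool parallelism:     " + ForkJoinPool.commonPool().getParallelism());
    }
}
```

With 16 busy threads, the 20 ms quota is gone after roughly 1.25 ms of wall-clock time, leaving the container paused for the rest of the 100 ms period.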

Choosing the Right GC Strategy

G1GC (default recommendation): suited to containers with 512 MB–4 GB of memory, which covers typical microservices; it offers predictable pauses and adapts well to the available CPU.

SerialGC: for ultra-light services with less than 1 GB of memory; it is single-threaded, has the smallest memory footprint, and rarely runs into CPU limits.

ZGC / Generational ZGC (Java 21+): sub-millisecond pauses and high throughput for latency-sensitive workloads such as finance, gaming, or real-time APIs, at the cost of extra memory headroom.
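The guidelines above can be written down as a small decision helper, and the standard management API shows which collector a running JVM actually selected. This is one possible reading of the guidelines, not a rule from the article: the 512 MiB cut-off, the latencyCritical switch, and the GcAdvisor name are all assumptions for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: map a container's size and latency needs to a GC flag, then inspect the live JVM. */
class GcAdvisor {
    /** Returns a GC flag following the rough guidelines above (thresholds are illustrative). */
    static String recommend(int memoryLimitMiB, boolean latencyCritical) {
        if (memoryLimitMiB < 512) {
            return "-XX:+UseSerialGC"; // ultra-light service: smallest footprint, single GC thread
        }
        if (latencyCritical) {
            return "-XX:+UseZGC";      // lowest pauses, needs extra memory headroom
        }
        return "-XX:+UseG1GC";         // general-purpose default for typical microservices
    }

    public static void main(String[] args) {
        System.out.println("256 MiB batch helper -> " + recommend(256, false));
        System.out.println("2 GiB REST service   -> " + recommend(2048, false));
        System.out.println("2 GiB real-time API  -> " + recommend(2048, true));

        // Collectors active in this JVM; useful for verifying that the flag was applied.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("active collector: " + gc.getName());
        }
    }
}
```

Checking the active collectors matters because the JVM's own ergonomics can pick a different collector than expected when no GC flag is set and the container is small.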

Thread Model – Tomcat vs Virtual Threads

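Tomcat's default pool of 200 threads can reserve roughly 200 MB of stack at the usual 1 MB per thread, which is a large share of a small container. Enabling Project Loom virtual threads with spring.threads.virtual.enabled=true (Spring Boot 3.2+ on Java 21+) lets an I/O-bound service handle 100 k+ concurrent requests with far less memory per request. The authors note a JDBC-related "pinning" issue, in which a virtual thread that blocks while holding a monitor keeps its carrier thread occupied; it was resolved by upgrading the driver and the JDK.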
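The 200 MB figure follows from multiplying the pool size by the per-thread stack size. The sketch below reproduces that arithmetic; it assumes a 1 MiB default stack (-Xss) and a hypothetical 512 MiB container, and the number is reserved address space, which is only committed as each stack actually grows.

```java
import java.util.Locale;

/** Sketch: upper bound on stack memory reserved by a platform-thread pool. */
class ThreadStackBudget {
    /** Reserved stack in MiB for a pool of platform threads with the given stack size. */
    static long reservedStackMiB(int threads, int stackKiBPerThread) {
        return (long) threads * stackKiBPerThread / 1024;
    }

    public static void main(String[] args) {
        int containerLimitMiB = 512; // hypothetical container memory limit

        long defaultPool = reservedStackMiB(200, 1024); // 200 threads at 1 MiB each
        long smallerStacks = reservedStackMiB(200, 512); // same pool with -Xss512k
        long smallerPool = reservedStackMiB(50, 1024);   // pool capped at 50 threads

        System.out.printf(Locale.ROOT, "default pool:   %d MiB (%.0f%% of the limit)%n",
                defaultPool, 100.0 * defaultPool / containerLimitMiB);
        System.out.printf(Locale.ROOT, "with -Xss512k:  %d MiB%n", smallerStacks);
        System.out.printf(Locale.ROOT, "50-thread pool: %d MiB%n", smallerPool);
    }
}
```

For services that stay on platform threads, lowering server.tomcat.threads.max or the stack size are the two knobs this arithmetic points to.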

Layered JARs for Faster CI/CD

Spring Boot’s layered JAR separates dependencies, boot loader, snapshot dependencies, and application code. Only the last layer changes frequently, making Docker cache highly effective and accelerating builds.

Startup Speed Techniques

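CDS (Class Data Sharing): can reduce start-up time by 30 %–50 %.

CRaC (Coordinated Restore at Checkpoint): restores an already warmed-up JVM from a CRIU checkpoint, achieving ~40 ms start-up.

Native Image / AOT: preferred for serverless scenarios.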
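Whichever technique is tried, its effect is easiest to judge by measuring start-up the same way before and after. A minimal sketch using the standard RuntimeMXBean follows; the StartupTimer name is hypothetical, and in a Spring Boot service the call would go wherever the application counts as ready.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

/** Sketch: report how long the JVM has been running at the point the application is ready. */
class StartupTimer {
    /** Milliseconds elapsed since the JVM process started, as reported by the runtime itself. */
    static long millisSinceJvmStart() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getUptime();
    }

    public static void main(String[] args) {
        // In a real service this line would run after initialisation has finished.
        System.out.println("ready after " + millisSinceJvmStart() + " ms");
    }
}
```

Spring Boot's own "Started ... in N seconds" log line gives a similar number, so the helper is mainly useful for code paths outside the framework's start-up logging.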

Kubernetes Resource & Probe Configuration

Set requests to realistic values.

Set CPU limits to 2–4 × requests, or omit them, to avoid excessive throttling.

Use startupProbe to give containers enough time to start, readinessProbe to take a pod out of traffic rotation while it cannot serve requests, and livenessProbe to restart only when the process is truly stuck.
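How much time a startupProbe grants follows from its settings: Kubernetes keeps probing every periodSeconds and gives up after failureThreshold consecutive failures. The sketch below turns that into arithmetic; the 90-second start-up time and the safety factor of 2 are hypothetical inputs, not values from the article.

```java
/** Sketch: size a startupProbe from the slowest observed start-up time. */
class ProbeBudget {
    /** Longest start-up the probe tolerates before the container is restarted, in seconds. */
    static int maxStartupSeconds(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
        return initialDelaySeconds + periodSeconds * failureThreshold;
    }

    /** failureThreshold needed to cover the slowest start-up with a safety margin. */
    static int suggestedFailureThreshold(int slowestStartupSeconds, int periodSeconds, double safetyFactor) {
        return (int) Math.ceil(slowestStartupSeconds * safetyFactor / periodSeconds);
    }

    public static void main(String[] args) {
        // periodSeconds=10 with failureThreshold=30 allows up to 300 s of start-up.
        System.out.println("budget: " + maxStartupSeconds(0, 10, 30) + " s");

        // Hypothetical service whose slowest observed start-up under throttling was 90 s.
        System.out.println("failureThreshold: " + suggestedFailureThreshold(90, 10, 2.0));
    }
}
```

Sizing the threshold from the slowest observed start, rather than the typical one, is what keeps a throttled pod from being restarted while it is still booting.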

Production‑grade Observability

JFR – near‑zero overhead profiling.

async‑profiler – CPU flame graphs.

eBPF – kernel‑level analysis.

Temporary debug containers – safe troubleshooting.

Together these tools cover the JVM, the process, and the kernel, and they make up the troubleshooting workflow used in real production incidents.
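As an illustration of the first item, JFR can be driven from inside the application through the jdk.jfr API that ships with the JDK. This is a minimal sketch, not the authors' setup: the JfrSketch class, the placeholder workload, and the temporary output directory are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.ParseException;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

/** Sketch: record JFR events around a piece of work and dump them to a file. */
class JfrSketch {
    /** Runs the workload under a JFR recording and returns the size of the dumped file in bytes. */
    static long recordToTempFile(Runnable workload) {
        try (Recording recording = new Recording(Configuration.getConfiguration("default"))) {
            recording.setName("sketch");
            recording.start();
            workload.run();
            recording.stop();

            // Dump to a fresh file; the result can be opened in JDK Mission Control.
            Path out = Files.createTempDirectory("jfr-sketch").resolve("sketch.jfr");
            recording.dump(out);
            return Files.size(out);
        } catch (IOException | ParseException e) {
            throw new IllegalStateException("JFR recording failed", e);
        }
    }

    public static void main(String[] args) {
        long bytes = recordToTempFile(() -> {
            // Placeholder workload: allocate some short-lived strings.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
            System.out.println("workload produced " + sb.length() + " chars");
        });
        System.out.println("recording size: " + bytes + " bytes");
    }
}
```

In a running pod the same kind of recording can usually be started without code changes, for example with the JDK's jcmd tool, which avoids redeploying just to profile.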

Conclusion

The performance gap of Spring Boot in Docker is not caused by Java or containers themselves, but by the mismatch between the JVM’s default assumptions and the container’s cgroup resource model. Understanding cgroups, JVM memory/thread assumptions, and Kubernetes scheduling/limiting mechanisms turns Docker from a perceived bottleneck into a more stable and controllable environment.
