
How to Eliminate JVM Warm‑up Delays in Kubernetes with Burstable QoS

Java services on Kubernetes often suffer from long JVM warm‑up times that cause high latency during deployments. By analyzing CPU throttling and using Burstable QoS with suitable request/limit settings, you can sharply reduce warm‑up delays without extra pods or cost.

Programmer DD

JVM warm‑up is a painful problem for Java applications running in Kubernetes. Before it reaches peak performance, the JVM needs time to compile and optimize frequently executed code (JIT compilation), so response times are high during the warm‑up phase.

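In containerized, high‑throughput, frequently deployed, and auto‑scaled environments this issue is amplified, because every deployment or scale‑out starts fresh JVMs that must warm up again under live traffic.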

Root Cause

During deployments we observed that pods handling more than 10k requests per minute (RPM) showed latency spikes lasting several minutes, and complaints increased.

Step 1: Increase Pod Count (Costly Fix)

We first tried scaling the number of pods three‑fold and limiting each pod to about 4k RPM, adjusting the rolling update strategy with maxSurge and maxUnavailable. This eliminated the latency spikes but required three times the normal capacity.
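The capacity cost of this approach can be sketched with some simple arithmetic. The traffic figure and the 25% rollout settings below are hypothetical values chosen for illustration, not numbers from the deployment described here; the only Kubernetes behavior assumed is that a percentage maxSurge is rounded up and a percentage maxUnavailable is rounded down.

```python
import math


def pods_needed(total_rpm, rpm_per_pod):
    """Smallest pod count that keeps each pod at or below rpm_per_pod."""
    return math.ceil(total_rpm / rpm_per_pod)


def rollout_bounds(replicas, max_surge_pct, max_unavailable_pct):
    """Return (max pods running, min pods available) during a rolling update.

    Kubernetes rounds a percentage maxSurge up and a percentage
    maxUnavailable down when converting them to pod counts.
    """
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return replicas + surge, replicas - unavailable


# Hypothetical service receiving 120k RPM in total.
baseline = pods_needed(120_000, 12_000)  # 10 pods at ~12k RPM each
scaled = pods_needed(120_000, 4_000)     # 30 pods at ~4k RPM each
print(baseline, scaled, rollout_bounds(scaled, 25, 25))
```

With these assumed numbers the pod count triples, and the rollout temporarily needs room for even more pods on top of that, which is why this fix is expensive.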

Step 2: Warm‑up Script (Ineffective)

A Python script sent parallel requests to the service as part of the readiness probe, and we extended initialDelaySeconds to make room for it. The script reduced latency slightly, but it increased pod readiness time from ~45 s to ~3 min and did not solve the problem.
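The original script is not shown, so the following is only a minimal sketch of what such a warm‑up script might look like, using the Python standard library. The function names, request count, and concurrency are assumptions made for illustration.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url, timeout=2.0):
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def warm_up(url, total_requests=200, concurrency=8, send=http_get):
    """Send total_requests calls to url from a pool of worker threads.

    Returns (succeeded, failed, elapsed_seconds). Any exception or a
    status outside 2xx/3xx counts as a failure.
    """
    def attempt(_):
        try:
            return 200 <= send(url) < 400
        except Exception:
            return False

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(attempt, range(total_requests)))
    succeeded = sum(results)
    return succeeded, total_requests - succeeded, time.monotonic() - start
```

A call such as `warm_up("http://localhost:8080/some-endpoint")` (a hypothetical URL) would exercise the code paths behind that endpoint before real traffic arrives. If the container is already CPU‑throttled, the synthetic requests compete for the same limited CPU, which is consistent with the modest improvement reported above.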

Step 3: Heuristic Tuning

We experimented with GC algorithms (G1, CMS, Parallel), heap size, and CPU allocation. Raising the CPU request/limit to 2000m improved latency, and increasing it to 3000m eliminated the spikes entirely.

[Figure: CPU request/limit configuration]
[Figure: Response time with 2 CPU vs 1 CPU]

CPU Throttling Insight

Prometheus metrics showed heavy throttling (container_cpu_cfs_throttled_seconds_total) during the first 5‑7 minutes after startup, confirming that the JVM needed more CPU during warm‑up than the configured limit (e.g., 1000m) allowed.
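Because container_cpu_cfs_throttled_seconds_total is a cumulative counter, what matters is how fast it grows. The sketch below computes a per‑second rate between consecutive samples, a simplified version of what a rate query does, and finds where throttling subsides. The sample values and the threshold are made up for illustration.

```python
def throttled_rate(samples):
    """Per-second increase of a cumulative counter between consecutive samples.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns a list of (timestamp, rate) for each interval's end point.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order samples
        delta = v1 - v0
        if delta < 0:
            delta = v1  # counter reset, e.g. after a container restart
        rates.append((t1, delta / (t1 - t0)))
    return rates


def last_throttled_at(rates, threshold):
    """Timestamp of the last interval whose rate exceeds threshold, or None."""
    hits = [t for t, rate in rates if rate > threshold]
    return hits[-1] if hits else None


# Hypothetical samples taken every 60 s after pod start.
samples = [(0, 0.0), (60, 30.0), (120, 60.0), (180, 61.0), (240, 61.0)]
rates = throttled_rate(samples)
print(rates[0], last_throttled_at(rates, threshold=0.1))
```

In this invented series the container spends about half of each second throttled for the first two minutes, then the rate drops close to zero.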

[Figure: CPU throttling during warm‑up (1000m)]
[Figure: CPU throttling with 3000m]

Step 4: Use Burstable QoS

Instead of a Guaranteed QoS configuration, where request and limit are equal, we set a lower request (e.g., 1000m) and a higher limit (3000m). Kubernetes schedules the pod based on the request, but the JVM can burst up to the limit during warm‑up. The burst uses idle CPU on the node, so it adds no extra cost, though it is only available when the node actually has spare capacity.
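To make the request/limit relationship concrete, here is a rough sketch that classifies a single container's resources into a QoS class and converts a CPU limit into a CFS quota. It is a simplification: real pods are classified across all their containers, memory quantities are compared here as plain strings, and the quota formula assumes the default 100 ms CFS period. The memory values are hypothetical.

```python
def parse_cpu(value):
    """Convert a Kubernetes CPU quantity such as '1000m' or '1.5' to millicores."""
    text = str(value).strip()
    if text.endswith("m"):
        return int(text[:-1])
    return int(round(float(text) * 1000))


def _same(name, request, limit):
    """Compare a request and a limit; CPU values are normalized first."""
    if name == "cpu":
        return parse_cpu(request) == parse_cpu(limit)
    return request == limit


def qos_class(resources):
    """Approximate QoS class for a pod with one container."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed needs CPU and memory limits, with requests equal to them
    # (an unset request defaults to the limit).
    guaranteed = all(
        name in limits and _same(name, requests.get(name, limits[name]), limits[name])
        for name in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


def cfs_quota_us(limit_millicores, period_us=100_000):
    """CPU time in microseconds the container may use per CFS period."""
    return limit_millicores * period_us // 1000


burstable = {
    "requests": {"cpu": "1000m", "memory": "2Gi"},  # memory is hypothetical
    "limits": {"cpu": "3000m", "memory": "2Gi"},
}
print(qos_class(burstable), cfs_quota_us(parse_cpu("3000m")))
```

With a 3000m limit the container may consume 300 ms of CPU time per 100 ms period, three times what a 1000m limit allows, while the scheduler still reserves only the 1000m request on the node.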

[Figure: Burstable QoS request vs limit]

Verification

Deployments with the Burstable configuration showed negligible CPU throttling and stable response times. Monitoring container_cpu_cfs_throttled_seconds_total confirmed the improvement.
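The Prometheus throttling metrics are derived from the container's cgroup CPU statistics, so a similar check can be run directly against the contents of the cgroup's cpu.stat file. The sketch below takes the file contents as text, since the file's location depends on the cgroup version and runtime, and reports the fraction of CFS periods in which the container was throttled. The sample numbers are invented.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats):
    """Fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Hypothetical snapshots: a 1000m limit during warm-up vs. a 3000m limit.
before = "nr_periods 600\nnr_throttled 150\nthrottled_time 45000000000\n"
after = "nr_periods 600\nnr_throttled 3\nthrottled_time 200000000\n"
print(throttled_ratio(parse_cpu_stat(before)), throttled_ratio(parse_cpu_stat(after)))
```

A ratio near zero after the change would match the "negligible CPU throttling" seen in the charts.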

[Figure: Final deployment throttling chart]

Conclusion

Properly sizing request and limit values and leveraging Burstable QoS provides a cost‑effective solution to JVM warm‑up latency in Kubernetes, eliminating the need for excessive pod scaling.

Key takeaways: set appropriate request/limit values, monitor CPU throttling, and use Burstable QoS to allow temporary CPU bursts during JVM warm‑up.
Written by Programmer DD, a tinkering programmer and author of "Spring Cloud Microservices in Action".