Why Did My JVM Show 900% CPU? Uncovering Container Limit Misconfigurations
An 8‑year ops veteran investigates a night‑time alert showing 900% CPU usage, discovers that a JVM inside a Kubernetes pod misreads host cores while the container is limited to two CPUs, and outlines how improper thread‑pool settings and monitoring metrics caused massive throttling before presenting concrete fixes.
CPU Spikes to 900%: The Truth Behind a JVM Container Misconfiguration
Prelude – a midnight alarm
At 2:30 AM a Grafana alert flooded the inbox with the message "Production environment CPU usage 900%!". The author, an ops veteran with eight years of experience, had never seen CPU usage reach 900% before.
First scene – puzzling monitoring data
Abnormal phenomenon
Running htop on the host showed:
# htop display
Tasks: 245 total, 24 running, 221 sleeping
%Cpu(s): 94.2 us, 4.8 sy, 0.0 ni, 1.0 id
Load average: 8.47, 8.23, 7.89
But Prometheus reported:
node_cpu_usage_ratio: 9.2 (920%)
container_cpu_usage_ratio: 8.7 (870%)
First doubt: if the host's overall CPU usage is only 94%, why does the monitor show 900%?
Container resource configuration check
# Kubernetes Deployment configuration
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "2"
    memory: "8Gi"
# Root cause, seen from inside the container
$ nproc
8        # the host has 8 CPU cores
$ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
200000   # 200 ms of CPU time per period
$ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000   # 100 ms period
Key finding: the container is limited to 2 CPUs (200 ms of CPU time per 100 ms period), but the JVM detects all 8 host cores.
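The two cgroup values above become a CPU count by simple division: quota divided by period. A minimal sketch of that arithmetic follows. The class and method names (CfsLimit, effectiveCpus) are hypothetical and chosen only for illustration, and a non-positive quota is treated as "no limit", which matches the -1 that cgroup v1 reports for an unlimited container.

```java
// Hypothetical helper, for illustration only: converts a CFS quota and period into a CPU count.
final class CfsLimit {

    /**
     * Returns the number of CPUs the CFS quota allows, or -1 when no quota is set
     * (cgroup v1 reports cpu.cfs_quota_us = -1 for an unlimited container).
     */
    static double effectiveCpus(long quotaUs, long periodUs) {
        if (quotaUs <= 0 || periodUs <= 0) {
            return -1;
        }
        return (double) quotaUs / periodUs;
    }

    public static void main(String[] args) {
        // Values read inside the container in this article: 200000 us quota, 100000 us period.
        System.out.println("Effective CPUs: " + effectiveCpus(200_000, 100_000));
    }
}
```

A fractional result is possible (for example a 50 ms quota per 100 ms period allows half a CPU), so code that sizes thread pools from it usually rounds up to at least 1.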
Technical deep dive – JVM’s “cognitive bias”
How the JVM perceives CPU resources
JVM builds without container support (before Java 8u191 on the Java 8 line, and before Java 10) take the processor count from the operating system's view of the host's CPUs and ignore the cgroup CPU quota:
// Simplified view of what the application sees
int availableProcessors = Runtime.getRuntime().availableProcessors();
// On a JVM without container support this returns the host's CPU count (8 here)
// and ignores the cgroup quota
Core issue: the JVM sees 8 CPUs while the container only receives time slices for 2 CPUs.
Thread‑pool configuration chain reaction
The application used a classic thread‑pool configuration:
// Problematic code
int corePoolSize = Runtime.getRuntime().availableProcessors() * 2;
int maximumPoolSize = Runtime.getRuntime().availableProcessors() * 4;
ThreadPoolExecutor executor = new ThreadPoolExecutor(
    corePoolSize,      // 16 core threads when the JVM sees 8 CPUs
    maximumPoolSize,   // 32 max threads when the JVM sees 8 CPUs
    60L, TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(1000)
);
Result: 32 threads fiercely compete for only 2 CPUs' worth of time slices, causing massive context switching.
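To make the chain reaction concrete, the sizing rule above can be evaluated for both processor counts in this incident. This is a sketch under stated assumptions: PoolSizing and its methods are hypothetical names, and the multipliers (2 and 4) are simply the ones used in the problematic code.

```java
// Hypothetical illustration of the sizing rule used above (processors * 2 and processors * 4).
final class PoolSizing {

    static int corePoolSize(int processors) {
        return processors * 2;
    }

    static int maximumPoolSize(int processors) {
        return processors * 4;
    }

    public static void main(String[] args) {
        int seenByJvm = 8; // what availableProcessors() returned in this incident
        int cpuLimit = 2;  // what the container limit actually allows

        System.out.printf("Sized from host cores: core=%d, max=%d%n",
                corePoolSize(seenByJvm), maximumPoolSize(seenByJvm));
        System.out.printf("Sized from CPU limit:  core=%d, max=%d%n",
                corePoolSize(cpuLimit), maximumPoolSize(cpuLimit));

        // Worker threads per usable CPU when each pool is at its maximum size.
        System.out.println("Threads per CPU (host-sized):  " + maximumPoolSize(seenByJvm) / cpuLimit);
        System.out.println("Threads per CPU (limit-sized): " + maximumPoolSize(cpuLimit) / cpuLimit);
    }
}
```

One detail worth keeping in mind: a ThreadPoolExecutor only creates threads beyond corePoolSize once its work queue is full, so with a 1000-slot LinkedBlockingQueue the pool reaches its maximum size only under sustained backlog.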
Deceptive monitoring metrics
CPU usage calculation formula
Container CPU usage = (actual CPU time used / CPU limit time) × 100%
When the container limit is 2 cores but the demand far exceeds 2 cores (figures per second of wall-clock time):
Actual usage: 2000 ms (the limit is reached)
Expected usage: 9000 ms (demand from 32 threads)
Displayed usage: (9000/2000) × 100% = 450%
This is why the monitor can show more than 100% CPU usage: the displayed figure reflects demand relative to the limit, not the CPU time the container was actually granted.
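The arithmetic above can be reproduced in a few lines. This is only a worked example of the article's formula with the article's numbers; CpuUsageRatio and percentOfLimit are hypothetical names, not part of any monitoring library.

```java
// Hypothetical worked example of the usage formula: (CPU time / CPU limit time) * 100.
final class CpuUsageRatio {

    static double percentOfLimit(double cpuTimeMs, double limitMs) {
        return cpuTimeMs / limitMs * 100.0;
    }

    public static void main(String[] args) {
        double limitMs = 2000;    // 2 cores = 2000 ms of CPU time per second
        double actualMs = 2000;   // the container cannot be granted more than its limit
        double expectedMs = 9000; // CPU time the threads would like to consume

        System.out.println("Actual vs limit:   " + percentOfLimit(actualMs, limitMs) + "%");
        System.out.println("Expected vs limit: " + percentOfLimit(expectedMs, limitMs) + "%");
    }
}
```

The gap between the two results is the demand the container could not satisfy, which is what shows up as throttling.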
More accurate metrics
# More accurate container CPU pressure metric
rate(container_cpu_usage_seconds_total[5m]) /
(container_spec_cpu_quota / container_spec_cpu_period) * 100
# CPU throttling metric
rate(container_cpu_cfs_throttled_seconds_total[5m])
Solution – three‑pronged approach
1. JVM parameter optimization (immediate effect)
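# Tell the JVM the real CPU core count (Java 8u191+ and Java 10+)
-XX:ActiveProcessorCount=2
# Container awareness (Java 8u191+ and Java 10+, where it is enabled by default)
-XX:+UseContainerSupport
# Older Java 8 builds only (8u131 to 8u181): experimental cgroup memory limit for heap sizing.
# This option was removed in Java 11, so do not combine it with the flags above on newer JVMs.
-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
2. Application‑level refactor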
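// Fixed thread‑pool sizing helper
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ContainerAwareThreadPool {
    private static final int CPU_CORES = getCpuCores();

    private static int getCpuCores() {
        // Prefer an explicit -XX:ActiveProcessorCount=N flag, read from the JVM's input arguments
        String flag = "-XX:ActiveProcessorCount=";
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith(flag)) {
                return Integer.parseInt(arg.substring(flag.length()));
            }
        }
        // Check container limits (cgroup v1 paths; the quota is -1 when no limit is set)
        try {
            long quota = readLong("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
            long period = readLong("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
            if (quota > 0 && period > 0) {
                return (int) Math.ceil((double) quota / period);
            }
        } catch (IOException | NumberFormatException e) {
            // Files missing or unreadable: fall back to the JVM's own view
        }
        return Runtime.getRuntime().availableProcessors();
    }

    private static long readLong(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return Long.parseLong(new String(bytes, StandardCharsets.UTF_8).trim());
    }
}
3. Infrastructure adjustments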
# Optimized Kubernetes configuration
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"       # loosen CPU limit
    memory: "8Gi"
env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:ActiveProcessorCount=2"
Effect verification – dramatic data contrast
Before optimization
CPU usage: 850%-950%
P95 response time: 8.5s
Threads: 32 workers
Context switches: 45000/s
CPU throttling time: 85%
After optimization
CPU usage: 65%-80%
P95 response time: 180ms
Threads: 8 workers
Context switches: 3200/s
CPU throttling time: 2%
P95 response time improved by more than 40× (from 8.5 s to 180 ms).
Deep reflection – traps of containerization
New challenges introduced by containers
Resource awareness issue: applications cannot correctly perceive container limits.
Monitoring complexity: traditional metrics may mislead in container environments.
Optimization experience invalidation: physical‑machine tuning knowledge must be revisited.
Best‑practice checklist
Confirm JVM version supports container awareness.
Set the correct ActiveProcessorCount JVM flag.
Validate thread‑pool configuration against real CPU core count.
Establish container‑level monitoring metrics.
Test application behavior under CPU‑throttling scenarios.
Monitoring alert optimization
# Prometheus alert rule
alert: ContainerCpuThrottling
expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.1
for: 2m
labels:
  severity: warning
annotations:
  summary: "Container CPU throttling"
  description: "{{ $labels.container }} CPU throttling rate exceeds 10%"
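The throttling signal behind this alert can also be checked by hand from the container's cgroup cpu.stat file, which on cgroup v1 exposes nr_periods, nr_throttled and throttled_time. The sketch below parses such content and computes the share of enforcement periods in which the container was throttled. CpuStat and its methods are hypothetical names, the sample numbers are made up for illustration, and this ratio is a related but different measure from the per-second throttled time used in the alert expression.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: parses "key value" lines as found in cpu.stat.
final class CpuStat {

    static Map<String, Long> parse(String content) {
        Map<String, Long> values = new HashMap<>();
        for (String line : content.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                values.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return values;
    }

    /** Fraction of CFS enforcement periods in which the cgroup was throttled (0.0 to 1.0). */
    static double throttledPeriodRatio(Map<String, Long> stat) {
        long periods = stat.getOrDefault("nr_periods", 0L);
        long throttled = stat.getOrDefault("nr_throttled", 0L);
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) {
        // Made-up sample content; in a cgroup v1 container the real file is
        // /sys/fs/cgroup/cpu/cpu.stat
        String sample = "nr_periods 600\nnr_throttled 150\nthrottled_time 42000000000\n";
        System.out.println("Throttled period ratio: " + throttledPeriodRatio(parse(sample)));
    }
}
```

Because the counters are cumulative, a meaningful reading comes from sampling the file twice and comparing the deltas, which is what rate() does for the Prometheus metric.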
