Why Does My Container Show 900% CPU? Uncovering JVM and Cgroup Mismatches
An experienced ops engineer investigates a late-night Grafana alert showing 900% CPU usage, discovers a mismatch between the cores the JVM detects and the container's CPU limit, explains the root cause, and walks through a three-pronged fix with code snippets, monitoring tweaks, and performance results.
Scene 1: Anomalous Monitoring Data
Abnormal Phenomena Overview
Logging into the monitoring system reveals contradictory numbers: the host shows ~94% CPU usage, while Prometheus reports roughly 900%.
# htop display
Tasks: 245 total, 24 running, 221 sleeping
%Cpu(s): 94.2 us, 4.8 sy, 0.0 ni, 1.0 id
Load average: 8.47, 8.23, 7.89
# Prometheus monitoring shows
node_cpu_usage_ratio: 9.2 (920%)
container_cpu_usage_ratio: 8.7 (870%)
First suspicion: overall system CPU usage is only 94%, so why does the monitoring show 900%?
Container Resource Configuration Check
# Kubernetes Deployment configuration
resources:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "2"
memory: "8Gi"Container Internal Check
# Inside the container
$ nproc
8 # Host has 8 CPU cores
$ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
200000 # 200ms
$ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000 # 100ms
Key finding: the container is limited to 2 CPUs (a 200 ms quota per 100 ms period), but the JVM detects all 8 host cores.
Technical Deep Dive: JVM’s "Perception Bias"
How JVM Perceives CPU Resources
Before the JVM became container-aware (Java 8u131 brought the first partial improvements, and full cgroup support arrived in 8u191 and JDK 10), it obtained CPU information naively:
// Simplified JVM internal logic
int availableProcessors = Runtime.getRuntime().availableProcessors();
// Effectively returns the host's online CPU count, ignoring cgroup CPU quotas
Core issue: the JVM sees 8 CPUs while the container is only entitled to 2 CPUs' worth of time slices.
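To see the mismatch from inside the container itself, a small diagnostic along these lines can be run. It assumes the cgroup v1 paths used in the shell check above, and the class name is purely illustrative:
import java.nio.file.Files;
import java.nio.file.Paths;

public class CpuVisibilityCheck {
    public static void main(String[] args) throws Exception {
        // What the JVM believes it has
        int jvmCores = Runtime.getRuntime().availableProcessors();
        // What the cgroup actually grants: quota / period = cores
        long quota = Long.parseLong(new String(Files.readAllBytes(
                Paths.get("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))).trim());
        long period = Long.parseLong(new String(Files.readAllBytes(
                Paths.get("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))).trim());
        System.out.println("JVM sees " + jvmCores + " cores");            // 8 in this incident
        System.out.println("cgroup allows " + (double) quota / period);   // 2.0 in this incident
    }
}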
Thread‑Pool Configuration Chain Reaction
The application uses a classic thread‑pool configuration based on the detected CPU count:
// Problematic code
int corePoolSize = Runtime.getRuntime().availableProcessors() * 2;
int maximumPoolSize = Runtime.getRuntime().availableProcessors() * 4;
ThreadPoolExecutor executor = new ThreadPoolExecutor(
corePoolSize, // 16 core threads
maximumPoolSize, // 32 max threads
60L, TimeUnit.SECONDS,
new LinkedBlockingQueue<>(1000)
);
Result: 32 threads compete fiercely for a quota worth only 2 CPUs, causing massive context switching.
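To confirm at runtime how many threads are actually fighting over the 2-CPU quota, a small helper along these lines can be hooked into existing logging or metrics. The class and method names are illustrative; only standard JDK APIs are used:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ThreadPoolExecutor;

public class ThreadPressureLog {
    // Call periodically (e.g. from a scheduled task) to see how crowded the pool gets
    public static void log(ThreadPoolExecutor executor) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.printf("pool=%d/%d, live JVM threads=%d%n",
                executor.getPoolSize(), executor.getMaximumPoolSize(),
                threads.getThreadCount());
    }
}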
Monitoring Metric "Deception"
CPU Usage Calculation Formula
The key to understanding the problem is the CPU usage calculation:
Container CPU Usage = (CPU time consumed / CPU quota for the period) × 100%
When the container's limit is 2 cores but demand far exceeds 2 cores, the numbers per second of wall-clock time look like this:
Quota actually granted: 2,000 ms (2 cores, already saturated)
CPU time demanded by the 32 threads: ~9,000 ms
Usage the dashboard reports: (9,000 / 2,000) × 100% = 450%
This explains why the monitor can report CPU usage far above 100%.
More Precise Monitoring Metrics
# More accurate container CPU pressure metric
(
rate(container_cpu_usage_seconds_total[5m]) /
(container_spec_cpu_quota / container_spec_cpu_period)
) * 100
# CPU throttling metric
rate(container_cpu_cfs_throttled_seconds_total[5m])
Solution: Three-Pronged Approach
1. JVM Parameter Optimization (Immediate Effect)
# Tell JVM the real CPU core count
-XX:ActiveProcessorCount=2
# Enable container awareness (Java 8u191+)
-XX:+UseContainerSupport
# On Java 8u131–8u190 (before UseContainerSupport), the older experimental flags apply instead
-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
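Once the flags are in place, a quick sanity check is to log what the JVM now reports. With -XX:ActiveProcessorCount=2 (or container support on a 2-CPU limit), availableProcessors() should return 2, and pools sized from it, such as the common ForkJoinPool, shrink accordingly. A minimal sketch, not from the original article:
import java.util.concurrent.ForkJoinPool;

public class JvmCpuReport {
    public static void main(String[] args) {
        // Respects -XX:ActiveProcessorCount and container support
        System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
        // The common pool (like GC and JIT thread counts) is sized from the same value
        System.out.println("commonPool parallelism = "
                + ForkJoinPool.commonPool().getParallelism());
    }
}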
2. Application-Level Refactor
import java.nio.file.Files;
import java.nio.file.Paths;

public class ContainerAwareThreadPool {

    private static final int CPU_CORES = getCpuCores();

    public static int getCpuCores() {
        // Prefer an explicit override; -XX:ActiveProcessorCount is not readable as a
        // system property, so pass e.g. -Dapp.cpu.cores=2 (the property name is an example)
        String override = System.getProperty("app.cpu.cores");
        if (override != null) {
            return Integer.parseInt(override.trim());
        }
        // Otherwise derive the limit from cgroup v1: cores = quota / period
        try {
            long quota = readLong("/sys/fs/cgroup/cpu/cpu.cfs_quota_us", -1);
            long period = readLong("/sys/fs/cgroup/cpu/cpu.cfs_period_us", 100000);
            if (quota > 0 && period > 0) {
                return (int) Math.ceil((double) quota / period);
            }
        } catch (Exception e) {
            // no cgroup v1 files available – fall through
        }
        // Last resort: whatever the JVM reports (correct on container-aware JVMs)
        return Runtime.getRuntime().availableProcessors();
    }

    private static long readLong(String path, long defaultValue) throws Exception {
        return Files.lines(Paths.get(path))
                .mapToLong(s -> Long.parseLong(s.trim()))
                .findFirst()
                .orElse(defaultValue);
    }
}
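For completeness, here is how the helper above might feed an actual pool, reusing the earlier multipliers; the factory class name and queue size are illustrative rather than taken from the article:
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WorkerPoolFactory {
    public static ThreadPoolExecutor create() {
        int cores = ContainerAwareThreadPool.getCpuCores(); // 2 in this container
        return new ThreadPoolExecutor(
                cores * 2,              // 4 core threads
                cores * 4,              // 8 max threads – the "8 workers" seen after optimization
                60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1000));
    }
}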
3. Infrastructure Adjustments
# Kubernetes configuration optimization
resources:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "4" # loosen CPU limit
memory: "8Gi"
# JVM startup parameters
env:
- name: JAVA_OPTS
value: "-XX:+UseContainerSupport -XX:ActiveProcessorCount=2"Effect Verification: Dramatic Data Comparison
Before Optimization
CPU Usage: 850%-950%
P95 Response Time: 8.5s
Threads: 32 workers
Context Switches: 45000/s
CPU Throttling: 85%
After Optimization
CPU Usage: 65%-80%
P95 Response Time: 180ms
Threads: 8 workers
Context Switches: 3200/s
CPU Throttling: 2%
Performance improved by more than 40× (P95 latency fell from 8.5 s to 180 ms).
Deep Thinking: Containerization Pitfalls and Wisdom
Resource Awareness: applications do not automatically perceive container limits.
Monitoring Complexity: traditional host-level metrics can mislead in container environments.
Stale Tuning Experience: optimization lessons learned on physical machines need re-evaluation in containers.
Best‑Practice Checklist
Confirm JVM version supports container awareness.
Set the correct ActiveProcessorCount JVM flag.
Validate thread‑pool configuration against real CPU cores.
Establish container‑level monitoring metrics.
Test application behavior under CPU throttling scenarios.
Monitoring Alert Optimization
# Prometheus alert rule
alert: ContainerCpuThrottling
expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.1
for: 2m
labels:
severity: warning
annotations:
summary: "Container CPU throttling"
description: "{{ $labels.container }} CPU throttling rate exceeds 10%"Final Thoughts: Growth Reflections for Technologists
This incident was a reminder that in the cloud-native era, operations work is no longer just "add machines and tweak parameters": we have to understand the underlying principles, stay alert to low-level detail, and think systematically from the application through the container runtime down to the kernel.
Have you encountered similar containerization traps? Share your experience in the comments!
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.