Why an Ubuntu 22.04 Upgrade Crashes Java Apps on Kubernetes: The cgroup v2 Trap
Upgrading a Kubernetes cluster from CentOS 7.9 to Ubuntu 22.04 caused Java pods to crash with OOMKilled errors. Raising memory limits only masked the problem. The root cause was cgroup v2: the JVM misread its container resource limits and chose far too many threads and far too large a heap. The article recommends upgrading to a JVM that supports cgroup v2 or, as a stopgap, reverting the nodes to cgroup v1.
Incident Overview
The nodes of a Kubernetes cluster were upgraded from CentOS 7.9 to Ubuntu 22.04. After the rollout, several Java-based pods were terminated with OOMKilled errors. Raising the pod memory limit from 2Gi to 8Gi let the containers start, but CPU, memory and thread usage spiked, and unexplained disk-write activity appeared.
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Temporary Remedy – Increase Memory
Increase the pod memory limit from 2Gi to 8Gi.
The pods start, but resource consumption remains abnormal and the mysterious disk writes persist.
Root Cause – JVM Misreading cgroup v2
Running jcmd <PID> VM.flags in a healthy pod and in a failing pod showed that the JVM in the failing pod ignored its container limits (2 CPUs and 2 GiB, "2c2g") and sized itself as if it had many more CPUs and far more memory.
# Healthy pod (2c2g)
-XX:CICompilerCount=2 # 2 JIT compiler threads for 2 CPUs
-XX:InitialHeapSize=33554432 # 32 MiB (1/64 of 2 GiB)
-XX:MaxHeapSize=536870912 # 512 MiB (1/4 of the 2 GiB limit)
...
# Faulty pod (2c2g)
-XX:CICompilerCount=15 # 15 JIT compiler threads on a 2-CPU pod
-XX:InitialHeapSize=2147483648 # 2 GiB initial heap, the entire pod limit
-XX:MaxHeapSize=32210157568 # ~30 GiB max heap, about 15x the 2 GiB limit
...
cgroup v2 Compatibility Issue
Ubuntu 22.04 enables cgroup v2 (the unified hierarchy) by default, whereas the previous CentOS 7.9 nodes used cgroup v1. Older Java runtimes and libraries only know how to read limits through the cgroup v1 interface. On a cgroup v2 node they fail to detect the container limits and fall back to the host's CPU count and physical memory, so they pick heap and thread settings far beyond what the pod allows, and the container is OOMKilled.
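To make the link between detected resources and the flag values concrete, here is a rough Java model of HotSpot's default ergonomics. It assumes no explicit -Xms/-Xmx, the default initial heap of 1/64 of RAM and max heap of 1/4 of RAM, and tiered compilation; the class and method names are hypothetical, and real JVMs apply further rules this sketch ignores. Fed the 2c2g pod limits, it reproduces the healthy pod's values. Under the same assumptions, the faulty pod's 2 GiB initial heap implies the JVM believed it had about 128 GiB of RAM, and this thread formula yields 15 for anywhere from 32 to 63 CPUs. The 32-CPU, 128 GiB host in main() is a hypothetical reconstruction, not a figure from the incident.

```java
// Hypothetical sketch approximating HotSpot's default ergonomics.
// Assumptions: no -Xms/-Xmx, default 1/64 initial and 1/4 max heap ratios,
// tiered compilation on. Real JVMs add minimums, alignment and other caps.
final class HeapErgonomics {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Default initial heap: 1/64 of the memory the JVM believes it has. */
    static long initialHeap(long detectedMemory) {
        return detectedMemory / 64;
    }

    /** Default max heap: 1/4 of the memory the JVM believes it has. */
    static long maxHeap(long detectedMemory) {
        return detectedMemory / 4;
    }

    /** floor(log2(n)) for n >= 1. */
    static int log2Floor(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    /** Tiered JIT thread count: max(log2(n) * log2(log2(n)) * 3 / 2, 2), integer math. */
    static int compilerThreads(int detectedCpus) {
        int logCpu = log2Floor(detectedCpus);
        int logLogCpu = log2Floor(Math.max(logCpu, 1));
        return Math.max(logCpu * logLogCpu * 3 / 2, 2);
    }

    public static void main(String[] args) {
        // Pod limits as a cgroup-aware JVM would see them (2c2g).
        System.out.printf("2 CPUs, 2 GiB: CICompilerCount=%d, initial=%d MiB, max=%d MiB%n",
                compilerThreads(2), initialHeap(2 * GIB) / MIB, maxHeap(2 * GIB) / MIB);
        // Hypothetical host seen by a JVM that cannot read cgroup v2 limits.
        System.out.printf("32 CPUs, 128 GiB: CICompilerCount=%d, initial=%d MiB, max=%d MiB%n",
                compilerThreads(32), initialHeap(128 * GIB) / MIB, maxHeap(128 * GIB) / MIB);
    }
}
```

For the pod it gives 2 compiler threads with a 32 MiB initial and 512 MiB max heap; for the hypothetical host it gives 15 threads, a 2048 MiB initial heap and a 32768 MiB max heap. The faulty pod actually reported about 30 GiB rather than 32 GiB, likely because HotSpot caps the default heap so that compressed object pointers remain usable.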
Solution 1 – Upgrade JVM / Runtime
Use a runtime (or, for Go services, a library) that fully supports cgroup v2. Minimum versions:
OpenJDK / HotSpot – JDK 8u372, 11.0.16, 15 or newer
IBM Semeru Runtimes – 8.0.382.0, 11.0.20.0, 17.0.8.0 or newer
IBM Java – 8.0.8.6 or newer
uber-go/automaxprocs (for Go applications) – v1.5.1 or newer
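For OpenJDK/HotSpot, checking a deployed image against these minimums can be automated. The sketch below parses the java.version system property (both the 1.8.0_NNN and 11.0.N styles) and compares it with 8u372, 11.0.16 and 15. The class name is hypothetical, and IBM Semeru, IBM Java and Go binaries use their own numbering, so they are not handled.

```java
import java.util.Arrays;

// Hypothetical helper: does an OpenJDK/HotSpot version string meet the
// cgroup v2 minimums listed above (8u372, 11.0.16, 15+)?
// Assumption: OpenJDK/HotSpot numbering only; IBM runtimes are not covered.
final class CgroupV2JdkCheck {

    static boolean supportsCgroupV2(String javaVersion) {
        // Drop build or vendor suffixes such as "-ea" or "+8".
        String v = javaVersion.split("[-+]", 2)[0];
        if (v.startsWith("1.8.0")) {
            // Java 8 style: 1.8.0_<update>, e.g. "1.8.0_372".
            int underscore = v.indexOf('_');
            if (underscore < 0) {
                return false;
            }
            return Integer.parseInt(v.substring(underscore + 1)) >= 372;
        }
        // Java 9+ style: <feature>[.<interim>[.<update>[...]]], e.g. "11.0.16.1".
        int[] parts = Arrays.stream(v.split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
        int feature = parts[0];
        int update = parts.length > 2 ? parts[2] : 0;
        if (feature == 11) {
            return update >= 16;   // 11.0.16 or newer
        }
        return feature >= 15;      // 9, 10 and 12-14 are not on the list
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println(running + " meets the cgroup v2 minimum: " + supportsCgroupV2(running));
    }
}
```

Running it with the same JDK as the application image reports whether that JDK meets the OpenJDK/HotSpot minimums listed above.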
Solution 2 – Revert to cgroup v1 (Temporary Compatibility)
Confirm the current cgroup version:
stat -fc %T /sys/fs/cgroup/
An output of cgroup2fs indicates v2; tmpfs indicates v1.
Edit /etc/default/grub and append systemd.unified_cgroup_hierarchy=0 to the GRUB_CMDLINE_LINUX line.
Apply the changes and reboot:
sudo update-grub
sudo reboot
After the reboot, verify that the node is using cgroup v1 with the same stat command; it should now print tmpfs.
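The stat check runs on the node. To confirm from inside a pod which hierarchy the container sees, which limits it exposes, and whether the JVM's own view matches them, a small probe along these lines can help. It assumes the common container layout (cgroup v2 files directly under /sys/fs/cgroup; v1 controllers under /sys/fs/cgroup/memory and /sys/fs/cgroup/cpu), which may differ on some setups, and the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical probe: which cgroup hierarchy is visible, what limits it exposes,
// and what the JVM detected. Assumes the usual container mounts under /sys/fs/cgroup.
final class CgroupProbe {
    static final Path ROOT = Paths.get("/sys/fs/cgroup");

    /** cgroup v2 exposes cgroup.controllers at the root of the unified hierarchy. */
    static boolean isCgroupV2() {
        return Files.exists(ROOT.resolve("cgroup.controllers"));
    }

    /** v2 memory.max holds a byte count or "max"; returns -1 when unlimited. */
    static long parseV2Memory(String content) {
        String s = content.trim();
        return s.equals("max") ? -1 : Long.parseLong(s);
    }

    /** v2 cpu.max holds "<quota> <period>" or "max <period>"; returns CPUs, or -1 when unlimited. */
    static double parseV2Cpu(String content) {
        String[] fields = content.trim().split("\\s+");
        if (fields[0].equals("max")) {
            return -1;
        }
        return Double.parseDouble(fields[0]) / Double.parseDouble(fields[1]);
    }

    /** v1 cpu.cfs_quota_us / cpu.cfs_period_us; a negative quota means no limit. */
    static double parseV1Cpu(String quota, String period) {
        long q = Long.parseLong(quota.trim());
        return q < 0 ? -1 : (double) q / Long.parseLong(period.trim());
    }

    static String read(String relative) throws IOException {
        return new String(Files.readAllBytes(ROOT.resolve(relative))).trim();
    }

    public static void main(String[] args) throws IOException {
        if (isCgroupV2()) {
            System.out.println("cgroup v2: memory.max=" + parseV2Memory(read("memory.max"))
                    + ", cpus=" + parseV2Cpu(read("cpu.max")));
        } else {
            System.out.println("cgroup v1: memory.limit_in_bytes=" + read("memory/memory.limit_in_bytes")
                    + ", cpus=" + parseV1Cpu(read("cpu/cpu.cfs_quota_us"), read("cpu/cpu.cfs_period_us")));
        }
        // What the JVM itself detected; on a healthy pod this tracks the limits above.
        Runtime rt = Runtime.getRuntime();
        System.out.println("JVM availableProcessors=" + rt.availableProcessors()
                + ", maxMemory=" + rt.maxMemory());
    }
}
```

On a correctly behaving 2c2g pod, availableProcessors should report 2 and maxMemory should be in the region of a quarter of the 2 GiB limit; a JVM that cannot read cgroup v2 reports host-sized values instead.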
Conclusion
The cascade of OOMKilled events after the CentOS-to-Ubuntu upgrade was caused by a cgroup v2 compatibility problem: the JVM perceived far more CPU and memory than the pod actually had, so it allocated an oversized heap and an excessive number of compiler threads. Upgrading to a JVM or runtime that understands cgroup v2, or switching the node back to cgroup v1 as a temporary measure, lets the JVM see the pod's real limits and restores stable operation.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.