Why Does a Kubernetes Pod Stay in ContainerCreating? Diagnosing Stuck RBD Volumes
This guide explains why a Kubernetes pod can remain in ContainerCreating due to an RBD volume still being used, walks through identifying the holding node with Ceph commands, and provides forced unmount and mountinfo techniques to resolve the issue.
Today I discovered a pod stuck in the ContainerCreating state. The kubectl describe output showed a FailedMount warning indicating that the RBD image kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87 was still in use.
The pod needed to mount a PVC, but the PVC was being used elsewhere. No other Deployment was using it, so the issue lay in the volume lifecycle.
The volume creation process is: (1) create the volume, (2) mount it on the node, (3) map it into the pod. Deleting a pod reverses these steps, and a failure during unmount can leave the image busy.
From the warning we extracted useful information:
rbd image kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87 is still being used
Using Ceph commands we identified the pool and image name, then ran:
# rbd info kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87
The key field, block_name_prefix, gave us rbd_data.fb236b8b4567. Replacing data with header and running:
# rados listwatchers -p kube rbd_header.fb236b8b4567
revealed the node 192.168.100.181 that was holding the image.
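The rename from rbd_data to rbd_header is easy to get wrong by hand, so here is a minimal sketch of how it could be scripted. The helper name header_object_for is hypothetical (it is not part of Ceph), and the sketch assumes that rbd info prints a line of the form "block_name_prefix: rbd_data.<id>", as in the output described above.

```shell
#!/bin/sh
# header_object_for: hypothetical helper, not part of Ceph.
# Reads `rbd info` output on stdin and prints the matching header
# object name (rbd_header.<id>) derived from block_name_prefix.
header_object_for() {
    sed -n 's/^[[:space:]]*block_name_prefix:[[:space:]]*rbd_data\./rbd_header./p'
}

# Intended usage on a host with Ceph client access (not run here):
#   header=$(rbd info kube/<image-name> | header_object_for)
#   rados listwatchers -p kube "$header"
```

The watcher entries returned by rados listwatchers carry the client address, which is how the holding node was identified above.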
On that node we checked the device link:
# ls -l /dev/rbd/kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87
which pointed to /dev/rbd4. Attempting to unmap with:
# rbd unmap /dev/rbd4
failed with "sysfs write failed" and "Device or resource busy". A typical next step is to use lsof to find the owning process, but none was found.
Two practical solutions were applied:
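Force the unmap with rbd unmap -o force /dev/rbd4.
Search mountinfo for the device to locate the PID that still holds the mount: grep 'rbd4' /proc/*/task/*/mountinfo. Each matching line is prefixed with the /proc/<pid>/task/<tid>/mountinfo path it came from, so the PID can be read directly from the output.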
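When the grep returns many lines (one per thread), it helps to reduce them to a list of unique PIDs. The following is a sketch only: pids_from_mountinfo_matches is a hypothetical helper name, and it assumes the input is the output of grep run across several files, so every line starts with the /proc/<pid>/task/<tid>/mountinfo path.

```shell
#!/bin/sh
# pids_from_mountinfo_matches: hypothetical helper.
# Reads `grep <device> /proc/*/task/*/mountinfo` output on stdin and
# prints each numeric PID once, in ascending order. Entries such as
# /proc/self/... are skipped because their PID field is not numeric.
pids_from_mountinfo_matches() {
    sed -n 's|^/proc/\([0-9][0-9]*\)/task/.*|\1|p' | sort -un
}

# Intended usage on the node holding the image (not run here):
#   grep -s 'rbd4' /proc/*/task/*/mountinfo | pids_from_mountinfo_matches
#   ps -o pid,args -p <pid>    # inspect each PID before acting on it
```

Once the owning process is known, stopping or restarting it releases the mount, after which a plain rbd unmap should no longer report the device as busy.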
After successfully unmounting the RBD image, the pod started normally.
Final Thoughts
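The issue arose because the workload was managed by a Deployment rather than a StatefulSet: during a rolling update a Deployment can start the replacement pod before the old one has fully released its volume, so both contend for the same RBD image. To avoid this, consider using a PVC with ReadWriteMany access mode (which requires a storage backend that supports it), or set maxSurge to 0 in the Deployment to prevent extra pods during updates.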
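As an illustration of the maxSurge suggestion, the sketch below patches an existing Deployment so that no surge pod is created during a rolling update. The function name apply_no_surge and its arguments are hypothetical. The value maxUnavailable: 1 is an assumption chosen for the example: Kubernetes rejects a rolling update strategy in which maxSurge and maxUnavailable are both 0, so some non-zero value is needed.

```shell
#!/bin/sh
# Merge patch for the Deployment's update strategy: no surge pods,
# and (assumed here) at most one pod unavailable during the rollout.
rolling_update_patch='{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":0,"maxUnavailable":1}}}}'

# apply_no_surge: hypothetical wrapper around kubectl patch.
#   $1 = Deployment name, $2 = namespace
apply_no_surge() {
    kubectl patch deployment "$1" -n "$2" --type merge -p "$rolling_update_patch"
}

# Intended usage (not run here):
#   apply_no_surge <deployment-name> <namespace>
```

With this strategy the old pod is terminated before its replacement is created, which gives the node a chance to unmount and unmap the image first.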
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
