
Why Does a Kubernetes Pod Stay in ContainerCreating? Diagnosing Stuck RBD Volumes

This guide explains why a Kubernetes pod can remain in ContainerCreating due to an RBD volume still being used, walks through identifying the holding node with Ceph commands, and provides forced unmount and mountinfo techniques to resolve the issue.

Ops Development Stories

Jul 28, 2021


Today I discovered a pod stuck in the ContainerCreating state. The kubectl describe output showed a FailedMount warning indicating that the RBD image kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87 was still in use.

The pod needed to mount a PVC, but the RBD image behind that PVC was still held elsewhere. No other Deployment was using the PVC, so the problem had to lie in the volume lifecycle.

A volume goes through three steps before a pod can use it: (1) the volume is created, (2) the image is mapped and mounted on the node, and (3) that mount is exposed to the pod. Deleting a pod reverses these steps, and if the unmount or unmap fails along the way, the image stays busy on the old node.

From the warning we extracted useful information:

rbd image kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87 is still being used

The image is named in the form pool/image, so the pool is kube. We inspected the image with:

# rbd info kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87

The block_name_prefix field in the output was rbd_data.fb236b8b4567. The image's header object carries the same ID with rbd_data replaced by rbd_header, so listing the watchers of that object:

# rados listwatchers -p kube rbd_header.fb236b8b4567

revealed the node 192.168.100.181 that was holding the image.
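The two lookups above can be chained with a little text processing. The following is a minimal sketch, not part of the original procedure: the function names header_object and watcher_ips are made up for illustration, and the watcher line format (watcher=<ip>:<port>/<nonce> client.<id> cookie=<n>) is an assumption about what rados listwatchers prints, covering IPv4 addresses only.

```shell
#!/bin/sh
# Hypothetical helpers: derive the RBD header object from `rbd info`
# output and list the client IPs that still watch it.

# Reads `rbd info` output on stdin and prints the header object name,
# i.e. the block_name_prefix value with rbd_data. swapped for rbd_header.
header_object() {
    awk '$1 == "block_name_prefix:" {
        sub(/^rbd_data\./, "rbd_header.", $2)
        print $2
    }'
}

# Reads `rados listwatchers` output on stdin and prints each watcher's
# IPv4 address once. Assumed line format:
#   watcher=<ip>:<port>/<nonce> client.<id> cookie=<n>
watcher_ips() {
    sed -n 's/^watcher=\([0-9.]*\):.*/\1/p' | sort -u
}

# Example usage on a host with Ceph access (image name left as a placeholder):
#   obj=$(rbd info kube/<image> | header_object)
#   rados listwatchers -p kube "$obj" | watcher_ips
```

Keeping the parsing in functions that read stdin makes it possible to try them on saved command output before pointing them at a live cluster.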

On that node we checked the device link:

# ls -l /dev/rbd/kube/kubernetes-dynamic-pvc-bbfd3466-9f2f-11ea-8e91-5a4125e02b87

which pointed to /dev/rbd4. Attempting to unmap it with:

# rbd unmap /dev/rbd4

failed with "sysfs write failed" and "Device or resource busy". The usual next step is to find the process holding the device with lsof, but it reported none.

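Two practical approaches resolved it:

(1) Force the unmap with rbd unmap -o force /dev/rbd4.

(2) Search every process's mountinfo for the device: grep 'rbd4' /proc/*/task/*/mountinfo. The device can still be mounted inside another process's mount namespace, which lsof does not show; the path of each matching file contains the PID that holds the mount.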
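The mountinfo search can be wrapped so that it prints only PIDs. This is a sketch under a few assumptions: the function name pids_holding_device is hypothetical, it reads /proc/<pid>/mountinfo rather than the per-thread files used above (threads normally share their process's mount namespace), and it matches the mount source "/dev/<device> " with a trailing space so that rbd4 does not also match rbd40.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs whose mount namespace still
# references a block device.
#   $1: device name, e.g. rbd4
#   $2: proc root (defaults to /proc; overridable to try it on sample data)
pids_holding_device() {
    dev=$1
    proc_root=${2:-/proc}
    # grep -l prints the names of matching files; the PID is the
    # directory component just before "mountinfo".
    grep -l -- "/dev/$dev " "$proc_root"/[0-9]*/mountinfo 2>/dev/null |
        sed 's|.*/\([0-9][0-9]*\)/mountinfo$|\1|' |
        sort -n -u
}

# Example usage on the node that holds the image:
#   pids_holding_device rbd4
```

Each PID it prints can then be inspected with ps to see which process is keeping the mount alive.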

Once the RBD image was unmapped from the old node, the pod started normally.

Final Thoughts

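The issue arose because the workload was managed by a Deployment rather than a StatefulSet: during a rolling update, a Deployment can start the replacement pod, possibly on another node, before the old pod has released the volume. To avoid it, use a volume that supports the ReadWriteMany access mode, or set maxSurge to 0 in the Deployment so that no extra pod is created during updates.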
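As an illustration of the maxSurge suggestion, the sketch below builds a JSON merge patch for a Deployment's update strategy. The function name no_surge_patch and the Deployment name my-app are placeholders, and setting maxUnavailable to 1 explicitly is an assumption made here so that the rollout can still proceed when no surge pods are allowed.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON merge patch that disables surge pods
# during rolling updates. maxUnavailable: 1 is an assumption of this
# sketch; adjust it to the availability the workload needs.
no_surge_patch() {
    printf '%s' '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":0,"maxUnavailable":1}}}}'
}

# Example usage (my-app is a placeholder Deployment name):
#   kubectl patch deployment my-app --type merge -p "$(no_surge_patch)"
```

With this strategy the old pod is terminated before its replacement is created, which gives the node a chance to release the volume first.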


Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
