Why Does Containerd’s PLEG Relisting Stall at Node Startup and How to Fix It
When replacing dockershim with containerd, we observed pods taking over a minute to start: the GenericPLEG Relisting operation stalled for more than 30 seconds during node boot. The root cause was containerd's UpdateContainerResources holding a bbolt lock while intensive image pulls generated heavy I/O. This article explains the root cause and presents a fix using the overlay volatile mount option.
Technical Background
In recent internal tests of replacing dockershim with containerd, we noticed that business containers take a long time to become runnable after the pod starts. The init container finishes within a second, but the main containers sometimes need more than a minute before they start executing.
Examining kubelet logs revealed that, when a node first boots, the PLEG (Pod Lifecycle Event Generator) Relisting method, normally executed once per second, takes over 30 seconds to complete. After a few minutes the issue disappears and Relisting runs at the expected one-second interval.
dockershim and CRI
Kubernetes 1.24 removed the dockershim component from kubelet; users now choose a CRI implementation such as containerd or CRI-O, and kubelet talks to it directly over the CRI gRPC API. Containerd's architecture evolved accordingly.
PLEG
PLEG (Pod Lifecycle Event Generator) runs on each node to keep the actual state of pods and containers in sync with the desired
spec. It reduces unnecessary work during idle periods and lowers the number of concurrent requests to the container runtime.
Figure: PLEG reconciles the pod spec state (desired) with the container runtime state (actual).
ImagePull Process
The steps performed by ctr image pull are:
Resolve the image to be downloaded.
Pull the image from the registry, storing layers and config in the content service and metadata in the images service.
Unpack the layers into the snapshot service.
Note: the content and images services are gRPC services provided by containerd; during layer unpacking containerd temporarily mounts and unmounts all parent snapshots.
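The pull phases above can be observed from the command line. A minimal sketch, assuming containerd and its ctr client are installed on the node; the image name is illustrative, and the demonstration is skipped when ctr is unavailable:

```shell
IMAGE="docker.io/library/alpine:latest"   # illustrative image name
if command -v ctr >/dev/null 2>&1; then
    # Resolve the reference, fetch layers/config into the content store,
    # record metadata in the image store, and unpack into the snapshotter.
    ctr image pull "$IMAGE"
    ctr content ls | head -n 3    # layer blobs and config live in the content service
    ctr snapshots ls | head -n 3  # unpacked layers live in the snapshot service
    status=ok
else
    echo "ctr not found; skipping demonstration"
    status=skipped
fi
```

Listing the content and snapshot stores after the pull makes the split between the two services visible: blobs are content-addressed, while snapshots are the mountable layer chain.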
Problem Diagnosis
Based on the background, the GenericPLEG Relisting call queries containerd's CRI to obtain the list of running containers. Containerd logs show errors such as:
<code>containerd[2206]: {"error":"failed to stop container: failed to delete task: context deadline exceeded: unknown","level":"error","msg":"failed to handle container TaskExit event &TaskExit{ContainerID:...}"}</code>

Goroutine dumps reveal a goroutine waiting on a Delete call, and another stuck in an umount system call.
<code>goroutine 1654 [select]:
github.com/containerd/ttrpc.(*Client).dispatch(...)
... (stack trace omitted for brevity)</code>

Further investigation of containerd.log shows that UpdateContainerResources requests are blocked waiting for a bbolt lock:
<code>goroutine 1723 [semacquire]:
sync.runtime_SemacquireMutex(...)
... (stack trace omitted)</code>

The relevant source code resides in containerd/pkg/cri/server/container_update_resources.go; UpdateContainerResources holds the container status lock while updating resources. The ListContainers operation also needs this lock, so while an update is stalled on bbolt, PLEG's relisting stalls with it.
Because the lock is held while the bbolt database syncs data to storage, I/O pressure on the host can exacerbate the delay. Monitoring tools such as PSI or iostat can surface the pressure.
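A quick way to check for the I/O pressure described above; a minimal sketch assuming a Linux node (PSI requires kernel 4.20+ with CONFIG_PSI, iostat requires the sysstat package), with graceful fallbacks when either is missing:

```shell
# PSI reports the share of time tasks were stalled on I/O
# ("some" = at least one task, "full" = all non-idle tasks).
if [ -r /proc/pressure/io ]; then
    cat /proc/pressure/io
    psi_checked=yes
else
    echo "PSI not available on this kernel"
    psi_checked=no
fi
# iostat -x shows per-device utilization (%util) and await latency.
command -v iostat >/dev/null 2>&1 && iostat -x 1 1 || true
```

Sustained non-zero avg10 values in the "full" line, or devices pinned near 100% utilization during node boot, would corroborate that bbolt's fsync is competing with image-pull I/O.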
Problem Fix
The community provided a fix in PR #8676: add the volatile mount option to the overlay filesystem. This option skips the sync call during umount, preventing the long pause.
Note: the volatile mount option allows overlayfs to avoid forced disk sync on unmount, reducing latency.
Applying the overlay volatile option mitigates the startup delay even when many images are pulled.
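What the option changes can be seen with a plain overlay mount. A minimal sketch, assuming a kernel with overlayfs volatile support (5.10+); the paths are illustrative, and the mount is only attempted when running as root:

```shell
base=$(mktemp -d)
mkdir -p "$base/lower" "$base/upper" "$base/work" "$base/merged"
if [ "$(id -u)" = "0" ]; then
    # volatile tells overlayfs to skip sync/fsync for this mount;
    # a crash can corrupt the upperdir, which is acceptable for
    # throwaway snapshot mounts like those used during unpacking.
    mount -t overlay overlay \
        -o "lowerdir=$base/lower,upperdir=$base/upper,workdir=$base/work,volatile" \
        "$base/merged" && echo "mounted with volatile"
    umount "$base/merged"   # with volatile, no syncfs is forced here
else
    echo "not root; mount not attempted, command shown for illustration"
fi
rm -rf "$base"
```

The trade-off is explicit in the kernel documentation: volatile sacrifices crash consistency of the upper layer for speed, which is safe here because the snapshots being unmounted during unpacking are transient.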
Disclaimer: the author’s time and perspective are limited; readers are encouraged to provide feedback and corrections.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.