Why Docker exec Fails: Diagnosing runc Errors and Resource Limits
This guide walks through a real-world docker exec failure. It explains how kubelet, docker-shim, dockerd, containerd, containerd-shim, and runc relate to one another, shows step-by-step commands for isolating the faulty component, and traces the runc exec error to an exhausted pids limit in the container.
Learning Focus
Relationship among kubelet, docker‑shim, dockerd, containerd, containerd‑shim, and runc
Investigation method: using docker, containerd‑ctr, and docker‑runc to connect to containers
runc execution workflow
Problem Description
During on‑call troubleshooting, the system log repeatedly showed Docker exec errors:
May 12 09:08:40 HOSTNAME dockerd[4085]: time="2021-05-12T09:08:40.642410594+08:00" level=error msg="stream copy error: reading from a closed fifo"
May 12 09:08:40 HOSTNAME dockerd[4085]: time="2021-05-12T09:08:40.642418571+08:00" level=error msg="stream copy error: reading from a closed fifo"
May 12 09:08:40 HOSTNAME dockerd[4085]: time="2021-05-12T09:08:40.663754355+08:00" level=error msg="Error running exec ...: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"read init-p: connection reset by peer\": unknown"
The logs indicate a failure in the exec path that must be investigated for potential business impact.
Investigation Steps
All we know so far is that dockerd reported an exec failure. Before diving deeper, we review Docker's call chain: kubelet → docker-shim → dockerd → containerd → containerd-shim → runc.
The chain is long and involves many components, so we split the investigation into two parts:
Identify the component that triggers the failure
Determine why that component failed
Identifying the Faulty Component
Experienced Docker users may spot the culprit right away, but we follow a systematic approach and rule out one layer at a time:
# 1. Locate the problematic container
$ sudo docker ps | grep -v pause | grep -v NAMES | awk '{print $1}' | xargs -ti sudo docker exec {} sleep 1
sudo docker exec aa1e331ec24f sleep 1
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "read init-p: connection reset by peer": unknown
# 2. Exclude Docker itself
$ docker-containerd-ctr -a /var/run/docker/containerd/docker-containerd.sock -n moby t exec --exec-id stupig1 aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e sleep 1
ctr: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "read init-p: connection reset by peer": unknown
# 3. Exclude containerd and containerd‑shim
$ docker-runc --root /var/run/docker/runtime-runc/moby/ exec aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e sleep
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
... (stack trace omitted for brevity) ...
exec failed: container_linux.go:348: starting container process caused "read init-p: connection reset by peer"
The error still occurs when runc is invoked directly, so it originates from runc.
Finding the Root Cause
The runc output contains a more specific message: runtime/cgo: pthread_create failed: Resource temporarily unavailable. This error typically points to an exhausted resource limit. The usual suspects (several of them are shown by ulimit -a) are:
Thread count limit reached
File descriptor limit reached
Memory limit reached
We therefore inspect the container's resource usage.
The monitoring chart shows that all containers have reached 10,000 threads, which is the default limit for Elastic Cloud containers. This limit is intentionally set to prevent a single container from exhausting host thread resources.
$ cat /sys/fs/cgroup/pids/kubepods/burstable/pod64a6c0e7-830c-11eb-86d6-b8cef604db88/aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e/pids.max
10000
Thus, the exec failure is caused by the container hitting its pids (thread) limit.
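The same check can be scripted by comparing pids.current with pids.max in the container's cgroup directory. Below is a minimal sketch in Go, assuming the cgroup v1 pids controller layout used above. The program name pidscheck and the helper names parsePidsMax and atLimit are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parsePidsMax parses the content of pids.max, which holds either a
// number or the literal string "max" (no limit).
func parsePidsMax(s string) (limit int64, unlimited bool, err error) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, true, nil
	}
	limit, err = strconv.ParseInt(s, 10, 64)
	return limit, false, err
}

// atLimit reports whether the pids.current value has reached the
// pids.max value. Both arguments are raw file contents.
func atLimit(currentRaw, maxRaw string) (bool, error) {
	limit, unlimited, err := parsePidsMax(maxRaw)
	if err != nil {
		return false, err
	}
	if unlimited {
		return false, nil
	}
	current, err := strconv.ParseInt(strings.TrimSpace(currentRaw), 10, 64)
	if err != nil {
		return false, err
	}
	return current >= limit, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: pidscheck <cgroup-dir>")
		return
	}
	dir := os.Args[1]
	currentRaw, err := os.ReadFile(filepath.Join(dir, "pids.current"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	limitRaw, err := os.ReadFile(filepath.Join(dir, "pids.max"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	full, err := atLimit(string(currentRaw), string(limitRaw))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("pids.current=%s pids.max=%s at limit: %v\n",
		strings.TrimSpace(string(currentRaw)), strings.TrimSpace(string(limitRaw)), full)
}
```

Run it with the container's cgroup directory (the directory that holds pids.max in the command above) as its only argument.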
runc Workflow Overview
Even after pinpointing the error, understanding runc's internal workflow helps prevent similar issues.
Using runc exec as an example, the process is:
1. runc exec starts a child process, runc init
2. runc init prepares the container namespaces; the process it leaves running inside them is referred to below as the grandchild
3. runc exec adds the grandchild to the container's cgroup
4. runc exec sends the exec configuration (command, args, env) to the grandchild
5. The grandchild calls system.Execv to run the user command
Notes:
The final part of step 2 (initialization of the grandchild) and step 3 run concurrently
Communication between runc exec and runc init uses a socket pair (init‑p / init‑c)
Relevant runc Source Code
func (p *setnsProcess) start() (err error) {
	// parentPipe is runc exec's end of the socket pair (init-p).
	defer p.parentPipe.Close()
	// Start the child process, runc init.
	err = p.cmd.Start()
	p.childPipe.Close()
	if err != nil {
		return newSystemErrorWithCause(err, "starting setns process")
	}
	// Send the bootstrap data to runc init.
	if p.bootstrapData != nil {
		if _, err := io.Copy(p.parentPipe, p.bootstrapData); err != nil {
			return newSystemErrorWithCause(err, "copying bootstrap data to pipe")
		}
	}
	if err = p.execSetns(); err != nil {
		return newSystemErrorWithCause(err, "executing setns process")
	}
	// Add the process to the container's cgroups.
	if len(p.cgroupPaths) > 0 {
		if err := cgroups.EnterPid(p.cgroupPaths, p.pid()); err != nil {
			return newSystemErrorWithCausef(err, "adding pid %d to cgroups", p.pid())
		}
	}
	// Send the exec configuration to the process.
	if err := utils.WriteJSON(p.parentPipe, p.config); err != nil {
		return newSystemErrorWithCause(err, "writing config to pipe")
	}
	// Read sync messages from the child over the same pipe.
	ierr := parseSync(p.parentPipe, func(sync *syncT) error {
		switch sync.Type {
		case procReady:
			panic("unexpected procReady in setns")
		case procHooks:
			panic("unexpected procHooks in setns")
		default:
			return newSystemError(fmt.Errorf("invalid JSON payload from child"))
		}
	})
	if ierr != nil {
		p.wait()
		return ierr
	}
	return nil
}
With the workflow and code analysis, the cause of the exec failure is clear: once the process joins a cgroup whose pids limit is already exhausted, runc init cannot create new threads and aborts, and runc exec's read on init-p then fails with connection reset by peer.
References
https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt
https://github.com/opencontainers/runc
Liangxu Linux
