Why Go 1.8’s ForkLock Can Hang Goroutines and How Go 1.9 Solves It
The article investigates a Go 1.8.3 issue where goroutines block on ForkLock during fork‑exec, analyzes the kernel and Go runtime behavior, presents a hypothesis about memory‑heavy processes, validates it with experiments, and shows that upgrading to Go 1.9 or later eliminates the problem.
1. Incident Origin
A colleague posted to an internal mailing list that a Go 1.8.3 service occasionally had goroutines stuck waiting on ForkLock. Believing it to be a bug, they opened a GitHub issue (https://github.com/golang/go/issues/26836) and tried to reproduce it, without success.
2. Problem Analysis
ForkLock exists to prevent race conditions when multiple goroutines concurrently perform fork‑exec. It ensures that file descriptors marked close‑on‑exec (O_CLOEXEC) are not inherited by the child, avoiding unwanted descriptor leakage. The race arises when one goroutine has created a descriptor but not yet marked it close‑on‑exec at the moment another goroutine forks. Linux kernels ≥2.6.27 can create descriptors with O_CLOEXEC set atomically, but Go requires only kernel ≥2.6.23, and on older Unix systems creating a descriptor and setting close‑on‑exec are separate operations, so a lock is needed around fork.
The observed symptom suggests a goroutine is stuck in either forkExecPipe or forkAndExecInChild, holding the lock and starving other goroutines.
3. Hypothesis
Since pipe2 is fast, the blocking likely occurs in clone or exec. Comparing Go 1.8, 1.9 and 1.10 source shows that Go 1.9 added CLONE_VFORK and CLONE_VM, allowing the child to share memory with the parent (similar to vfork) and avoid copying page tables, which dramatically reduces fork latency for large processes.
The hypothesis: in programs compiled with Go <1.9, when memory usage is large and process creation is frequent, the ForkLock can be held for a long time, causing goroutine starvation.
4. Experimental Verification
A test program built with Go 1.8.3 was run on a 2‑CPU, 4 GB VM (kernel 3.10.0‑693.17.1.el7.x86_64). Every 10 seconds the program was sent a SIGUSR1 signal to dump its goroutine stack traces. Over time, some goroutines spent increasingly long periods waiting for ForkLock, as the captured stack dumps showed.
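The source of the test program is not given, so the following is only a hypothetical reconstruction of such an experiment. It assumes a `true` binary on PATH, and the ballast size, worker count, and the names allocateBallast, dumpStacksOnSIGUSR1 and spawnLoop are all illustrative choices. It keeps a large heap resident, spawns processes from many goroutines, and dumps goroutine stacks on SIGUSR1.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"runtime/pprof"
	"sync"
	"syscall"
	"time"
)

// ballast keeps a large heap alive so that a plain fork has many
// page-table entries to copy.
var ballast []byte

func allocateBallast(size int) {
	ballast = make([]byte, size)
	for i := 0; i < len(ballast); i += 4096 {
		ballast[i] = 1 // touch each page so it is actually resident
	}
}

// dumpStacksOnSIGUSR1 writes all goroutine stacks to stderr on each SIGUSR1.
func dumpStacksOnSIGUSR1() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}

// spawnLoop runs command repeatedly from several goroutines and returns
// the slowest observed fork-exec-wait round trip.
func spawnLoop(workers, iterations int, command string) time.Duration {
	var (
		mu    sync.Mutex
		worst time.Duration
		wg    sync.WaitGroup
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				start := time.Now()
				_ = exec.Command(command).Run() // errors ignored in this sketch
				elapsed := time.Since(start)
				mu.Lock()
				if elapsed > worst {
					worst = elapsed
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return worst
}

func main() {
	allocateBallast(1 << 30) // 1 GiB; an assumed size, adjust to the machine
	dumpStacksOnSIGUSR1()
	fmt.Println("worst fork-exec latency:", spawnLoop(16, 100, "true"))
}
```

Sending the process SIGUSR1 (for example with kill -USR1 and its PID) while it runs should show which goroutines are parked on ForkLock at that moment. Building the same source with different Go releases allows the worst‑case latency to be compared.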
Running the same test on Go 1.9 and later did not reproduce the issue, confirming that the upgrade resolves the problem.
5. Conclusion
vfork was introduced to avoid the performance penalty of copying page tables during fork. Most fork‑exec scenarios benefit from vfork because the child immediately calls exec, which discards the inherited page tables anyway. However, vfork shares the parent’s memory, so the child must not modify shared variables, and the kernel suspends the parent until the child calls exec or exits, limiting vfork to exec‑only use cases.
Before Go 1.9’s vfork implementation, a commit (https://go-review.googlesource.com/c/go/+/46173) added a safeguard: after the raw vfork syscall returns, the parent does nothing before returning, preventing the child from corrupting shared state.
For deeper details, see the review discussion involving Rob Pike and others.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
