Why Does go build -race Crash with Auto‑Instrumentation? Inside Go’s Runtime
The article analyzes why using the auto‑instrumentation command `otel go build -race` causes a crash, tracing the failure to the injected runtime code, the Go‑C calling conventions, and a zeroed race context, and then presents practical fixes to prevent the crash.
Recently, Alibaba Cloud ARMS, the compiler team, and the MSE team jointly released an open‑source compile‑time auto‑instrumentation tool for Go. It aims to provide monitoring on par with what Java offers, with no changes to application code. Developers replace go build with otel go build to enable full monitoring and governance.
When users replace the normal go build -race with otel go build -race, the generated binary crashes. The -race flag enables the Go race detector, which adds extra checks to detect data races.
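As a reminder of what the race detector looks for, here is a minimal sketch. The function names are illustrative and not part of the tool. Two counters are incremented from many goroutines, one unsynchronized and one guarded by a mutex. In a binary built with -race, running the unsynchronized version is expected to produce a data race report.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines without
// synchronization. This is a data race: the result is not guaranteed
// to equal n, and a -race build is expected to report it.
func racyCount(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // unsynchronized read-modify-write
		}()
	}
	wg.Wait()
	return count
}

// safeCount does the same work but guards the counter with a mutex.
func safeCount(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	// Only the synchronized version is called here; swap in racyCount
	// to see what the race detector reports.
	fmt.Println("safe:", safeCount(1000))
}
```

The detector finds such accesses because the compiler instruments memory accesses and function entry and exit, which is the instrumentation that matters for the crash discussed here.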
The crash stack trace shows that the failure originates in __tsan_func_enter and that it is reached through runtime.contextPropagate. The tool inserts the following code at the beginning of runtime.newproc1:
func newproc1(fn *funcval, callergp *g, callerpc uintptr) (retVal0 *g) {
	// injected code
	retVal0.otel_trace_context = contextPropagate(callergp.otel_trace_context)
	...
}

func contextPropagate(tls interface{}) interface{} {
	if tls == nil {
		return nil
	}
	if taker, ok := tls.(ContextSnapshoter); ok {
		return taker.TakeSnapShot()
	}
	return tls
}

func (tc *traceContext) TakeSnapShot() interface{} {
	...
}

TakeSnapShot is instrumented by the race detector, which inserts calls to racefuncenter() and racefuncexit(). This leads to a call chain:
racefuncenter (Go) → racecall (Go) → __tsan_func_enter (C)

Following this chain requires the Go and C calling conventions on amd64. Go's internal ABI passes the first nine integer arguments in registers: RAX, RBX, RCX, RDI, RSI, R8, R9, R10, R11.
The System V AMD64 convention, used when Go calls C, passes the first six integer arguments in RDI, RSI, RDX, RCX, R8, R9.
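racefuncenter loads the current goroutine's race context, g_racectx(R14), and passes it to __tsan_func_enter as its first argument. The analysis reveals that this value is zero. Under Go's internal ABI, R14 holds the pointer to the current goroutine, which cannot be zero, so the zero must be the racectx field itself. It comes from g0.racectx, which the runtime sets to zero at program start in runtime.main: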
// src/runtime/proc.go#main
func main() {
	mp := getg().m
	// g0's racectx is only used as the parent of the main goroutine.
	mp.g0.racectx = 0
	...
}

Because newproc1 runs on the g0 goroutine, the instrumented TakeSnapShot reached through the injected contextPropagate passes g0's zero racectx to __tsan_func_enter, which dereferences a null pointer and crashes.
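The failure mode can be modeled in ordinary Go. The sketch below is a toy model, not runtime code: threadState, goroutine, funcEnter, and callInstrumented are hypothetical stand-ins for the race detector's thread state, the runtime's g, __tsan_func_enter, and an instrumented function. It only illustrates that entering an instrumented function on a goroutine whose race context is nil faults immediately.

```go
package main

import "fmt"

// threadState is a hypothetical stand-in for the per-goroutine state
// kept by the race detector; the real layout lives in its C/C++ runtime.
type threadState struct {
	depth int // toy shadow call-stack depth
}

// goroutine is a toy model of the runtime's g, reduced to racectx.
type goroutine struct {
	name    string
	racectx *threadState
}

// funcEnter models __tsan_func_enter: it uses the thread state without
// checking it, so a nil context faults.
func funcEnter(thr *threadState) {
	thr.depth++
}

// callInstrumented models an instrumented function running on gp and
// reports whether entering it faulted.
func callInstrumented(gp *goroutine) (crashed bool) {
	defer func() {
		if r := recover(); r != nil {
			crashed = true
		}
	}()
	funcEnter(gp.racectx)
	return false
}

func main() {
	user := &goroutine{name: "user g", racectx: &threadState{}}
	g0 := &goroutine{name: "g0"} // racectx left nil, like mp.g0.racectx = 0
	for _, gp := range []*goroutine{user, g0} {
		fmt.Printf("%s crashed: %v\n", gp.name, callInstrumented(gp))
	}
}
```

In this model the fault is a recoverable Go panic. In the real binary the fault happens inside C code called from the runtime, so the process simply crashes.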
One fix is to mark TakeSnapShot with the compiler directive //go:norace, which tells the race detector to ignore memory accesses in that function, preventing the automatic insertion of racefuncenter(). However, the function also performs map initialization and iteration, which the compiler expands into calls like mapiterinit() that are hard‑coded to enable race checks and cannot be suppressed with //go:norace. The practical solution is to avoid using map data structures in the code injected into newproc1.
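A minimal sketch of that combination follows. The field layout of traceContext (traceID, spanID) is an assumption made for illustration; the real structure in the tool may differ. The snapshot is a plain struct copy, so no map helper such as mapiterinit() is involved, and TakeSnapShot carries the //go:norace directive.

```go
package main

import "fmt"

// ContextSnapshoter mirrors the interface name used by the injected code.
type ContextSnapshoter interface {
	TakeSnapShot() interface{}
}

// traceContext uses a hypothetical, map-free layout: fixed fields only.
type traceContext struct {
	traceID string
	spanID  string
}

// TakeSnapShot returns an independent copy of the context.
//
//go:norace
func (tc *traceContext) TakeSnapShot() interface{} {
	snapshot := *tc // plain struct copy, no map runtime helpers
	return &snapshot
}

// contextPropagate follows the shape of the injected helper.
func contextPropagate(tls interface{}) interface{} {
	if tls == nil {
		return nil
	}
	if taker, ok := tls.(ContextSnapshoter); ok {
		return taker.TakeSnapShot()
	}
	return tls
}

func main() {
	parent := &traceContext{traceID: "trace-1", spanID: "span-1"}
	child := contextPropagate(parent).(*traceContext)
	child.spanID = "span-2" // does not affect the parent
	fmt.Println(parent.spanID, child.spanID)
}
```

If the context needs a variable number of key-value pairs, a slice of fixed-size entries is one possible replacement for a map, though whether that fits the tool's actual data is not covered here.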
The runtime package itself is marked with NoInstrument via the pkgSpecials table, so the compiler skips race instrumentation for its code:
var pkgSpecialsOnce = sync.OnceValue(func() map[string]PkgSpecial {
	for _, pkg := range runtimePkgs {
		set(pkg, func(ps *PkgSpecial) {
			ps.Runtime = true
			ps.NoInstrument = true
		})
	}
	...
})

In summary, the crash is caused by the injected contextPropagate calling TakeSnapShot under the race detector, which receives a zero racectx from the g0 goroutine. Adding //go:norace to TakeSnapShot and avoiding map usage in the injected code resolves the issue.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
