How Go 1.24’s New Spinning Mutex Boosts Performance by Up to 70%
The article explains the background of the Go mutex performance proposal, details the new spinning flag added to the mutex state, walks through fast‑path, spinning, and sleep phases of lock acquisition, presents benchmark results showing up to 70% speed‑up, and provides references for further reading.
Background
Rhys Hiltner proposed a mutex performance improvement in 2024 [1]; the optimization has been merged into the upcoming Go 1.24 release and can increase performance by up to 70% in highly contended scenarios.
In the ChanContended benchmark the author observed that increasing GOMAXPROCS caused mutex performance to degrade sharply. On an Intel i7‑13700H (linux/amd64):
With 4 threads the process throughput is half of the single‑threaded case.
With 8 threads the throughput halves again.
With 12 threads the throughput halves once more.
At GOMAXPROCS=20, 200 channel operations take 44 µs on average, meaning unlock2 runs roughly every 220 ns and wakes a sleeping thread each time. Over a 1.78 s wall-clock interval, the 20 threads together burn 27.74 s of CPU time spinning inside lock2 calls.
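The degradation above can be reproduced in miniature. The sketch below is not the runtime's ChanContended benchmark; it is an illustrative workload of the same shape (the function name and parameters are ours): many goroutines hammering one buffered channel, so every send and receive contends on the channel's internal runtime mutex.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// contendChan has `workers` goroutines alternate sends and receives on
// one shared buffered channel. Every channel operation takes the
// channel's internal runtime lock, so throughput degrades as the
// number of competing threads grows.
func contendChan(workers, opsPerWorker int) time.Duration {
	ch := make(chan int, 64)
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < opsPerWorker; i++ {
				ch <- i // contends on the channel's lock
				<-ch
			}
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	// Timings vary by machine; the point is the trend as workers grow.
	for _, n := range []int{1, 4, 8} {
		fmt.Printf("workers=%d took %v\n", n, contendChan(n, 10000))
	}
}
```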
New Proposal: Add Spinning State
Analysis shows that the current lock2 implementation allows threads to sleep in theory, but in practice all threads spin, causing slower lock hand‑off and high CPU consumption. The author therefore submitted the design "Proposal: Improve scalability of runtime.lock2" [2].
Core Optimizations
The mutex state word now includes a new flag, mutexSpinning:
const (
	mutexLocked   = 0x001
	mutexSleeping = 0x002
	mutexSpinning = 0x100
	...
)
The spinning bit indicates that a thread is "awake and actively trying to acquire the lock". Threads compete for the spinning state but do not block while attempting to set the flag.
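To make the non-blocking claim concrete, here is a user-space sketch of claiming a spinning bit with a single CAS. The bit values mirror the constants above, but trySetSpinning and its uint32 state word are illustrative, not runtime code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative state bits, mirroring the runtime's constants.
const (
	locked   uint32 = 0x001
	sleeping uint32 = 0x002
	spinning uint32 = 0x100
)

// trySetSpinning attempts to claim the single spinner slot with one
// CAS. It never loops or blocks: losing the race simply means some
// other thread is already the designated spinner.
func trySetSpinning(state *uint32) bool {
	v := atomic.LoadUint32(state)
	if v&spinning != 0 {
		return false // someone is already spinning
	}
	return atomic.CompareAndSwapUint32(state, v, v|spinning)
}

func main() {
	var state uint32 = locked
	fmt.Println(trySetSpinning(&state)) // true: slot was free
	fmt.Println(trySetSpinning(&state)) // false: slot already taken
}
```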
Background on sync.Mutex internals is covered in an earlier article: https://pub.huizhou92.com/go-source-code-sync-mutex-3082a25ef092 [3]
Mutex Lock Acquisition Analysis
1. Fast Path: Attempt to Acquire the Lock
// https://github.com/golang/go/blob/adc9c455873fef97c5759e4811f0d9c8217fe27b/src/runtime/lock_spinbit.go#L160
k8 := key8(&l.key)
v8 := atomic.Xchg8(k8, mutexLocked)
if v8&mutexLocked == 0 {
	if v8&mutexSleeping != 0 {
		atomic.Or8(k8, mutexSleeping)
	}
	return
}
The fast path behaves similarly to previous versions: if the lock is free, it returns immediately; this is the ideal, uncontended case.
2. Spinning Wait Phase
// https://github.com/golang/go/blob/adc9c455873fef97c5759e4811f0d9c8217fe27b/src/runtime/lock_spinbit.go#L208
if !weSpin && v&mutexSpinning == 0 && atomic.Casuintptr(&l.key, v, v|mutexSpinning) {
	v |= mutexSpinning
	weSpin = true
}
if weSpin || atTail || mutexPreferLowLatency(l) {
	if i < spin {
		procyield(mutexActiveSpinSize) // active spin
	} else if i < spin+mutexPassiveSpinCount {
		osyield() // passive spin
	}
}
If the fast path fails, execution enters the spinning phase.
The mutexSpinning flag ensures that only one thread spins on the lock at a time.
Active spin (procyield) keeps the CPU busy for very short waits, while passive spin (osyield) yields the CPU for longer waits, balancing latency and CPU usage.
Light contention uses active spin for low latency; heavy contention quickly switches to passive spin to avoid wasting CPU cycles.
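The two spin flavors can be imitated in user space. spinWait below is an illustrative sketch in which runtime.Gosched stands in for osyield, and a short counted loop stands in for procyield (whose PAUSE-instruction spin has no exported Go equivalent); the constants and function name are ours.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

const (
	activeSpins  = 4 // rounds of on-CPU busy waiting (runtime: procyield)
	passiveSpins = 2 // rounds of thread yielding (runtime: osyield)
)

// spinWait models the two spin phases: brief busy loops first, betting
// the holder releases within nanoseconds, then OS-level yields for
// slightly longer holds. It returns true if the lock (state == 0) was
// observed free, false once the spin budget is exhausted, at which
// point a real lock would move to the sleep phase.
func spinWait(state *uint32) bool {
	for i := 0; ; i++ {
		if atomic.LoadUint32(state) == 0 {
			return true
		}
		switch {
		case i < activeSpins:
			for j := 0; j < 30; j++ {
				// active spin: stay on this CPU
			}
		case i < activeSpins+passiveSpins:
			runtime.Gosched() // passive spin: give the CPU away
		default:
			return false // budget exhausted; sleep next
		}
	}
}

func main() {
	var free uint32 // 0 = unlocked
	fmt.Println(spinWait(&free)) // true: observed free immediately

	var held uint32 = 1
	fmt.Println(spinWait(&held)) // false: spin budget exhausted
}
```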
3. Sleep Wait Phase
// https://github.com/golang/go/blob/adc9c455873fef97c5759e4811f0d9c8217fe27b/src/runtime/lock_spinbit.go#L231
// Store the current head of the list of sleeping Ms in our gp.m.mWaitList.next field
gp.m.mWaitList.next = mutexWaitListHead(v)
// Pack a (partial) pointer to this M with the current lock state bits
next := (uintptr(unsafe.Pointer(gp.m)) &^ mutexMMask) | v&mutexMMask | mutexSleeping
if weSpin {
	next = next &^ mutexSpinning
}
if atomic.Casuintptr(&l.key, v, next) {
	weSpin = false
	semasleep(-1)
	atTail = gp.m.mWaitList.next == 0
	i = 0
}
If spinning fails, the thread adds itself to the wait list and sleeps via semasleep; it is woken through the semaphore when the lock holder releases the lock.
The runtime uses a spinbit design: when a thread is in the "awake‑and‑spinning" state, other threads are not woken, reducing contention and unnecessary context switches.
Results
goos: linux
goarch: amd64
pkg: runtime
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                │     old     │               new               │
                │   sec/op    │    sec/op      vs base          │
ChanContended     3.147µ ± 0%    3.703µ ± 0%   +17.65% (p=0.000 n=10)
... (omitted intermediate rows) ...
geomean           17.60µ         12.46µ        -29.22%
Although performance may drop slightly under low contention, the changes deliver significant gains under heavy contention, averaging about a 29% (geomean) improvement.
The mutex modification does not affect the API; it becomes active automatically with Go 1.24. The feature can be toggled with GOEXPERIMENT=spinbitmutex, which is enabled by default.
References
[1] Improvement proposal: https://github.com/golang/go/issues/68578
[2] Proposal: Improve scalability of runtime.lock2 – https://github.com/golang/proposal/blob/master/design/68578-mutex-spinbit.md
[3] https://pub.huizhou92.com/go-source-code-sync-mutex-3082a25ef092
[4] mutexSpinning – https://github.com/golang/go/blob/608acff8479640b00c85371d91280b64f5ec9594/src/runtime/lock_spinbit.go#L60
[5] semasleep – https://github.com/golang/go/blob/fd050b3c6d0294b6d72adb014ec14b3e6bf4ad60/src/runtime/lock_sema_tristate.go#L106
[6] https://github.com/golang/go/issues/68578#issuecomment-2256792628
