Unlocking Go’s sync/atomic: How Atomic Operations Achieve Lock‑Free Concurrency
This article dives deep into Go's sync/atomic package: its low-level CPU implementation, its performance advantages over mutexes, the core operation families, practical examples such as CAS-based spin locks and atomic.Value hot-reloading, and guidance on when to choose atomic primitives over locks.
Why atomic stands out
In high‑concurrency Go programs the three main synchronization primitives are sync.Mutex, channels, and sync/atomic. sync/atomic provides lock‑free, nanosecond‑scale operations that map directly to CPU atomic instructions such as LOCK XADDQ. It is the layer closest to hardware and guarantees indivisible updates of single variables.
Core atomic operation family
Add : atomic.AddInt64(&x, n) – atomic increment.
Load : atomic.LoadInt64(&x) – read with visibility guarantees.
Store : atomic.StoreInt64(&x, v) – write with ordering guarantees.
Swap : atomic.SwapInt64(&x, v) – atomic replace.
CompareAndSwap (CAS) : atomic.CompareAndSwapInt64(&x, old, new) – lock‑free conditional update.
Benchmark: Mutex vs atomic
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

var total int64
var mu sync.Mutex

func addByMutex(n int) {
	for i := 0; i < n; i++ {
		mu.Lock()
		total++
		mu.Unlock()
	}
}

func addByAtomic(n int) {
	for i := 0; i < n; i++ {
		atomic.AddInt64(&total, 1)
	}
}

func main() {
	const loops = 1_000_000
	t1 := time.Now()
	addByMutex(loops)
	fmt.Println("Mutex elapsed:", time.Since(t1))
	total = 0
	t2 := time.Now()
	addByAtomic(loops)
	fmt.Println("Atomic elapsed:", time.Since(t2))
}

On typical hardware the atomic version often runs several times faster and never blocks. Note that this single-goroutine micro-benchmark never contends the mutex; under real contention the gap typically widens.
CAS‑based spin lock
import (
	"runtime"
	"sync/atomic"
)

type SpinLock struct {
	locked int32
}

func (s *SpinLock) Lock() {
	for !atomic.CompareAndSwapInt32(&s.locked, 0, 1) {
		runtime.Gosched() // yield the processor instead of spinning hot
	}
}

func (s *SpinLock) Unlock() {
	atomic.StoreInt32(&s.locked, 0)
}

var lock SpinLock
var count int64

func worker() {
	lock.Lock()
	count++
	lock.Unlock()
}

Spin locks avoid kernel blocking but consume CPU cycles while waiting; they are appropriate only for extremely short critical sections on multi-core CPUs.
Assembly implementation of atomic.AddInt64
TEXT ·Xadd64(SB), NOSPLIT, $0-24
// ptr = address of the variable, delta = value to add
MOVQ ptr+0(FP), BX // load address
MOVQ delta+8(FP), AX // load increment
MOVQ AX, CX // keep a copy of the delta
// atomic exchange-and-add with LOCK prefix
LOCK XADDQ AX, 0(BX) // adds AX to *BX, leaves the old value in AX
ADDQ CX, AX // old + delta = the new value
MOVQ AX, ret+16(FP) // return the new value
RET

The LOCK prefix makes XADDQ an indivisible read-modify-write: on modern CPUs it locks the cache line holding the operand (falling back to a full bus lock only when the operand spans cache lines), so no other core can touch that memory until the operation completes.
Memory barriers
MFENCE // full barrier: orders all prior loads and stores
SFENCE // store (write) barrier
LFENCE // load (read) barrier

These fence instructions exist on x86-64, but Go's amd64 atomics rarely need them: x86 is strongly ordered (TSO), so atomic.Load compiles to a plain MOV and atomic.Store to an XCHG, whose implicit LOCK semantics act as a full barrier. On weakly ordered architectures such as ARM, the compiler and runtime insert explicit barrier instructions so that an atomic Store becomes visible to other cores before subsequent operations, preserving the ordering the Go memory model promises.
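What these ordering guarantees buy Go code is a happens-before edge: a plain write published by an atomic Store is guaranteed visible to any goroutine that observes the Store via an atomic Load. A minimal message-passing sketch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	data  int32 // written with a plain store
	ready int32 // published atomically
)

func producer() {
	data = 42                    // plain write
	atomic.StoreInt32(&ready, 1) // release: publishes the write above
}

func consumer() int32 {
	for atomic.LoadInt32(&ready) == 0 {
		// acquire: spin until the flag is published
	}
	return data // guaranteed to observe 42, never a stale value
}

func main() {
	go producer()
	fmt.Println(consumer()) // 42
}
```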
Using atomic.Value for hot‑reloading configuration
import "sync/atomic"

type Config struct {
	Addr string
	Port int
}

var config atomic.Value

func init() {
	config.Store(&Config{Addr: "127.0.0.1", Port: 8080})
}

func GetConfig() *Config { return config.Load().(*Config) }

func reload() {
	newCfg := &Config{Addr: "0.0.0.0", Port: 9090}
	config.Store(newCfg)
}

This pattern is lock-free (readers never block writers) and ideal for scenarios with many readers and few writers, such as feature-flag toggles or dynamic service configuration. One caveat: atomic.Value panics if successive Stores use different concrete types, so keep the type assertion in one accessor like GetConfig.
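Since Go 1.19, the generic atomic.Pointer[T] offers the same hot-reload pattern with compile-time type safety and no interface assertion; a sketch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Config struct {
	Addr string
	Port int
}

// Typed pointer: Load returns *Config directly, no .(type) assertion needed.
var config atomic.Pointer[Config]

func main() {
	config.Store(&Config{Addr: "127.0.0.1", Port: 8080})
	config.Store(&Config{Addr: "0.0.0.0", Port: 9090}) // hot reload
	cfg := config.Load()
	fmt.Println(cfg.Addr, cfg.Port) // 0.0.0.0 9090
}
```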
Guidelines for choosing atomic vs mutex
Single‑variable counters or flags – use the atomic primitives.
Read‑heavy configuration that changes rarely – use atomic.Value.
Updates that involve multiple fields or complex logic – prefer sync.Mutex.
Critical sections that may run for a noticeable time – prefer sync.Mutex to avoid CPU waste.
Extremely short lock holds (nanosecond scale) – consider CAS or a spin lock.
Atomic’s layered architecture (text diagram)
┌──────────────────────────────┐
│ sync/atomic │ API layer
│ ├── AddInt64 / CAS / Load │
│ └── Value │ Advanced interface
└──────────────────────────────┘
↓
┌──────────────────────────────┐
│ runtime/internal/atomic │ Assembly implementation
│ └── LOCK XADDQ / CMPXCHGQ │ CPU atomic instructions
└──────────────────────────────┘
↓
┌──────────────────────────────┐
│ CPU cache‑coherency (MESI) │ Bus lock, visibility guarantees
└──────────────────────────────┘

Key takeaways
sync/atomic is the low-level foundation of Go concurrency, providing true atomicity via hardware instructions.
CAS is the core primitive behind most lock-free algorithms.
Memory barriers inserted by the runtime ensure that Store and Load observe a consistent order across cores.
Understanding sync/atomic clarifies the baseline guarantees of Go’s memory model.
Further reading
Official Go Memory Model: https://go.dev/ref/mem
Source file: runtime/internal/atomic/asm_amd64.s
Medium article on memory barriers: https://medium.com/@AlexanderObregon/memory-barriers-in-go-concurrency-6259919c7b6a
Code Wrench
Focuses on code debugging, performance optimization, and real-world engineering, sharing efficient development tips and pitfall guides. We break down technical challenges in a down-to-earth style, helping you craft handy tools so every line of code becomes a problem‑solving weapon. 🔧💻
