Unlocking Go’s sync/atomic: How Atomic Operations Achieve Lock‑Free Concurrency
This article dives deep into Go's sync/atomic package: its low-level CPU implementation, its performance advantages over mutexes, the core operation families, practical examples such as CAS-based spin locks and atomic.Value hot-reloading, and guidance on when to choose atomic primitives over locks.
Why atomic stands out
In high‑concurrency Go programs the three main synchronization primitives are sync.Mutex, channels, and sync/atomic. sync/atomic provides lock‑free, nanosecond‑scale operations that map directly to CPU atomic instructions such as LOCK XADDQ. It is the layer closest to hardware and guarantees indivisible updates of single variables.
Core atomic operation family
Add : atomic.AddInt64(&x, n) – atomic increment.
Load : atomic.LoadInt64(&x) – read with visibility guarantees.
Store : atomic.StoreInt64(&x, v) – write with ordering guarantees.
Swap : atomic.SwapInt64(&x, v) – atomic replace.
CompareAndSwap (CAS) : atomic.CompareAndSwapInt64(&x, old, new) – lock‑free conditional update.
Benchmark: Mutex vs atomic
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

var total int64
var mu sync.Mutex

func addByMutex(n int) {
	for i := 0; i < n; i++ {
		mu.Lock()
		total++
		mu.Unlock()
	}
}

func addByAtomic(n int) {
	for i := 0; i < n; i++ {
		atomic.AddInt64(&total, 1)
	}
}

func main() {
	const loops = 1_000_000
	t1 := time.Now()
	addByMutex(loops)
	fmt.Println("Mutex elapsed:", time.Since(t1))
	total = 0
	t2 := time.Now()
	addByAtomic(loops)
	fmt.Println("Atomic elapsed:", time.Since(t2))
}

On typical hardware the atomic version often runs several times faster and never blocks. Note that this single-goroutine micro-benchmark never contends the mutex; under real contention the gap typically widens.
CAS‑based spin lock
import (
	"runtime"
	"sync/atomic"
)

type SpinLock struct {
	locked int32
}

func (s *SpinLock) Lock() {
	for !atomic.CompareAndSwapInt32(&s.locked, 0, 1) {
		runtime.Gosched() // yield the processor instead of spinning hot
	}
}

func (s *SpinLock) Unlock() {
	atomic.StoreInt32(&s.locked, 0)
}

var lock SpinLock
var count int64

func worker() {
	lock.Lock()
	count++
	lock.Unlock()
}

Spin locks avoid kernel blocking but consume CPU cycles while waiting; they are appropriate only for extremely short critical sections on multi-core CPUs.
Assembly implementation of atomic.AddInt64
TEXT ·Xadd64(SB), NOSPLIT, $0-24
// ptr = address of the variable, delta = value to add
MOVQ ptr+0(FP), BX // load address
MOVQ delta+8(FP), AX // load increment
MOVQ AX, CX // keep a copy of the delta
// atomic exchange-and-add with LOCK prefix
LOCK XADDQ AX, 0(BX) // adds AX to *BX, leaves the old value in AX
ADDQ CX, AX // old + delta = the new value
MOVQ AX, ret+16(FP) // return the new value
RET

The LOCK prefix makes XADDQ an indivisible read-modify-write: on modern CPUs it locks the cache line holding the operand (falling back to a full bus lock only when the operand spans cache lines), so no other core can touch that memory until the operation completes.
Memory barriers
MFENCE // full barrier: orders all prior loads and stores
SFENCE // store (write) barrier
LFENCE // load (read) barrier

These fence instructions exist on x86-64, but Go's amd64 atomics rarely need them: x86 is strongly ordered (TSO), so atomic.Load compiles to a plain MOV and atomic.Store to an XCHG, whose implicit LOCK semantics act as a full barrier. On weakly ordered architectures such as ARM, the compiler and runtime insert explicit barrier instructions so that an atomic Store becomes visible to other cores before subsequent operations, preserving the ordering the Go memory model promises.
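What these ordering guarantees buy Go code is a happens-before edge: a plain write published by an atomic Store is guaranteed visible to any goroutine that observes the Store via an atomic Load. A minimal message-passing sketch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	data  int32 // written with a plain store
	ready int32 // published atomically
)

func producer() {
	data = 42                    // plain write
	atomic.StoreInt32(&ready, 1) // release: publishes the write above
}

func consumer() int32 {
	for atomic.LoadInt32(&ready) == 0 {
		// acquire: spin until the flag is published
	}
	return data // guaranteed to observe 42, never a stale value
}

func main() {
	go producer()
	fmt.Println(consumer()) // 42
}
```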
Using atomic.Value for hot‑reloading configuration
import "sync/atomic"

type Config struct {
	Addr string
	Port int
}

var config atomic.Value

func init() {
	config.Store(&Config{Addr: "127.0.0.1", Port: 8080})
}

func GetConfig() *Config { return config.Load().(*Config) }

func reload() {
	newCfg := &Config{Addr: "0.0.0.0", Port: 9090}
	config.Store(newCfg)
}

This pattern is lock-free (readers never block writers) and ideal for scenarios with many readers and few writers, such as feature-flag toggles or dynamic service configuration. One caveat: atomic.Value panics if successive Stores use different concrete types, so keep the type assertion in one accessor like GetConfig.
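Since Go 1.19, the generic atomic.Pointer[T] offers the same hot-reload pattern with compile-time type safety and no interface assertion; a sketch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Config struct {
	Addr string
	Port int
}

// Typed pointer: Load returns *Config directly, no .(type) assertion needed.
var config atomic.Pointer[Config]

func main() {
	config.Store(&Config{Addr: "127.0.0.1", Port: 8080})
	config.Store(&Config{Addr: "0.0.0.0", Port: 9090}) // hot reload
	cfg := config.Load()
	fmt.Println(cfg.Addr, cfg.Port) // 0.0.0.0 9090
}
```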
Guidelines for choosing atomic vs mutex
Single‑variable counters or flags – use the atomic primitives.
Read‑heavy configuration that changes rarely – use atomic.Value.
Updates that involve multiple fields or complex logic – prefer sync.Mutex.
Critical sections that may run for a noticeable time – prefer sync.Mutex to avoid CPU waste.
Extremely short lock holds (nanosecond scale) – consider CAS or a spin lock.
Atomic’s layered architecture (text diagram)
┌──────────────────────────────┐
│ sync/atomic │ API layer
│ ├── AddInt64 / CAS / Load │
│ └── Value │ Advanced interface
└──────────────────────────────┘
↓
┌──────────────────────────────┐
│ runtime/internal/atomic │ Assembly implementation
│ └── LOCK XADDQ / CMPXCHGQ │ CPU atomic instructions
└──────────────────────────────┘
↓
┌──────────────────────────────┐
│ CPU cache‑coherency (MESI) │ Bus lock, visibility guarantees
└──────────────────────────────┘

Key takeaways
sync/atomic is the low-level foundation of Go concurrency, providing true atomicity via hardware instructions.
CAS is the core primitive behind most lock-free algorithms.
Memory barriers inserted by the runtime ensure that Store and Load observe a consistent order across cores.
Understanding sync/atomic clarifies the baseline guarantees of Go’s memory model.
Further reading
Official Go Memory Model: https://go.dev/ref/mem
Source file: runtime/internal/atomic/asm_amd64.s
Medium article on memory barriers: https://medium.com/@AlexanderObregon/memory-barriers-in-go-concurrency-6259919c7b6a
Code Wrench
Focuses on code debugging, performance optimization, and real-world engineering, sharing efficient development tips and pitfall guides. We break down technical challenges in a down-to-earth style, helping you craft handy tools so every line of code becomes a problem‑solving weapon. 🔧💻
