Understanding False Sharing and Cache Padding in Go
This article explains the concept of false sharing caused by CPU cache line interactions, demonstrates how cache padding can mitigate the performance penalty, and provides Go benchmark code and results to illustrate the impact on multi‑core concurrency.
Before discussing false sharing, it helps to briefly review how the CPU cache works. The smallest unit in a CPU cache is a cache line (typically 64 bytes), so when a core reads a variable from memory it also loads the variables stored next to it.
When core 1 reads variable a, it also brings the neighboring variable b into its cache line, thanks to the principle of spatial locality. If two variables used by different cores happen to reside on the same cache line, an update to one variable forces the other core to invalidate its copy of the line, even though the second variable was never modified.
This phenomenon is called false sharing: an update by one core forces other cores to reload the entire cache line, which hurts performance badly because cache reads are far faster than memory reads.
A common solution is cache padding: inserting unused filler fields between frequently updated variables so that each one occupies its own cache line, preventing an update by one core from invalidating the line another core is using.
Below is a Go example illustrating false sharing. The first struct, NoPad, places three uint64 fields consecutively, while the second struct, Pad, inserts padding arrays _p1, _p2, and _p3 (each [8]uint64, i.e. 64 bytes) to separate the fields:
type NoPad struct {
	a uint64
	b uint64
	c uint64
}

func (myatomic *NoPad) IncreaseAllEles() {
	atomic.AddUint64(&myatomic.a, 1)
	atomic.AddUint64(&myatomic.b, 1)
	atomic.AddUint64(&myatomic.c, 1)
}

type Pad struct {
	a   uint64
	_p1 [8]uint64
	b   uint64
	_p2 [8]uint64
	c   uint64
	_p3 [8]uint64
}

func (myatomic *Pad) IncreaseAllEles() {
	atomic.AddUint64(&myatomic.a, 1)
	atomic.AddUint64(&myatomic.b, 1)
	atomic.AddUint64(&myatomic.c, 1)
}

A benchmark driver runs many goroutines that repeatedly call IncreaseAllEles on each struct:
type MyAtomic interface {
	IncreaseAllEles()
}

func testAtomicIncrease(myatomic MyAtomic) {
	paraNum := 1000
	addTimes := 1000
	var wg sync.WaitGroup
	wg.Add(paraNum)
	for i := 0; i < paraNum; i++ {
		go func() {
			for j := 0; j < addTimes; j++ {
				myatomic.IncreaseAllEles()
			}
			wg.Done()
		}()
	}
	wg.Wait()
}
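As a sanity check (a hypothetical addition, not part of the original code), the driver can be verified outside the benchmark harness: after one run, every field should equal paraNum * addTimes = 1,000,000.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type MyAtomic interface {
	IncreaseAllEles()
}

type NoPad struct {
	a uint64
	b uint64
	c uint64
}

func (m *NoPad) IncreaseAllEles() {
	atomic.AddUint64(&m.a, 1)
	atomic.AddUint64(&m.b, 1)
	atomic.AddUint64(&m.c, 1)
}

// 1000 goroutines each increment every field 1000 times.
func testAtomicIncrease(myatomic MyAtomic) {
	paraNum := 1000
	addTimes := 1000
	var wg sync.WaitGroup
	wg.Add(paraNum)
	for i := 0; i < paraNum; i++ {
		go func() {
			for j := 0; j < addTimes; j++ {
				myatomic.IncreaseAllEles()
			}
			wg.Done()
		}()
	}
	wg.Wait()
}

func main() {
	n := &NoPad{}
	testAtomicIncrease(n)
	fmt.Println(n.a, n.b, n.c) // 1000000 1000000 1000000
}
```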
func BenchmarkNoPad(b *testing.B) {
	myatomic := &NoPad{}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		testAtomicIncrease(myatomic)
	}
}

func BenchmarkPad(b *testing.B) {
	myatomic := &Pad{}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		testAtomicIncrease(myatomic)
	}
}

On a 2014 MacBook Air, the original benchmark reported an improvement from 0.07 ns/op without padding to 0.02 ns/op with padding (the original benchmark bodies ran the workload once regardless of b.N, which is why those figures are sub-nanosecond). On a 2022 M2 MacBook Air, however, the author observed the opposite result: the padded version was slower.
A further experiment compares a version without padding to one with explicit cache padding (a _ [8]uint64 filler field) in a parallel benchmark. There the padded version shows a dramatic improvement, from 22.09 ns/op down to 1.075 ns/op, because each goroutine updates a value that resides on its own cache line, eliminating false sharing.
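A minimal sketch of such a parallel benchmark, assuming one counter slot per worker; the slot count, helper names, and the use of testing.Benchmark from main are illustrative, not the author's exact code:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// One counter per goroutine. Without padding, neighboring slots share
// cache lines, so independent updates still invalidate each other.
type slotNoPad struct {
	v uint64
}

// 64 bytes of filler pushes every counter onto its own cache line.
type slotPad struct {
	v uint64
	_ [8]uint64
}

func benchNoPadSlots(b *testing.B) {
	slots := make([]slotNoPad, 64)
	var next uint64
	b.RunParallel(func(pb *testing.PB) {
		// Hand each worker its own slot.
		i := (atomic.AddUint64(&next, 1) - 1) % uint64(len(slots))
		for pb.Next() {
			atomic.AddUint64(&slots[i].v, 1)
		}
	})
}

func benchPadSlots(b *testing.B) {
	slots := make([]slotPad, 64)
	var next uint64
	b.RunParallel(func(pb *testing.PB) {
		i := (atomic.AddUint64(&next, 1) - 1) % uint64(len(slots))
		for pb.Next() {
			atomic.AddUint64(&slots[i].v, 1)
		}
	})
}

func main() {
	// testing.Benchmark runs a benchmark outside `go test`.
	fmt.Println("no pad:", testing.Benchmark(benchNoPadSlots))
	fmt.Println("pad:   ", testing.Benchmark(benchPadSlots))
}
```

On a multi-core machine the padded variant should report markedly lower ns/op; the exact numbers depend on the CPU.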
Before applying cache padding in production, two points deserve attention: (1) know the cache-line size of the target CPU so you can choose an appropriate padding size, and (2) padding increases memory consumption, so benchmark to make sure the performance gain justifies the extra memory.
All example code is available on GitHub, and readers are encouraged to run their own benchmarks to verify the effect of false sharing and cache padding.
Go Programming World
Mobile version of tech blog https://jianghushinian.cn/, covering Golang, Docker, Kubernetes and beyond.