Understanding Linux Memory Barriers: Types, Usage, and Implementation
This article provides a comprehensive overview of Linux memory barriers, explaining why they are needed for correct ordering of memory operations on modern multi‑core CPUs, describing the different barrier types (read, write, full), their implementation in the kernel and Java, and illustrating their use in synchronization primitives and lock‑free data structures with code examples.
1. Overview of Memory Barriers
Modern CPUs execute instructions out of order to improve performance, which can break the expected order of memory accesses in multithreaded programs. A memory barrier (or fence) is a special instruction that creates a synchronization point, ensuring that all memory operations before the barrier are completed before any operation after the barrier begins.
Types of Barriers
Linux provides three basic barrier macros:
#define mb() asm volatile("mfence":::"memory")
#define rmb() asm volatile("lfence":::"memory")
#define wmb() asm volatile("sfence" ::: "memory")Full barrier (mb) enforces ordering of both loads and stores; read barrier (rmb) orders loads; write barrier (wmb) orders stores.
2. Memory Barriers in Different Contexts
In Java, the volatile keyword causes the compiler to emit a StoreStore barrier before each volatile write and a StoreLoad barrier after it, and LoadLoad and LoadStore barriers after each volatile read, giving volatile variables lock‑like visibility guarantees.
For final fields, the compiler inserts a StoreStore barrier between the final‑field stores and the constructor's return, and the reader side orders the load of the field after the load of the object reference, ensuring the field is fully initialized before any other thread can see it.
3. Compiler vs. Runtime Reordering
Compilers may reorder instructions for optimization. For example, compiling with -O2 can change the order of assignments:
int x, y, r;

void f(void)
{
    x = r;
    y = 1;
}

// Optimized with -O2: the store to y now precedes the store to x
movl r(%rip), %eax
movl $1, y(%rip)
movl %eax, x(%rip)

To prevent such reordering, a compiler barrier #define barrier() __asm__ __volatile__("" ::: "memory") can be used.
4. Runtime Reordering on SMP Systems
In multi‑core (SMP) systems each CPU has its own cache. Without barriers, a write performed by one CPU may stay in its cache and be invisible to another CPU, leading to stale reads. Cache‑coherency protocols (MESI) and mechanisms such as store buffers and invalidate queues are used, but they still require explicit barriers to guarantee ordering.
Store Buffer Example
When a CPU writes, the value may first go into a store buffer and only later be committed to the cache after an invalidate handshake with other CPUs.
Invalidate Queue Example
Invalidate queues allow a CPU to acknowledge an invalidate request immediately, deferring the actual cache line invalidation, which reduces stalls caused by full store‑buffer flushes.
5. Linux Kernel Barrier Macros
The kernel defines both compiler and CPU barriers. For SMP builds the macros expand to the hardware fences; for UP builds they collapse to the compiler barrier:
#define barrier() __asm__ __volatile__("": : :"memory")
#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#endif

6. Practical Use Cases
Barriers are essential in SMP synchronization, spinlocks, atomic operations, and memory‑consistency models (Sequential Consistency, Total Store Order, etc.). They ensure that writes become visible before subsequent reads, preventing subtle bugs such as the classic "store‑load" reordering where two threads can observe stale values.
7. Example: Lock‑Free Ring Buffer (kfifo)
The Linux kernel implements a lock‑free FIFO using barriers:
unsigned int __kfifo_put(struct kfifo *fifo,
                         const unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    len = min(len, fifo->size - fifo->in + fifo->out);

    /* ensure fifo->out is sampled before we start writing data */
    smp_mb();

    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);
    memcpy(fifo->buffer, buffer + l, len - l);

    /* ensure the data is written before updating fifo->in */
    smp_wmb();

    fifo->in += len;
    return len;
}

unsigned int __kfifo_get(struct kfifo *fifo,
                         unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    len = min(len, fifo->in - fifo->out);

    /* ensure fifo->in is sampled before we start reading data */
    smp_rmb();

    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);
    memcpy(buffer + l, fifo->buffer, len - l);

    /* ensure the data is read before updating fifo->out */
    smp_mb();

    fifo->out += len;
    return len;
}

Here smp_mb(), smp_wmb(), and smp_rmb() guarantee the correct ordering of reads, writes, and index updates between a single producer and a single consumer.
8. When to Consider Barriers
Any code that performs raw loads/stores on shared data without using higher‑level atomic primitives (e.g., GCC built‑ins, libatomic) must explicitly insert the appropriate memory barriers to avoid reordering bugs on SMP systems.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems, and the Linux kernel.