
Understanding Linux Memory Barriers: Concepts, Types, and Implementation

This article explains why modern multi‑core CPUs need memory barriers, describes the different kinds of barriers (full, read, write), shows how they are implemented in the Linux kernel and hardware, and illustrates their use in multithreaded and cache‑coherent programming.


1. Introduction to Memory Barriers

In today’s computers multi‑core processors are ubiquitous, but parallel execution across cores can make the observed order of memory accesses unpredictable. Linux uses memory barriers (also called memory fences) to enforce a well‑defined order of memory operations across cores, ensuring program correctness.

1.1 Overview of Memory Barriers

A memory barrier is a synchronization primitive that guarantees that all memory reads/writes before the barrier are completed before any reads/writes after the barrier are performed. The three main types are:

Full memory barrier – ensures both reads and writes are ordered.

Read memory barrier – only orders read operations.

Write memory barrier – only orders write operations.

In the Linux kernel (x86) they are defined as:

#define mb()    asm volatile("mfence" ::: "memory")
#define rmb()   asm volatile("lfence" ::: "memory")
#define wmb()   asm volatile("sfence" ::: "memory")

Think of a memory barrier like boiling water before brewing tea: the water must finish boiling before you steep the leaves; otherwise the result is unsatisfactory.

Hardware‑level barriers are divided into Load (read) barriers and Store (write) barriers.

Two Main Functions of Memory Barriers

Prevent reordering of instructions on both sides of the barrier.

Force dirty data in write buffers or caches to be flushed to main memory, invalidating stale cache lines.

2. Why Are Memory Barriers Needed?

Modern CPUs employ caches and out‑of‑order execution to improve performance. This leads to situations where the logical order of memory accesses in the source code is not the order observed at runtime, causing data races and incorrect results in multithreaded programs.

2.1 Background of Memory Reordering

Early in‑order processors fetched, decoded, executed, and wrote back instructions strictly in program order. Out‑of‑order processors, however, place instructions into a queue, execute them as soon as their operands are ready, and may retire them in a different order, creating the appearance of “reordered” execution.

2.2 Understanding Memory Barriers

Consider the following simple code:

x = r;
y = 1;

On many CPUs the store to y may be performed before the load of r (and the store to x) completes, so the two statements take effect out of program order.

Both compiler optimizations and CPU runtime optimizations can introduce such reorderings. In single‑threaded code this is harmless, but in multithreaded code the order of memory accesses often determines correctness.
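The compiler‑side half of the problem can be suppressed with a compiler barrier. The Linux kernel defines one (its barrier() macro) as an empty asm statement with a "memory" clobber; a minimal user‑space sketch:

```c
/* Compiler barrier, in the style of the Linux kernel's barrier():
 * the empty asm with a "memory" clobber tells the compiler that memory
 * may have changed, so it cannot cache values in registers or move
 * loads/stores across this point. It emits no instruction, so it does
 * NOT stop CPU-level reordering -- only compiler reordering. */
#define barrier() __asm__ __volatile__("" ::: "memory")

int x, y, r;

void writer(void)
{
    x = r;      /* the compiler must keep this store... */
    barrier();
    y = 1;      /* ...ahead of this one */
}
```

On x86 the hardware already keeps stores in order, so for this particular pair a compiler barrier is all that is needed; on weakly ordered CPUs a hardware barrier is still required.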

Example of a data‑race scenario:

// thread 1
while (!ok);
do(x);

// thread 2
x = 42;
ok = 1;

If the write to x is reordered after the write to ok, thread 1 may observe ok == 1 while still seeing a stale value of x. Inserting a write barrier between the store to x and the store to ok (and a read barrier in thread 1 between checking ok and using x) prevents this.
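The same publish/wait pattern can be written portably with C11 atomics, where a release store plays the role of the write barrier and an acquire load plays the role of the read barrier (a sketch, not the article’s original code; function names are illustrative):

```c
#include <stdatomic.h>

int x;
atomic_int ok;

/* thread 2's role: publish x, then set the flag.
 * The release store acts as the write barrier: the store to x
 * cannot be reordered after the store to ok. */
void publish(void)
{
    x = 42;
    atomic_store_explicit(&ok, 1, memory_order_release);
}

/* thread 1's role: wait for the flag, then use x.
 * The acquire load acts as the read barrier: the read of x
 * cannot be reordered before the read of ok. */
int wait_and_read(void)
{
    while (!atomic_load_explicit(&ok, memory_order_acquire))
        ;
    return x;
}
```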

3. Types of Memory Barriers

3.1 Full Barrier (mfence)

A full barrier prevents any reordering of reads or writes across it. Example:

// thread 1
x = 1;          // write A
mfence();       // full barrier
y = 2;          // write B

// thread 2
if (y == 2) {
    assert(x == 1);
}

3.2 Read Barrier (lfence)

A read barrier guarantees that all reads before the barrier complete before any reads after it. Example:

int a = shared1;   // read A
lfence();          // read barrier
int b = shared2;   // read B

3.3 Write Barrier (sfence)

A write barrier ensures that all writes before the barrier are visible before any later writes. Example:

shared1 = 10;   // write C
sfence();       // write barrier
shared2 = 20;   // write D

4. Implementation Principles

4.1 Memory‑Consistency Models

Strong consistency models enforce program order for all loads and stores, eliminating the need for barriers but hurting performance. Relaxed models permit some reorderings; even x86’s comparatively strong TSO model allows Store‑Load reordering, so barriers are needed to regain full ordering guarantees.
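The Store‑Load case is the classic “store buffer” litmus test: each thread stores to one variable and then loads the other, and without a full barrier both threads can read 0, because each store may still sit in a store buffer when the other thread’s load executes. A sketch with C11 fences, where the sequentially consistent fence plays the role of mfence:

```c
#include <stdatomic.h>

atomic_int X, Y;

/* Store-buffer litmus test. Without the seq_cst fences, the outcome
 * r1 == 0 && r2 == 0 is permitted even on x86 (Store-Load
 * reordering); with them, at least one thread must see the other's
 * store. */
int thread_a(void)                    /* returns r1 */
{
    atomic_store_explicit(&X, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* ~ mfence */
    return atomic_load_explicit(&Y, memory_order_relaxed);
}

int thread_b(void)                    /* returns r2 */
{
    atomic_store_explicit(&Y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* ~ mfence */
    return atomic_load_explicit(&X, memory_order_relaxed);
}
```

Run concurrently, the barriers rule out the r1 == r2 == 0 outcome; run sequentially (as in a single-threaded test), thread_b necessarily observes thread_a’s store.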

4.2 Cache‑Coherence Protocol (MESI)

Each cache line can be in one of four states: Modified (M), Exclusive (E), Shared (S), or Invalid (I). Memory barriers interact with MESI by forcing pending writes out of the store buffer into the cache (marking the line Modified) and by waiting for invalidate messages to be processed, so other cores drop stale copies and see the latest data.
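A toy model of these transitions, from the point of view of a single core, can make the state machine concrete (a sketch only: real MESI also exchanges bus messages, which this model omits; all names are illustrative):

```c
/* Toy model of MESI transitions for one cache line as seen by one core. */
typedef enum { MESI_M, MESI_E, MESI_S, MESI_I } mesi_state;

/* This core writes the line: every path ends in Modified. Writing a
 * Shared or Invalid line first invalidates the other cores' copies --
 * exactly the coherence traffic a write barrier waits to complete. */
mesi_state on_local_write(mesi_state s)
{
    (void)s;
    return MESI_M;
}

/* Another core writes the line: our copy becomes Invalid, so our next
 * read is forced to fetch the fresh data. */
mesi_state on_remote_write(mesi_state s)
{
    (void)s;
    return MESI_I;
}

/* Another core reads a line we hold Modified or Exclusive: we supply
 * the data and both copies become Shared. */
mesi_state on_remote_read(mesi_state s)
{
    return (s == MESI_M || s == MESI_E) ? MESI_S : s;
}
```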

4.3 Instruction Sequences

On x86 the barrier instructions are:

mfence – full barrier

lfence – read barrier

sfence – write barrier

These instructions serialize the memory subsystem, ensuring that earlier memory operations are globally visible before later ones proceed.
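Outside the kernel, C11 offers portable fences. On x86, compilers typically emit mfence for the sequentially consistent fence, while the acquire and release fences compile to no instruction because TSO already provides their guarantees; that mapping is the usual codegen, stated here as an assumption rather than a promise of the standard:

```c
#include <stdatomic.h>

/* Portable user-space counterparts of the kernel macros. */
static inline void full_barrier(void)   /* ~ mb():  mfence on x86 */
{
    atomic_thread_fence(memory_order_seq_cst);
}

static inline void read_barrier(void)   /* ~ rmb(): orders loads */
{
    atomic_thread_fence(memory_order_acquire);
}

static inline void write_barrier(void)  /* ~ wmb(): orders stores */
{
    atomic_thread_fence(memory_order_release);
}
```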

5. Application Scenarios

5.1 Multithreaded Programming

When two threads share variables x and y, inserting a full barrier after writing x guarantees that a second thread seeing y == 2 will also see x == 1:

// thread 1
x = 1;
mfence();
y = 2;

// thread 2
if (y == 2) {
    lfence();
    assert(x == 1);
}

5.2 Shared‑Memory Producer/Consumer

A producer writes data to a buffer, then sets a flag. A write barrier after the data store ensures the flag becomes visible only after the data is committed. The consumer inserts a read barrier before reading the buffer after seeing the flag.

// producer
buffer = 10;
sfence();
flag = 1;

// consumer
if (flag == 1) {
    lfence();
    assert(buffer == 10);
}
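The same protocol can be written with standalone C11 fences, which mirror the sfence/lfence placement above (a sketch under that assumption; function names are illustrative):

```c
#include <stdatomic.h>

static int buffer;
static atomic_int flag;

/* Producer: commit the data, then the flag. The release fence is the
 * write barrier: the store to buffer cannot be reordered after the
 * store to flag. */
void produce(void)
{
    buffer = 10;
    atomic_thread_fence(memory_order_release);   /* ~ sfence */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Consumer: returns 1 and fills *out once the data is visible. The
 * acquire fence is the read barrier: buffer is not read before the
 * flag check. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&flag, memory_order_relaxed))
        return 0;                                /* flag not set yet */
    atomic_thread_fence(memory_order_acquire);   /* ~ lfence */
    *out = buffer;
    return 1;
}
```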

5.3 Cache Consistency Across Cores

Core A modifies a shared variable and issues a full mfence so that the update becomes globally visible and other cores’ stale copies are invalidated. Core B also executes an mfence before reading, ensuring it observes the latest value.

// core A
x = 1;
mfence();

// core B
mfence();
assert(x == 1);

6. Summary

Memory barriers are essential primitives that bridge the gap between high‑performance out‑of‑order hardware and the strict ordering requirements of concurrent software. By forcing ordering and visibility of memory operations, they enable correct synchronization in multithreaded kernels, user‑space libraries, and low‑level system code.

Tags: concurrency, Linux Kernel, Cache Coherence, CPU architecture, memory barriers
Written by Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
