Understanding Linux Memory Barriers: Concepts, Types, and Implementation
This article explains why modern multi‑core CPUs need memory barriers, describes the different kinds of barriers (full, read, write), shows how they are implemented in the Linux kernel and hardware, and illustrates their use in multithreaded and cache‑coherent programming.
1. Introduction to Memory Barriers
In today’s computers multi‑core processors are ubiquitous, but the parallel execution of cores can cause chaotic memory accesses. Linux uses memory barriers (also called memory fences) to enforce a well‑defined order of memory operations across cores, ensuring program correctness.
1.1 Overview of Memory Barriers
A memory barrier is a synchronization primitive that guarantees that all memory reads/writes before the barrier are completed before any reads/writes after the barrier are performed. The three main types are:
Full memory barrier – ensures both reads and writes are ordered.
Read memory barrier – only orders read operations.
Write memory barrier – only orders write operations.
In the Linux kernel (x86) they are defined as:
#define mb()  asm volatile("mfence" ::: "memory")
#define rmb() asm volatile("lfence" ::: "memory")
#define wmb() asm volatile("sfence" ::: "memory")

Think of a memory barrier like boiling water before brewing tea: the water must finish boiling before you pour it over the leaves, otherwise the tea is unsatisfactory.
Hardware‑level barriers are divided into Load (read) barriers and Store (write) barriers.
Two Main Functions of Memory Barriers
Prevent reordering of instructions on both sides of the barrier.
Force pending writes held in store buffers or caches to be made globally visible, and invalidate stale copies of the data held in other cores’ caches.
2. Why Are Memory Barriers Needed?
Modern CPUs employ caches and out‑of‑order execution to improve performance. This leads to situations where the logical order of memory accesses in the source code is not the order observed at runtime, causing data races and incorrect results in multithreaded programs.
2.1 Background of Memory Reordering
Early in‑order processors fetched, decoded, executed, and wrote back instructions strictly in program order. Out‑of‑order processors, however, place instructions into a queue, execute them as soon as their operands are ready, and may retire them in a different order, creating the appearance of “reordered” execution.
2.2 Understanding Memory Barriers
Consider the following simple code:
x = r;
y = 1;

On many CPUs the store to y may become globally visible before the load of r completes: the two statements touch independent locations, so both the compiler and the hardware are free to reorder them (a load–store reordering).
Both compiler optimizations and CPU runtime optimizations can introduce such reorderings. In single‑threaded code this is harmless, but in multithreaded code the order of memory accesses often determines correctness.
Example of a data‑race scenario:
// thread 1
while(!ok);
do(x);
// thread 2
x = 42;
ok = 1;

If the write to x is reordered after the write to ok , thread 1 may observe ok == 1 while still seeing a stale value of x . Inserting a write barrier between the two stores in thread 2 (and a read barrier between the load of ok and the use of x in thread 1) prevents this.
3. Types of Memory Barriers
3.1 Full Barrier (mfence)
A full barrier prevents any reordering of reads or writes across it. Example:
// thread 1
x = 1; // write A
mfence(); // full barrier
y = 2; // write B
// thread 2
if (y == 2) {
assert(x == 1);
}

3.2 Read Barrier (lfence)
A read barrier guarantees that all reads before the barrier complete before any reads after it. Example:
int a = shared1; // read A
lfence(); // read barrier
int b = shared2; // read B

3.3 Write Barrier (sfence)
A write barrier ensures that all writes before the barrier are visible before any later writes. Example:
shared1 = 10; // write C
sfence(); // write barrier
shared2 = 20; // write D

4. Implementation Principles
4.1 Memory‑Consistency Models
Strong consistency models enforce program order for all loads and stores, eliminating the need for barriers but hurting performance. Weaker models allow certain reorderings; even x86’s comparatively strong TSO model permits Store→Load reordering, making barriers necessary to regain full ordering guarantees.
4.2 Cache‑Coherence Protocol (MESI)
Each cache line can be in one of four states: Modified (M), Exclusive (E), Shared (S), or Invalid (I). Memory barriers interact with MESI by forcing pending writes to be written back (a Modified line is demoted, e.g. M→S when another core reads it) and by issuing invalidate messages so other cores drop their stale copies and re-fetch the latest data.
4.3 Instruction Sequences
On x86 the barrier instructions are:
mfence – full barrier
lfence – read barrier
sfence – write barrier
These instructions serialize the memory subsystem, ensuring that earlier memory operations are globally visible before later ones proceed.
5. Application Scenarios
5.1 Multithreaded Programming
When two threads share variables x and y, inserting a full barrier between the writes to x and y guarantees that a second thread seeing y == 2 will also see x == 1:
// thread 1
x = 1;
mfence();
y = 2;
// thread 2
if (y == 2) {
lfence();
assert(x == 1);
}

5.2 Shared‑Memory Producer/Consumer
A producer writes data to a buffer, then sets a flag. A write barrier after the data store ensures the flag becomes visible only after the data is committed. The consumer inserts a read barrier before reading the buffer after seeing the flag.
// producer
buffer = 10;
sfence();
flag = 1;
// consumer
if (flag == 1) {
lfence();
assert(buffer == 10);
}

5.3 Cache Consistency Across Cores
Core A modifies a shared variable and issues a full mfence to push the update to main memory and invalidate other cores’ stale copies. Core B also executes an mfence before reading, guaranteeing it sees the latest value.
// core A
x = 1;
mfence();
// core B
mfence();
assert(x == 1);

6. Summary
Memory barriers are essential primitives that bridge the gap between high‑performance out‑of‑order hardware and the strict ordering requirements of concurrent software. By forcing ordering and visibility of memory operations, they enable correct synchronization in multithreaded kernels, user‑space libraries, and low‑level system code.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.