
Why CPUs Need Cache Memory and How the MESI Protocol Keeps It Consistent

Modern CPUs use multi‑level cache memory to bridge the speed gap with main memory, relying on temporal and spatial locality, and employ the MESI protocol with states M, E, S, I to maintain coherence across cores, while techniques like store buffers and memory barriers mitigate latency and ordering issues.


CPU Cache Memory

Why CPUs Need Cache

CPU performance has historically doubled roughly every 18 months (Moore's law), while main memory and storage improve much more slowly, creating a widening speed gap. To keep the processor fed with data, a small amount of fast cache memory is embedded inside the CPU.

Temporal locality: if a piece of information is accessed, it is likely to be accessed again soon (e.g., loops, recursion, repeated method calls).
Spatial locality: if a memory location is accessed, nearby locations are likely to be accessed next (e.g., sequential code, consecutive objects, arrays).

Cache‑Enabled CPU Execution Flow

Program code and data are loaded into main memory.

Instructions and data are fetched into the CPU’s cache.

The CPU executes instructions and writes results back to the cache.

Cache contents are eventually written back to main memory.

Cache execution flow diagram

Multi‑Level Cache Architecture

Because a single‑level cache cannot keep up with the CPU’s execution speed, modern processors employ a hierarchy of caches (L1, L2, L3, …) with increasing size and latency.

Multi‑level cache diagram

MESI Cache‑Coherency Protocol for Multi‑Core CPUs

MESI States

M (Modified): The line is valid, has been modified, differs from main memory, and exists only in this cache.

E (Exclusive): The line is valid, identical to main memory, and resides only in this cache.

S (Shared): The line is valid, identical to main memory, and may be present in multiple caches.

I (Invalid): The line is not valid.

State Transitions

The protocol defines transitions based on four events:

Local read – the cache reads its own line.

Local write – the cache writes to its own line.

Remote read – another cache reads this line.

Remote write – another cache writes to this line.

Caches are classified as:

Local cache – the cache whose line state transition is being described.

Trigger cache – the cache that initiates the read/write event causing the transition (it may be the local cache itself).

Other caches – all remaining caches that may hold copies of the same line.

MESI state transition diagram

Example Scenarios

Single‑Core Read

CPU A reads x from main memory; the cache line becomes E (exclusive) in its cache.

Single‑core read diagram

Dual‑Core Read

CPU A first loads x into its cache (E state). When CPU B subsequently reads x, both caches transition to S (shared) state.

Dual‑core read diagram

Modifying Data

CPU A writes to x, moving its line to M (modified) and invalidating the copy in CPU B (I state).

Modify data diagram

Synchronizing Data

When CPU B later reads x, CPU A writes the modified line back to main memory, its copy leaves the M state, and both caches end up in S (shared) state.

Synchronize data diagram

MESI Optimizations and Their Challenges

Store Buffers

Store buffers let a processor write values to a temporary buffer and continue execution while the write is pending. The pending write is committed only after all invalidate acknowledgments are received.

Two risks arise:

Store forwarding: the processor reads its own pending value straight from the store buffer, while other CPUs still see the old value in memory.

Uncertain commit time: there is no guarantee when the buffered write becomes visible to other CPUs.

int value = 3;
boolean isFinish = false;

void exeToCPUA(){
  value = 10;       // may sit in CPU A's store buffer for a while
  isFinish = true;  // can become visible to CPU B before value does
}

void exeToCPUB(){
  if(isFinish){
    // value is expected to be 10, but the assert can fail:
    // the write to value may still be pending in CPU A's store buffer
    assert value == 10;
  }
}

Hardware Memory Model and Memory Barriers

Invalidate queues hold pending invalidation requests until the processor can safely execute them. Memory barriers enforce ordering:

Store Memory Barrier (SMB): ensures that all pending store-buffer writes are committed before any subsequent stores are executed.

Load Memory Barrier (LMB): ensures that all pending invalidations are processed before any subsequent loads.

void executedOnCpu0(){
    value = 10;
    storeMemoryBarrier(); // ensure stores complete
    finished = true;
}

void executedOnCpu1(){
    while(!finished);
    loadMemoryBarrier(); // ensure invalidations processed
    assert value == 10;
}

With store buffers and memory barriers, visibility and ordering of memory operations across multiple cores are guaranteed.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: CPU, Hardware Architecture, MESI, Cache Memory, Memory Consistency
Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
