Fundamentals 16 min read

Why CPUs Need Cache Memory and How the MESI Protocol Ensures Consistency

This article explains the purpose of CPU cache memory, the principles of temporal and spatial locality, the multi‑level cache architecture, the MESI cache‑coherence protocol for multi‑core processors, and the optimizations such as store buffers and memory barriers that address performance and consistency challenges.

Open Source Linux
Open Source Linux
Open Source Linux
Why CPUs Need Cache Memory and How the MESI Protocol Ensures Consistency

CPU Cache Memory

Why CPUs Need Cache

Guided by Moore's Law, CPU performance doubles roughly every 18 months, far outpacing the development of main memory and storage, which creates a speed mismatch between the CPU and data access. To bridge this gap, CPU manufacturers embed a small amount of high‑speed cache inside the processor.

CPU accesses to memory—whether fetching data or instructions—tend to cluster in contiguous regions, a phenomenon known as the locality principle.

Temporal Locality : If a data item is accessed, it is likely to be accessed again soon.

Examples include loops, recursion, and repeated method calls.

Spatial Locality : If a memory location is accessed, nearby locations are likely to be accessed soon.

Examples include sequential code execution, consecutive object allocations, and arrays.

CPU Execution Flow with Cache

Program and data are loaded into main memory.

Instructions and data are loaded into the CPU cache.

The CPU executes instructions and writes results back to the cache.

Cache data is eventually written back to main memory.

Multi‑Level Cache Structures

Because a single‑level cache cannot keep up with CPU execution speed, manufacturers introduced multi‑level cache hierarchies.

MESI Cache‑Coherence Protocol for Multi‑Core CPUs

In multi‑core CPUs each core has its own L1 cache. The MESI protocol ensures that cached copies of the same memory line remain consistent.

MESI States

MESI is an acronym for the four possible states of a cache line, represented with two bits:

M (Modified) : The line is valid, modified, and differs from main memory; it exists only in this cache.

E (Exclusive) : The line is valid, matches main memory, and resides only in this cache.

S (Shared) : The line is valid, matches main memory, and may be present in multiple caches.

I (Invalid) : The line is not valid.

Note: The M and E states are always precise, while the S state may become stale.

State Transitions

The following table (originally a HTML table) summarizes the actions that trigger state changes for each cache involved (local cache, triggering cache, other caches). For brevity, only the most common transitions are described:

M State : Remains M in the local cache; other caches become I.

E State : Local and triggering caches stay E; others become I.

S State : All caches can be S; transitions to I occur when a write is performed.

I State : Can transition to S or E when another cache reads or writes the line.

Example Scenario

Assume cache 1 holds variable x = 0 in the S (shared) state. Other caches (cache 2, cache 3) may hold the same line in S or transition it to I.

Multi‑Core Cache Cooperation

Single‑Core Read

CPU A issues a read for x, fetches it from main memory via the bus, and the cache line becomes E (exclusive).

Dual‑Core Read

CPU A reads x and sets its line to E. CPU B then reads x; CPU A detects the conflict and both caches transition the line to S (shared).

Modifying Data

CPU A modifies x, moving its line to M and invalidating the copy in CPU B (I). After the write, CPU A’s line is M.

Synchronizing Data

CPU B requests x. CPU A writes back the modified data, changing its line to E, and both caches end up in the S state.

MESI Optimizations and Their Challenges

Coherence messages incur latency; when a cache changes state, other caches must acknowledge, potentially blocking the processor and hurting performance.

Store Buffers

To avoid stalls, processors use store buffers: writes are placed in the buffer and the CPU continues execution. The actual memory update occurs only after all invalidate acknowledgments are received.

Risks of Store Buffers

The CPU may read a value from the store buffer before it is committed (solved by store forwarding).

There is no guarantee on when the buffered write will complete.

value = 3;

void exeToCPUA(){
  value = 10;
  isFinsh = true;
}

void exeToCPUB(){
  if(isFinsh){
    // value should be 10?
    assert value == 10;
  }
}

If CPU A has finished in the E state while value is still Invalid, CPU B might see finished == true but read an outdated value, leading to reordering effects.

Hardware Memory Model

Invalidation requests must be acknowledged immediately, but actual invalidation can be deferred to a special queue. The processor does not send further messages to a cache line until the invalidation is processed.

These constraints are expressed with memory barriers:

Store Memory Barrier (SMB) forces the processor to complete all pending store‑buffer operations before any subsequent instructions are executed.
Load Memory Barrier (LMB) forces the processor to complete all pending invalidations before any subsequent loads.
void executedOnCpu0(){
    value = 10;
    // Ensure all store‑buffered writes are performed before proceeding.
    storeMemoryBarrier();
    finished = true;
}

void executedOnCpu1(){
    while(!finished);
    // Ensure all invalidations are processed before the load.
    loadMemoryBarrier();
    assert value == 10;
}

With these barriers, the program becomes safe and behaves as intended.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Memory HierarchyCPU cachecache coherenceMESIstore buffer
Open Source Linux
Written by

Open Source Linux

Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.