Why CPUs Need Cache Memory and How the MESI Protocol Keeps It Consistent
Modern CPUs bridge the speed gap with main memory through multi‑level caches that exploit temporal and spatial locality, and they keep those caches coherent across cores with the MESI protocol (Modified, Exclusive, Shared, Invalid); optimizations such as store buffers hide latency but introduce ordering problems, which memory barriers resolve.
CPU Cache Memory
Why CPUs Need Cache
CPU performance roughly doubles every 18 months (Moore’s law), while main memory and storage improve far more slowly, creating an ever‑widening speed gap. To keep the processor fed with data, a small amount of fast cache memory is embedded inside the CPU. Caches pay off because programs exhibit two forms of locality:
Temporal Locality: If a piece of information is accessed, it is likely to be accessed again soon (e.g., loops, recursion, repeated method calls).
Spatial Locality: If a memory location is accessed, nearby locations are likely to be accessed next (e.g., sequential code, consecutive objects, arrays).
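As a quick illustration (my own example, not from the original article), traversal order over a two‑dimensional array shows spatial locality at work: the row‑major loop walks memory sequentially and reuses each fetched cache line, while the column‑major loop touches a new line on almost every access and typically runs several times slower on the same data.

#include <stddef.h>

#define N 1024
static int matrix[N][N];

/* Row-major: consecutive iterations read adjacent addresses, so most
   accesses hit a cache line that a previous access already fetched. */
long sum_row_major(void){
    long sum = 0;
    for(size_t i = 0; i < N; i++)
        for(size_t j = 0; j < N; j++)
            sum += matrix[i][j];
    return sum;
}

/* Column-major: each iteration jumps N*sizeof(int) bytes ahead, so
   nearly every access misses and must fetch a fresh cache line. */
long sum_col_major(void){
    long sum = 0;
    for(size_t i = 0; i < N; i++)
        for(size_t j = 0; j < N; j++)
            sum += matrix[j][i];
    return sum;
}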
Cache‑Enabled CPU Execution Flow
Program code and data are loaded into main memory.
Instructions and data are fetched into the CPU’s cache.
The CPU executes instructions and writes results back to the cache.
Cache contents are eventually written back to main memory.
Multi‑Level Cache Architecture
Because a single cache cannot be both large enough and fast enough to keep up with the CPU’s execution speed, modern processors employ a hierarchy of caches (L1, L2, L3, …) with increasing size and latency: as a rough order of magnitude, an L1 hit costs a few cycles, L2 around a dozen, a shared L3 a few dozen, and a trip to main memory a couple of hundred.
MESI Cache‑Coherency Protocol for Multi‑Core CPUs
MESI States
M (Modified): The line is valid, has been modified (so it differs from main memory), and exists only in this cache.
E (Exclusive): The line is valid, identical to main memory, and resides only in this cache.
S (Shared): The line is valid, identical to main memory, and may be present in multiple caches.
I (Invalid): The line is not valid.
State Transitions
The protocol defines transitions based on four events (a code sketch follows the classification below):
Local read – the cache reads its own line.
Local write – the cache writes to its own line.
Remote read – another cache reads this line.
Remote write – another cache writes to this line.
Relative to an event, caches are classified as:
Local cache – the cache whose line state is being tracked.
Trigger cache – the cache that initiates the read/write event.
Other caches – all remaining caches that may hold copies of the same line.
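As a rough illustration (the function and names below are my own simplification, not part of the protocol specification), the state of a line in the local cache can be modeled as a function of the event and of whether any other cache holds a copy:

#include <stdbool.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } MesiState;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } MesiEvent;

/* Next state of a line in the local cache. A real coherence controller
   also snoops the bus, performs write-backs, and supplies data to peers;
   this sketch tracks only the state transitions. */
MesiState mesi_next(MesiState state, MesiEvent event, bool others_hold_copy){
    switch(event){
    case LOCAL_READ:
        if(state == INVALID)                 /* miss: fetch the line */
            return others_hold_copy ? SHARED : EXCLUSIVE;
        return state;                        /* hit: state unchanged */
    case LOCAL_WRITE:
        return MODIFIED;   /* dirty the line; remote copies are invalidated */
    case REMOTE_READ:
        return (state == INVALID) ? INVALID : SHARED; /* M is flushed first */
    case REMOTE_WRITE:
        return INVALID;    /* another core takes exclusive ownership */
    }
    return state;
}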
Example Scenarios
Single‑Core Read
CPU A reads x from main memory; the cache line becomes E (exclusive) in its cache.
Dual‑Core Read
CPU A first loads x into its cache (E state). When CPU B subsequently reads x, both caches transition to S (shared) state.
Modifying Data
CPU A writes to x, moving its line to M (modified) and invalidating the copy in CPU B (I state).
Synchronizing Data
When CPU B later reads x, CPU A first flushes the modified line back to main memory, and both caches then hold the line in S (shared) state.
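Using the hypothetical mesi_next sketch from above, the four scenarios trace through CPU A’s copy of x like this:

#include <assert.h>

int main(void){
    MesiState a = INVALID;

    a = mesi_next(a, LOCAL_READ, false);   /* single-core read  -> E */
    assert(a == EXCLUSIVE);

    a = mesi_next(a, REMOTE_READ, true);   /* CPU B also reads  -> S */
    assert(a == SHARED);

    a = mesi_next(a, LOCAL_WRITE, true);   /* CPU A modifies x  -> M (B: I) */
    assert(a == MODIFIED);

    a = mesi_next(a, REMOTE_READ, true);   /* CPU B re-reads: flush, -> S */
    assert(a == SHARED);

    return 0;
}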
MESI Optimizations and Their Challenges
Store Buffers
Store buffers let a processor write values to a temporary buffer and continue execution while the write is pending. The pending write is committed only after all invalidate acknowledgments are received.
Two risks arise:
Store forwarding: the processor reads its own pending write out of the store buffer before it is committed, so it sees a value that other CPUs cannot yet see.
Uncertain commit time: there is no guarantee when, or in what order, buffered writes become visible to other CPUs.
#include <assert.h>
#include <stdbool.h>

int value = 3;
bool isFinished = false;

void exeToCPUA(){
    value = 10;          /* may linger in CPU A's store buffer */
    isFinished = true;   /* can become visible to CPU B before value does */
}

void exeToCPUB(){
    if(isFinished){
        /* value is expected to be 10, but the store to value may still be
           waiting in CPU A's store buffer, so this assertion can fail */
        assert(value == 10);
    }
}

Hardware Memory Model and Memory Barriers
On the receiving side, invalidate queues let a CPU acknowledge an invalidation immediately and process it later, which means it may keep reading a stale line for a while. Memory barriers enforce ordering on both sides:
Store Memory Barrier (SMB): ensures that all pending store‑buffer writes are committed before any subsequent instructions are executed.
Load Memory Barrier (LMB): ensures that all pending invalidations are processed before any subsequent loads.
#include <assert.h>
#include <stdbool.h>

int value = 3;
bool finished = false;

void executedOnCpu0(){
    value = 10;
    storeMemoryBarrier();   /* commit all buffered stores before continuing */
    finished = true;        /* guaranteed to become visible after value */
}

void executedOnCpu1(){
    while(!finished);       /* spin until CPU 0 signals completion */
    loadMemoryBarrier();    /* drain the invalidate queue before loading */
    assert(value == 10);    /* the updated value is now visible */
}

With store buffers paired with memory barriers, the visibility and ordering of memory operations across cores are guaranteed.
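The barrier calls above are pseudocode; in portable C the same pairing can be expressed with release/acquire operations from C11 <stdatomic.h>. A minimal sketch (my own, not from the original article):

#include <assert.h>
#include <stdatomic.h>

int value = 3;
atomic_bool finished = false;

void executedOnCpu0(void){
    value = 10;
    /* release store: all earlier writes are committed before
       other cores can observe finished == true */
    atomic_store_explicit(&finished, true, memory_order_release);
}

void executedOnCpu1(void){
    /* acquire load: pairs with the release store above, so once we
       see finished == true, the write to value is visible as well */
    while(!atomic_load_explicit(&finished, memory_order_acquire))
        ;
    assert(value == 10);
}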