Why CPUs Need Cache Memory and How the MESI Protocol Ensures Consistency
This article explains the purpose of CPU cache memory, the principles of temporal and spatial locality, the multi‑level cache architecture, the MESI cache‑coherence protocol for multi‑core processors, and the optimizations such as store buffers and memory barriers that address performance and consistency challenges.
CPU Cache Memory
Why CPUs Need Cache
Guided by Moore's Law, CPU performance doubles roughly every 18 months, far outpacing the development of main memory and storage, which creates a speed mismatch between the CPU and data access. To bridge this gap, CPU manufacturers embed a small amount of high‑speed cache inside the processor.
CPU accesses to memory—whether fetching data or instructions—tend to cluster in contiguous regions, a phenomenon known as the locality principle.
Temporal Locality : If a data item is accessed, it is likely to be accessed again soon.
Examples include loops, recursion, and repeated method calls.
Spatial Locality : If a memory location is accessed, nearby locations are likely to be accessed soon.
Examples include sequential code execution, consecutive object allocations, and arrays.
CPU Execution Flow with Cache
Program and data are loaded into main memory.
Instructions and data are loaded into the CPU cache.
The CPU executes instructions and writes results back to the cache.
Cache data is eventually written back to main memory.
Multi‑Level Cache Structures
Because a single‑level cache cannot keep up with CPU execution speed, manufacturers introduced multi‑level cache hierarchies.
MESI Cache‑Coherence Protocol for Multi‑Core CPUs
In multi‑core CPUs each core has its own L1 cache. The MESI protocol ensures that cached copies of the same memory line remain consistent.
MESI States
MESI is an acronym for the four possible states of a cache line, represented with two bits:
M (Modified) : The line is valid, modified, and differs from main memory; it exists only in this cache.
E (Exclusive) : The line is valid, matches main memory, and resides only in this cache.
S (Shared) : The line is valid, matches main memory, and may be present in multiple caches.
I (Invalid) : The line is not valid.
Note: The M and E states are always precise, while the S state may become stale.
State Transitions
The following table (originally a HTML table) summarizes the actions that trigger state changes for each cache involved (local cache, triggering cache, other caches). For brevity, only the most common transitions are described:
M State : Remains M in the local cache; other caches become I.
E State : Local and triggering caches stay E; others become I.
S State : All caches can be S; transitions to I occur when a write is performed.
I State : Can transition to S or E when another cache reads or writes the line.
Example Scenario
Assume cache 1 holds variable x = 0 in the S (shared) state. Other caches (cache 2, cache 3) may hold the same line in S or transition it to I.
Multi‑Core Cache Cooperation
Single‑Core Read
CPU A issues a read for x, fetches it from main memory via the bus, and the cache line becomes E (exclusive).
Dual‑Core Read
CPU A reads x and sets its line to E. CPU B then reads x; CPU A detects the conflict and both caches transition the line to S (shared).
Modifying Data
CPU A modifies x, moving its line to M and invalidating the copy in CPU B (I). After the write, CPU A’s line is M.
Synchronizing Data
CPU B requests x. CPU A writes back the modified data, changing its line to E, and both caches end up in the S state.
MESI Optimizations and Their Challenges
Coherence messages incur latency; when a cache changes state, other caches must acknowledge, potentially blocking the processor and hurting performance.
Store Buffers
To avoid stalls, processors use store buffers: writes are placed in the buffer and the CPU continues execution. The actual memory update occurs only after all invalidate acknowledgments are received.
Risks of Store Buffers
The CPU may read a value from the store buffer before it is committed (solved by store forwarding).
There is no guarantee on when the buffered write will complete.
value = 3;
void exeToCPUA(){
value = 10;
isFinsh = true;
}
void exeToCPUB(){
if(isFinsh){
// value should be 10?
assert value == 10;
}
}If CPU A has finished in the E state while value is still Invalid, CPU B might see finished == true but read an outdated value, leading to reordering effects.
Hardware Memory Model
Invalidation requests must be acknowledged immediately, but actual invalidation can be deferred to a special queue. The processor does not send further messages to a cache line until the invalidation is processed.
These constraints are expressed with memory barriers:
Store Memory Barrier (SMB) forces the processor to complete all pending store‑buffer operations before any subsequent instructions are executed.
Load Memory Barrier (LMB) forces the processor to complete all pending invalidations before any subsequent loads.
void executedOnCpu0(){
value = 10;
// Ensure all store‑buffered writes are performed before proceeding.
storeMemoryBarrier();
finished = true;
}
void executedOnCpu1(){
while(!finished);
// Ensure all invalidations are processed before the load.
loadMemoryBarrier();
assert value == 10;
}With these barriers, the program becomes safe and behaves as intended.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
