Why Caches Matter: A Deep Dive into CPU Memory Hierarchy and Consistency
This article provides a comprehensive overview of CPU caches, covering why they are needed, their classification, placement and lookup mechanisms, replacement and write policies, and coherence protocols such as MESI, illustrating each concept with diagrams and code examples.
1. Why Cache is Needed
We start with a diagram that shows why a cache is required.
CPU performance has improved dramatically, while DRAM memory performance lags, creating a bottleneck where storage limits computation.
Because capacity and speed cannot be simultaneously optimized, we exploit data locality.
for (j = 0; j < 100; j = j + 1)
for (i = 0; i < 5000; i = i + 1)
x[i][j] = 2 * x[i][j];The loops access data with spatial locality, allowing us to place this data in a small, fast storage.
Cache provides high‑speed storage for the CPU by leveraging data locality.
1.2 Cache in Real Systems
A typical system storage hierarchy includes CPU registers, L1/L2/L3 caches, DRAM, and disk.
Data is looked up first in registers, then L1, L2, etc., finally reaching the disk; smaller capacity yields faster access.
1.3 Cache Classification
By data type: I‑Cache (instructions) and D‑Cache (data). D‑Cache is writable, I‑Cache is read‑only.
By size: small (<4 KB) used for L1, large (>4 KB) for L2 and beyond.
By location: Inner Cache (part of CPU micro‑architecture) vs. Outer Cache.
By inclusion: Inclusive vs. Exclusive caches.
2. Cache Working Principles
Four key questions: how data is placed, looked up, replaced, and handled on writes.
2.1 Data Placement
Example: 32 memory blocks, 8 cache lines. To store block 12 we can use:
Fully associative – any line.
Direct‑mapped – line 12 mod 8.
Set‑associative – limited set of lines.
More associativity reduces miss rate but increases comparison circuitry.
2.2 Data Lookup
Addresses are byte‑addressed, but cache transfers occur in blocks (≈64 B). The tag, index, and block offset are used to find data; in a 2‑way set‑associative cache tags are compared in parallel.
2.3 Cache Replacement
Random replacement.
Least Recently Used (LRU).
First‑In‑First‑Out (FIFO).
Replacement policies are chosen based on workload.
2.4 Write Policies
Write‑through: write to cache and main memory simultaneously.
Write‑back: write to cache, update main memory on eviction.
Write‑buffer: combine write‑through and write‑back using a buffer.
3. Cache Coherence
In multi‑core systems, caches can become inconsistent when one core updates data that another core has cached.
Two main coherence strategies:
Listener‑based
All caches monitor writes; either update other caches (write‑update) or invalidate their copies (write‑invalidate).
Directory‑based
The memory maintains a directory of which caches hold each block, supporting protocols such as SI, MSI, and MESI.
The MESI protocol defines four states (Modified, Shared, Exclusive, Invalid) and transitions based on reads and writes.
4. Summary
Cache plays a crucial role in computer architecture; this article covered its concepts, classifications, operation, replacement, write policies, and coherence mechanisms.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
