Why Do CPUs Need Cache? A Deep Dive into Cache Mechanisms and Consistency
This article explains the purpose of CPU caches, their classification, placement and replacement strategies, write policies, and coherence protocols, providing a comprehensive overview of cache concepts essential for modern computer architecture.
Cache mechanisms such as LRU are a popular interview topic at many internet companies.
Today we share a thorough technical article about caches that covers nearly all the key concepts.
The diagrams are taken from the classic book Computer Architecture: A Quantitative Approach, which is highly recommended.
1. Why Do We Need Cache
1.1 Why Cache Is Needed
CPU performance has improved dramatically over time, while DRAM speed has not kept pace, so memory access has become the bottleneck for computation.
Fast memory is small and expensive, and large memory is slow, so a single memory cannot offer both capacity and speed.
The way out is to exploit a property of program access patterns: locality.
Consider the following code:
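    for (j = 0; j < 100; j = j + 1)
        for (i = 0; i < 5000; i = i + 1)
            x[i][j] = 2 * x[i][j];
Each element is read and then immediately written (temporal locality), and the loop works through a single array whose elements lie near one another in memory (spatial locality).
Note that, as written, the inner loop steps through i, so consecutive accesses are one row (100 elements) apart in C's row-major layout; swapping the two loops makes consecutive accesses adjacent in memory and uses each cache block fully.
By placing such data in a small, fast memory (the cache), the CPU can access it quickly.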
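The effect of traversal order on spatial locality can be sketched as follows. The function names and the ROWS/COLS constants are illustrative and not from the original; the sketch assumes C's row-major array layout. Both functions compute the same result, but only the second touches memory sequentially.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 5000
#define COLS 100

/* Column-first order (inner loop over i): consecutive accesses are
 * COLS elements apart, so each cache block is used for one element
 * before the loop moves on. */
void scale_column_first(int x[ROWS][COLS]) {
    assert(x != NULL);
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            x[i][j] = 2 * x[i][j];
}

/* Row-first order (inner loop over j): consecutive accesses touch
 * adjacent elements, so every element of a fetched block is used. */
void scale_row_first(int x[ROWS][COLS]) {
    assert(x != NULL);
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            x[i][j] = 2 * x[i][j];
}
```

How much faster the row-first version runs depends on the machine and the compiler, which may perform this loop interchange on its own at higher optimization levels.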
1.2 Cache in Real Systems
The system storage hierarchy includes CPU registers, L1/L2/L3 caches, DRAM, and disk.
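A request is served by the nearest level that holds the data: registers → L1 → L2 → L3 → DRAM → disk.
The closer a level is to the CPU, the smaller and faster it is.
The CPU and the cache exchange individual words, while the cache and main memory exchange whole blocks (typically 64 bytes).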
1.3 Cache Classification
By data type: I‑Cache (instructions) and D‑Cache (data). D‑Cache can be written back; I‑Cache is read‑only.
By size: small caches (tens of KB, typically L1) and large caches (hundreds of KB to many MB, typically L2/L3).
By location: Inner cache (part of CPU micro‑architecture) and outer cache (outside CPU).
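By data relationship: inclusive (every block in an upper-level cache is also held in the level below it) vs. exclusive (a block is held in at most one level).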
2. Cache Working Principle
Four key questions need to be answered:
How is data placed?
How is data looked up?
How is data replaced?
How are write operations handled?
2.1 Data Placement
Assume main memory has 32 blocks and the cache has 8 lines. To place block 12, three methods exist:
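Fully associative – the block may go in any of the 8 lines.
Direct mapped – the block may go in exactly one line: 12 mod 8 = line 4.
Set associative – the block may go in any line of one set; with 2‑way sets there are 4 sets, and 12 mod 4 = set 0.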
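The three placement schemes are the same calculation with a different number of ways per set. A minimal sketch follows; the type line_range and the function candidate_lines are hypothetical names, and the sketch assumes the lines of a set are numbered contiguously.

```c
#include <assert.h>

/* Candidate cache lines for a memory block, as the half-open range
 * [first, first + count). */
typedef struct {
    unsigned first;
    unsigned count;
} line_range;

/* ways == 1 is direct mapped; ways == num_lines is fully associative;
 * anything in between is set associative. */
line_range candidate_lines(unsigned block, unsigned num_lines, unsigned ways) {
    assert(ways > 0 && num_lines % ways == 0);
    unsigned num_sets = num_lines / ways;  /* sets in the cache */
    unsigned set = block % num_sets;       /* set chosen by the block number */
    line_range r = { set * ways, ways };   /* assumed contiguous line numbering */
    return r;
}
```

For block 12 in an 8-line cache, candidate_lines(12, 8, 1) yields line 4 only, candidate_lines(12, 8, 2) yields lines 0 and 1 (set 0), and candidate_lines(12, 8, 8) yields all eight lines.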
2.2 Data Lookup
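Memory is byte‑addressed, but the cache moves whole blocks. An address therefore splits into three fields: the low bits are the block offset, the middle bits (the index) select the set, and the remaining high bits form the tag. The tag is compared against the tag of every line in the selected set; a match on a valid line means the data is in the cache (a hit).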
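The address split can be sketched as below. The names addr_fields and split_address are hypothetical, and the block size and set count used in the usage note are assumed values rather than figures from the original.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t tag;     /* high bits, compared within the selected set */
    uint64_t index;   /* middle bits, select the set */
    uint64_t offset;  /* low bits, byte position inside the block */
} addr_fields;

/* block_bytes and num_sets are assumed to be powers of two, so that
 * each field corresponds to a contiguous range of address bits. */
addr_fields split_address(uint64_t addr, uint64_t block_bytes, uint64_t num_sets) {
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
    assert(num_sets != 0 && (num_sets & (num_sets - 1)) == 0);
    uint64_t block_number = addr / block_bytes;
    addr_fields f;
    f.offset = addr % block_bytes;
    f.index = block_number % num_sets;
    f.tag = block_number / num_sets;
    return f;
}
```

With 64-byte blocks and 128 sets, address 0x12345 splits into offset 5, index 13, and tag 9.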
2.3 Data Replacement
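Random replacement – evict a randomly chosen line of the set.
Least Recently Used (LRU) – evict the line that has gone unused for the longest time.
First‑In‑First‑Out (FIFO) – evict the line that was filled earliest, regardless of later use.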
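As an illustration of LRU, here is a minimal sketch of one 4-way set that timestamps each line on use. The names lru_set and lru_access are hypothetical; real hardware usually approximates LRU with a few status bits rather than full timestamps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS 4

typedef struct {
    bool valid[WAYS];
    uint64_t tag[WAYS];
    uint64_t last_used[WAYS];  /* logical time of the most recent access */
    uint64_t clock;            /* incremented on every access */
} lru_set;

/* Returns true on a hit. On a miss, fills an invalid way if one exists,
 * otherwise evicts the way with the oldest last_used timestamp. */
bool lru_access(lru_set *s, uint64_t tag) {
    assert(s != NULL);
    s->clock++;
    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->last_used[w] = s->clock;  /* refresh recency on a hit */
            return true;
        }
    }
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!s->valid[w]) { victim = w; break; }  /* prefer an empty way */
        if (s->last_used[w] < s->last_used[victim]) victim = w;
    }
    s->valid[victim] = true;
    s->tag[victim] = tag;
    s->last_used[victim] = s->clock;
    return false;
}
```

Starting from a zero-initialized set, accessing tags 1, 2, 3, 4, 1, 5 in order evicts tag 2 on the last access, because tag 1 was refreshed by its second use.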
2.4 Write Policies
Write‑through – write to cache and main memory simultaneously.
Write‑back – write to cache; write to main memory only when the line is evicted.
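Write buffer – writes are queued in a buffer and drained to main memory in the background, so the CPU does not stall on each write; this combines the simplicity of write‑through with much of the speed of write‑back.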
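The difference between write-through and write-back shows up in how often main memory is written. The sketch below models a single cached word; the names cached_word, cache_write, and cache_evict are hypothetical, and the write buffer is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { WRITE_THROUGH, WRITE_BACK } write_policy;

typedef struct {
    write_policy policy;
    int data;             /* copy held in the cache */
    bool dirty;           /* cache copy differs from memory (write-back only) */
    int memory;           /* copy held in main memory */
    unsigned mem_writes;  /* number of writes that reached main memory */
} cached_word;

void cache_write(cached_word *c, int value) {
    assert(c != NULL);
    c->data = value;
    if (c->policy == WRITE_THROUGH) {
        c->memory = value;  /* every write also goes to memory */
        c->mem_writes++;
    } else {
        c->dirty = true;    /* defer the memory write until eviction */
    }
}

void cache_evict(cached_word *c) {
    assert(c != NULL);
    if (c->policy == WRITE_BACK && c->dirty) {
        c->memory = c->data;  /* one memory write covers all earlier writes */
        c->mem_writes++;
        c->dirty = false;
    }
}
```

Three writes followed by an eviction cost three memory writes under write-through and one under write-back, at the price of memory holding a stale value until the eviction.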
3. Cache Coherence
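In multi‑core systems each core has its own cache, so one core may hold a stale copy of data that another core has since modified, leading to errors. A coherence protocol ensures that all cores observe a consistent value for shared data.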
Two main strategies:
Snooping‑based: every cache monitors (snoops) writes on the shared bus and either updates its own copy (write‑update) or invalidates it (write‑invalidate).
Directory‑based: a central directory tracks which caches hold each block. Common protocols are SI, MSI, and MESI. The MESI protocol defines four states (Modified, Shared, Exclusive, Invalid) and their transitions.
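The MESI transitions for a single cache line can be sketched as a small state machine. This is a simplified textbook version, not the protocol of any particular processor; the event names and the others_have_copy flag are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

/* LOCAL_* events come from this core; REMOTE_* events are requests
 * from other cores observed by this cache. */
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } mesi_event;

/* others_have_copy is consulted only when a local read misses. */
mesi_state mesi_next(mesi_state s, mesi_event e, bool others_have_copy) {
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;            /* read hit: state unchanged */
    case LOCAL_WRITE:
        return MODIFIED;     /* other copies, if any, are invalidated */
    case REMOTE_READ:
        if (s == INVALID)
            return INVALID;
        return SHARED;       /* a Modified line is written back first */
    case REMOTE_WRITE:
        return INVALID;      /* another core takes ownership */
    }
    assert(0 && "unknown event");
    return s;
}
```

For example, a line read by one core alone starts in Exclusive, moves to Modified on a local write without any bus traffic, and drops to Shared when another core reads it.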
4. Summary
Cache plays a crucial role in computer architecture. This article covered the most important concepts; further details can be explored as needed.
Author: 桔里猫 Source: https://zhuanlan.zhihu.com/p/386919471