Why CPU Cache Matters: Understanding Cache Types and Optimizing Code Performance
This article explains the purpose of CPU caches, describes their hierarchical structure and internal organization—including direct‑mapped, set‑associative, and fully‑associative designs—and shows how cache‑aware programming can dramatically speed up matrix operations.
Why a Cache Is Needed
Modern computers use a hierarchy of storage components—CPU registers, CPU caches, main memory, and disks—where speed decreases and capacity increases from top to bottom. Because accessing main memory is far slower than the CPU, a small, fast cache is placed between the CPU and memory to bridge the speed gap.
Cache Hierarchy and Speed
Typical CPUs have three cache levels (L1, L2, L3). L1 is split into data and instruction caches and is private to each core; L2 is also private; L3 is shared among all cores. A table of typical latency (in clock cycles) shows L1 ≈ 4, L2 ≈ 11, L3 ≈ 24, and main memory ≈ 167 cycles.
Cache Lines and Locality
When the CPU reads memory, it fetches an entire cache line (a block of contiguous bytes) rather than a single byte. This exploits spatial locality: after accessing one address, nearby addresses are likely to be accessed soon, so they are pre‑loaded into the cache.
Direct‑Mapped Cache
A direct‑mapped cache maps each memory address to exactly one cache line using three fields: Tag , Index , and Offset . The Index selects a line, the Tag is stored with the line, and the Offset selects the byte within the line. A cache hit occurs when the Tag in the selected line matches the address Tag.
Example: with 8 cache lines, addresses 0x00, 0x40, and 0x80 share the same Index, so they map to the same line. Accessing 0x00, then 0x40, then 0x00 again causes a miss each time because each new address overwrites the previous line’s data—this phenomenon is called cache thrashing .
Set‑Associative (Multi‑Way) Cache
Set‑associative caches allow each Index to refer to multiple lines (ways). For a 2‑way set‑associative cache with 8 lines, the Index points to two possible lines; the cache checks both Tags and selects the matching line if present. This reduces thrashing compared to direct‑mapped caches.
When both lines in a set are occupied, replacement policies such as PLRU, NRU, FIFO, or round‑robin decide which line to evict.
Fully‑Associative Cache
In a fully‑associative cache, any memory block can be placed in any cache line; there is no Index field. On a lookup, the Tag of every line is compared to the address Tag. This eliminates thrashing entirely but requires the most hardware.
Writing Cache‑Friendly Code
Access patterns that respect cache line boundaries run much faster. The article demonstrates this with a 1024 × 1024 matrix sum:
const int row = 1024;
const int col = 1024;
int matrix[row][col];
// Row‑major traversal (fast)
int sum_row = 0;
for (int r = 0; r < row; ++r)
for (int c = 0; c < col; ++c)
sum_row += matrix[r][c];
// Column‑major traversal (slow)
int sum_col = 0;
for (int c = 0; c < col; ++c)
for (int r = 0; r < row; ++r)
sum_col += matrix[r][c];Timing tests show the row‑major version completes in about 2 ms, while the column‑major version takes about 8 ms. The speed difference arises because row‑major accesses stay within the same cache line, achieving high hit rates, whereas column‑major accesses jump to a new line for each element, causing many cache misses.
Illustrations
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
