
Why CPU Cache Matters: Understanding Cache Types and Optimizing Code

This article explains the purpose of CPU caches, describes the three cache levels and their internal structures—including direct‑mapped, set‑associative, and fully‑associative designs—and shows how cache‑aware programming can dramatically improve code performance.


Why a CPU Cache Is Needed

Modern computer architecture includes CPU registers, CPU cache, main memory, and disk storage. Moving down the hierarchy from registers to disk, speed and cost per byte decrease while capacity increases, making caches essential for bridging the speed gap between the fast CPU and the much slower main memory.

Typical access cycles: L1 cache ~4 cycles, L2 ~11, L3 ~24, main memory ~167.

Cache Hierarchy and Internal Structure

CPUs usually have three cache levels (L1, L2, L3). L1 is split into a data cache and an instruction cache. If data is not found in L1, the CPU checks L2, then L3, then main memory, and finally disk.

Cache hierarchy diagram

Cache Line and Types

A cache line is the smallest block transferred between memory and cache, exploiting spatial locality. Cache designs include:

Direct‑mapped cache: each memory address maps to exactly one cache line, located using Tag, Index, and Offset fields.

Set‑associative (multi‑way) cache: an index selects a set containing multiple lines; the tag is compared against each line in the set.

Fully associative cache: all lines belong to a single set, so the tag must be compared with every line.

Direct‑Mapped Cache

The address is divided into Tag, Index, and Offset. The Index selects a cache line, the Tag verifies a hit, and the Offset selects the data within the line. An example with 8 lines shows cache thrashing: when two addresses such as 0x00 and 0x40 map to the same index with different tags, accessing them alternately makes each access evict the other's line.

Direct‑mapped cache mapping

Set‑Associative Cache

In a two‑way set‑associative cache, the Index points to two possible lines. The CPU compares the Tag with both lines and selects the matching one. An example with an address whose Tag is 0x19 and Index is 1 demonstrates a hit on the second line.

Two‑way set‑associative mapping

When both lines are occupied, replacement policies such as PLRU, NRU, FIFO, or round‑robin are used.

Fully Associative Cache

All cache lines belong to a single set, so no Index field is needed. The CPU must compare the Tag with every line, eliminating cache thrashing at the cost of higher hardware complexity.

Fully associative cache diagram

Writing Cache‑Friendly Code

Consider summing a 1024×1024 matrix. Two traversal orders are compared: row‑major and column‑major.

const int row = 1024;
const int col = 1024;
static int matrix[row][col]; // static: zero-initialized and kept off the stack (4 MB)

// Row-major traversal: visits elements in the order they are laid out in memory
int sum_row = 0;
for (int r = 0; r < row; r++) {
    for (int c = 0; c < col; c++) {
        sum_row += matrix[r][c];
    }
}

// Column-major traversal: jumps one full row (4 KB) between consecutive accesses
int sum_col = 0;
for (int c = 0; c < col; c++) {
    for (int r = 0; r < row; r++) {
        sum_col += matrix[r][c];
    }
}

Benchmark results show row‑major traversal completing in about 2 ms, while column‑major takes about 8 ms.

The speed difference is due to spatial locality: row‑major accesses consecutive elements that reside in the same cache line, leading to high cache‑hit rates. Column‑major accesses jump to a new row each step, causing frequent cache misses.

Cache line loading illustration

Tags: Performance Optimization, Memory Hierarchy, CPU cache, cache architecture, direct-mapped, Fully Associative, Set Associative
Written by Open Source Linux

Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
