
Why Do CPUs Need Cache? A Deep Dive into Cache Mechanisms and Consistency

This article explains the purpose of CPU caches, their classification, placement and replacement strategies, write policies, and coherence protocols, providing a comprehensive overview of cache concepts essential for modern computer architecture.

Open Source Linux

Sep 4, 2021


Many internet companies like to ask about the LRU cache mechanism in interviews, and it has become a popular topic.

This is a thorough technical article about caches that covers nearly every key concept related to them.

The diagrams are taken from the classic book Computer Architecture: A Quantitative Approach, which is highly recommended.

1. Why Do We Need Cache

1.1 Why Cache Is Needed

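CPU performance has improved dramatically over time, while DRAM speed has improved far more slowly. The resulting gap means that memory, not computation, often limits performance.

At the same time, a memory cannot be both large and fast: larger memories are slower and cheaper per byte, smaller ones are faster and more expensive.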

We solve this by exploiting data access patterns, i.e., locality.

Consider the following code:

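double x[5000][100];   /* row-major: x[i][j] and x[i][j+1] are adjacent */
for (int i = 0; i < 5000; i = i + 1)      /* rows in the outer loop */
    for (int j = 0; j < 100; j = j + 1)   /* columns in the inner loop */
        x[i][j] = 2 * x[i][j];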

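Because C stores two-dimensional arrays in row-major order, the inner loop walks through consecutive addresses (x[i][0], x[i][1], x[i][2], and so on), so data used close together in time also sit close together in memory. This is called spatial locality; repeatedly reusing the same data or instructions, such as the loop counters and the loop body here, is called temporal locality. With the loops nested the other way round (j outer, i inner), consecutive accesses would be 100 elements apart and most of the spatial locality would be lost.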

By keeping such data in a small, fast memory (the cache), the CPU can serve most accesses without waiting for DRAM.
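To make the locality argument concrete, the following C sketch counts how often consecutive iterations of the loop nest land in a different cache line, for both loop orders. It assumes double elements and 64-byte cache lines (LINE_BYTES, with the array aligned via _Alignas), and the helper name line_switches is hypothetical; the count is only a rough proxy for misses in a cache that could hold a single line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS 5000
#define COLS 100
#define LINE_BYTES 64   /* assumed cache-line size */

/* The array from the loop nest; element type double is an assumption.
 * Aligning it to a line boundary makes the counts below deterministic. */
static _Alignas(LINE_BYTES) double x[ROWS][COLS];

/* Returns how many of the first n iterations touch a different cache line
 * than the iteration before (the first iteration always counts).
 * j_inner = true : for i { for j { x[i][j] } }   (row by row)
 * j_inner = false: for j { for i { x[i][j] } }   (column by column) */
static size_t line_switches(bool j_inner, size_t n) {
    assert(n <= (size_t)ROWS * COLS);
    size_t switches = 0;
    uintptr_t prev = UINTPTR_MAX;
    for (size_t k = 0; k < n; k++) {
        /* Element touched by the k-th iteration of the loop nest. */
        size_t i = j_inner ? k / COLS : k % ROWS;
        size_t j = j_inner ? k % COLS : k / ROWS;
        uintptr_t line = (uintptr_t)&x[i][j] / LINE_BYTES;
        if (line != prev)
            switches++;
        prev = line;
    }
    return switches;
}
```

On this model, the first 1000 iterations with j innermost enter 125 lines (8 doubles share each line), while with i innermost every iteration enters a new line, because consecutive accesses are 800 bytes apart.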

1.2 Cache in Real Systems

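The storage hierarchy of a typical system consists of CPU registers, L1/L2/L3 caches, DRAM main memory, and disk.

Moving away from the registers, each level is larger and slower: registers → L1 → L2 → L3 → DRAM → disk. An access that misses in one level is passed on to the next.

Between the CPU and the cache, data moves in words; between the cache and main memory, it moves in whole blocks (cache lines), typically 64 bytes.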

1.3 Cache Classification

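By data type: I‑Cache (instructions) and D‑Cache (data). The D‑Cache must handle writes from the CPU; the I‑Cache is read‑only during normal execution.

By size: small caches (typically L1, tens of KB per core on current processors) and large caches (typically L2/L3, from hundreds of KB to many MB).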

By location: Inner cache (part of CPU micro‑architecture) and outer cache (outside CPU).

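By data relationship: inclusive caches (every block held in an upper level, such as L1, is also present in the lower level, such as L2) vs. exclusive caches (a block resides in at most one of the levels).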

2. Cache Working Principle

Four key questions need to be answered:

How is data placed?

How is data looked up?

How is data replaced?

How are write operations handled?

2.1 Data Placement

Assume main memory has 32 blocks and the cache has 8 lines. To place block 12, three methods exist:

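Fully associative – any of the 8 lines.

Direct mapped – exactly one line: 12 mod 8 = line 4.

Set associative – any line within one set. With 2‑way set associativity, the 8 lines form 4 sets, and block 12 goes to set 12 mod 4 = 0, in either of its two lines.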
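The three placement rules reduce to one formula if direct-mapped and fully associative caches are treated as the two extremes of set associativity (1 way, and as many ways as there are lines). A minimal C sketch with hypothetical helper names, assuming the lines of each set are numbered consecutively:

```c
#include <assert.h>

/* A cache of num_lines lines organised as num_lines / ways sets.
 * ways == 1         -> direct mapped (each set is one line)
 * ways == num_lines -> fully associative (a single set) */
static unsigned set_for_block(unsigned block, unsigned num_lines, unsigned ways) {
    assert(ways > 0 && num_lines % ways == 0);
    unsigned num_sets = num_lines / ways;
    return block % num_sets;
}

/* Assumed layout: set s occupies lines s*ways .. s*ways + ways - 1,
 * and the block may be placed in any of them. */
static unsigned first_candidate_line(unsigned set, unsigned ways) {
    return set * ways;
}
```

With 8 lines, block 12 maps to line 4 when direct mapped (ways = 1), to set 0 (lines 0 and 1) when 2-way set associative, and to the single set holding all 8 lines when fully associative.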

2.2 Data Lookup

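Memory is byte‑addressed, but the cache stores and transfers whole blocks, so an address is split into three fields. The lowest bits (log2 of the block size) are the block offset, which selects a byte within the block; the next bits (log2 of the number of sets) are the index, which selects a set; the remaining high bits are the tag. The tag is compared with the tags of all lines in the selected set, and the access is a hit if one of them matches and that line's valid bit is set.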
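A C sketch of the address split and the tag comparison, for a hypothetical cache with 64-byte blocks, 64 sets, and 4 ways (16 KB). The geometry, the Line struct, and the fill helper are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 64-byte blocks, 64 sets, 4 ways = 16 KB. */
#define OFFSET_BITS 6                     /* log2(block size)     */
#define INDEX_BITS  6                     /* log2(number of sets) */
#define NUM_SETS    (1u << INDEX_BITS)
#define WAYS        4

typedef struct {
    bool     valid;
    uint64_t tag;
} Line;

static Line cache[NUM_SETS][WAYS];   /* zero-initialised: all lines invalid */

static uint64_t block_offset(uint64_t addr) {
    return addr & ((UINT64_C(1) << OFFSET_BITS) - 1);
}
static uint64_t set_index(uint64_t addr) {
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1);
}
static uint64_t tag_of(uint64_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}

/* Hit if a valid line in the selected set carries the same tag. */
static bool lookup(uint64_t addr) {
    const Line *set = cache[set_index(addr)];
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag_of(addr))
            return true;
    return false;
}

/* Example helper: install the block containing addr in the given way. */
static void fill(uint64_t addr, int way) {
    assert(way >= 0 && way < WAYS);
    cache[set_index(addr)][way] = (Line){ .valid = true, .tag = tag_of(addr) };
}
```

For address 0x12345, the offset is 5, the set index is 13, and the tag is 0x12. Once that block is filled, any other address in the same 64-byte block (for example 0x12340) hits, while 0x12345 + 4096 selects the same set with a different tag and misses.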

2.3 Data Replacement

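When a miss occurs and every line that could hold the block is occupied, one line must be evicted (a direct-mapped cache has no choice). Common policies are:

Random replacement – evict a randomly chosen line in the set.

Least Recently Used (LRU) – evict the line that has gone unused for the longest time.

First‑In‑First‑Out (FIFO) – evict the line that was brought in earliest, regardless of later use.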
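The difference between LRU and FIFO shows up as soon as a block is reused. This sketch simulates a single 4-way set (the associativity is an assumption) and counts hits for a trace of block numbers; each way carries a timestamp that LRU refreshes on every hit and FIFO sets only on insertion. Random replacement is left out because its result depends on the random source.

```c
#include <assert.h>
#include <stddef.h>

#define WAYS 4   /* assumed associativity of the modelled set */

typedef enum { POLICY_LRU, POLICY_FIFO } Policy;

/* Simulates one cache set and returns the number of hits for a trace of
 * block numbers. stamp[w] holds the time of last use (LRU) or the time of
 * insertion (FIFO); the victim is always the way with the smallest stamp. */
static size_t count_hits(const int *trace, size_t n, Policy policy) {
    assert(trace != NULL || n == 0);
    int tag[WAYS];
    size_t stamp[WAYS];
    int used = 0;
    size_t hits = 0;

    for (size_t t = 0; t < n; t++) {
        int hit = -1;
        for (int w = 0; w < used; w++)
            if (tag[w] == trace[t]) { hit = w; break; }

        if (hit >= 0) {
            hits++;
            if (policy == POLICY_LRU)
                stamp[hit] = t;          /* FIFO ignores reuse */
            continue;
        }

        int victim;
        if (used < WAYS) {
            victim = used++;             /* a free way is still available */
        } else {
            victim = 0;                  /* evict the smallest (oldest) stamp */
            for (int w = 1; w < WAYS; w++)
                if (stamp[w] < stamp[victim]) victim = w;
        }
        tag[victim] = trace[t];
        stamp[victim] = t;
    }
    return hits;
}
```

For the trace 1, 2, 3, 4, 1, 5, 1, LRU keeps block 1 (it was just used) and scores 2 hits, while FIFO evicts block 1 (it arrived first) and scores only 1.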

2.4 Write Policies

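Write‑through – every write updates both the cache and main memory.

Write‑back – a write updates only the cache and sets the line's dirty bit; main memory is updated only when a dirty line is evicted.

Write buffer (write queue) – writes are placed in a small queue that drains to memory in the background, so the CPU does not stall on every write; it is commonly paired with write‑through to hide its memory traffic.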
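The main trade-off between write-through and write-back is memory traffic. The sketch below counts main-memory writes for a trace of stores to a hypothetical 8-line direct-mapped cache. It assumes write-allocate for both policies (a store miss brings the block into the cache) and flushes dirty lines at the end so both policies leave memory up to date; the write buffer is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES 8   /* hypothetical tiny direct-mapped cache */

typedef enum { WRITE_THROUGH, WRITE_BACK } WritePolicy;

typedef struct {
    bool     valid;
    bool     dirty;      /* only ever set under write-back */
    unsigned block;
} Line;

/* Counts main-memory writes caused by a trace of stores (block numbers).
 * Both policies use write-allocate; remaining dirty lines are flushed at
 * the end so memory is up to date in both cases. */
static size_t memory_writes(const unsigned *stores, size_t n, WritePolicy p) {
    assert(stores != NULL || n == 0);
    Line lines[NUM_LINES] = {0};
    size_t writes = 0;

    for (size_t s = 0; s < n; s++) {
        Line *l = &lines[stores[s] % NUM_LINES];
        if (!l->valid || l->block != stores[s]) {      /* store miss */
            if (l->valid && l->dirty)
                writes++;                              /* write back the victim */
            *l = (Line){ .valid = true, .dirty = false, .block = stores[s] };
        }
        if (p == WRITE_THROUGH)
            writes++;                                  /* memory updated now */
        else
            l->dirty = true;                           /* memory updated on eviction */
    }
    for (int k = 0; k < NUM_LINES; k++)
        if (lines[k].valid && lines[k].dirty)
            writes++;                                  /* final flush */
    return writes;
}
```

Five stores to the same block cost five memory writes under write-through but only one under write-back. Blocks that keep evicting each other, such as 3 and 11 in this 8-line cache, erase that advantage.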

3. Cache Coherence

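In multi‑core systems, each core has its own cache, so several cores may hold copies of the same block. If one core writes the block while another keeps reading its old copy, the program sees stale data. Cache coherence guarantees that all cores observe a consistent value for shared data.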

Two main strategies:

Snooping‑based: every cache monitors (snoops) the writes broadcast on the shared bus and either updates its own copy (write‑update) or invalidates it (write‑invalidate).

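Directory‑based: a directory records which caches hold each block, so coherence messages are sent only to the caches that actually have a copy instead of being broadcast to all of them.

Common coherence protocols include SI, MSI, and MESI. MESI gives each cache line one of four states: Modified (only this cache holds the line, and it is dirty), Exclusive (only this cache holds the line, and it is clean), Shared (other caches may also hold clean copies), and Invalid (the line holds no valid data). Reads and writes by the local core, and requests observed from other cores, move a line between these states.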
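The MESI transitions for one cache's copy of a line can be written as a small state function. This is a sketch of the basic protocol as it is usually taught, with requests from other cores reduced to two hypothetical events; the data transfers themselves (such as a Modified line supplying or writing back its contents) are only noted in comments. REMOTE_WRITE corresponds to the write-invalidate behaviour of snooping.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } MesiState;

typedef enum {
    LOCAL_READ,    /* this core loads from the line */
    LOCAL_WRITE,   /* this core stores to the line */
    REMOTE_READ,   /* another core's read of the line is observed */
    REMOTE_WRITE   /* another core's write (invalidate request) is observed */
} MesiEvent;

/* Next state of one cache's copy of a line. others_have_copy matters only
 * for a local read miss: it decides between Exclusive and Shared. */
static MesiState mesi_next(MesiState s, MesiEvent e, bool others_have_copy) {
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                          /* read hit: no change */
    case LOCAL_WRITE:
        /* E -> M needs no bus traffic; from S or I the other copies
         * must be invalidated first. */
        return MODIFIED;
    case REMOTE_READ:
        /* An M line supplies or writes back its dirty data first;
         * M and E copies are no longer exclusive. */
        return s == INVALID ? INVALID : SHARED;
    case REMOTE_WRITE:
        return INVALID;                    /* our copy is now stale */
    }
    assert(!"unknown MESI event");
    return s;
}
```

For example, a line read by one core with no other sharers becomes Exclusive, can then be written without any bus transaction (Exclusive to Modified), and drops to Shared when another core reads it.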

4. Summary

Cache plays a crucial role in computer architecture. This article covered the most important concepts; further details can be explored as needed.

Author: 桔里猫 Source: https://zhuanlan.zhihu.com/p/386919471


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
