Fundamentals 11 min read

Why Caches Matter: A Deep Dive into CPU Memory Hierarchy and Consistency

This article provides a comprehensive overview of CPU caches, covering why they are needed, their classification, placement and lookup mechanisms, replacement and write policies, and coherence protocols such as MESI, illustrating each concept with diagrams and code examples.

21CTO

Dec 2, 2021

Why Caches Matter: A Deep Dive into CPU Memory Hierarchy and Consistency

1. Why Cache is Needed

We start with a diagram that shows why a cache is required.

CPU performance has improved dramatically, while DRAM memory performance lags, creating a bottleneck where storage limits computation.

Because capacity and speed cannot be simultaneously optimized, we exploit data locality.

for (j = 0; j < 100; j = j + 1)
    for (i = 0; i < 5000; i = i + 1)
        x[i][j] = 2 * x[i][j];

The loops access data with spatial locality, allowing us to place this data in a small, fast storage.

Cache provides high‑speed storage for the CPU by leveraging data locality.

1.2 Cache in Real Systems

A typical system storage hierarchy includes CPU registers, L1/L2/L3 caches, DRAM, and disk.

Data is looked up first in registers, then L1, L2, etc., finally reaching the disk; smaller capacity yields faster access.

1.3 Cache Classification

By data type: I‑Cache (instructions) and D‑Cache (data). D‑Cache is writable, I‑Cache is read‑only.

By size: small (<4 KB) used for L1, large (>4 KB) for L2 and beyond.

By location: Inner Cache (part of CPU micro‑architecture) vs. Outer Cache.

By inclusion: Inclusive vs. Exclusive caches.

2. Cache Working Principles

Four key questions: how data is placed, looked up, replaced, and handled on writes.

2.1 Data Placement

Example: 32 memory blocks, 8 cache lines. To store block 12 we can use:

Fully associative – any line.

Direct‑mapped – line 12 mod 8.

Set‑associative – limited set of lines.

More associativity reduces miss rate but increases comparison circuitry.

2.2 Data Lookup

Addresses are byte‑addressed, but cache transfers occur in blocks (≈64 B). The tag, index, and block offset are used to find data; in a 2‑way set‑associative cache tags are compared in parallel.

2.3 Cache Replacement

Random replacement.

Least Recently Used (LRU).

First‑In‑First‑Out (FIFO).

Replacement policies are chosen based on workload.

2.4 Write Policies

Write‑through: write to cache and main memory simultaneously.

Write‑back: write to cache, update main memory on eviction.

Write‑buffer: combine write‑through and write‑back using a buffer.

3. Cache Coherence

In multi‑core systems, caches can become inconsistent when one core updates data that another core has cached.

Two main coherence strategies:

Listener‑based

All caches monitor writes; either update other caches (write‑update) or invalidate their copies (write‑invalidate).

Directory‑based

The memory maintains a directory of which caches hold each block, supporting protocols such as SI, MSI, and MESI.

The MESI protocol defines four states (Modified, Shared, Exclusive, Invalid) and transitions based on reads and writes.

4. Summary

Cache plays a crucial role in computer architecture; this article covered its concepts, classifications, operation, replacement, write policies, and coherence mechanisms.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Cache CPU Memory Hierarchy cache coherence

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.