Fundamentals 13 min read

Why CPUs Need Cache: A Deep Dive into Cache Architecture and Consistency

This article explains the motivation behind CPU caches, describes their placement in the memory hierarchy, details cache organization, placement policies, lookup mechanisms, replacement strategies, write handling, and coherence protocols such as MESI, providing a comprehensive overview of cache fundamentals.

Architects' Tech Alliance

Nov 13, 2021

Why CPUs Need Cache: A Deep Dive into Cache Architecture and Consistency

1. Why Cache Is Needed

CPU performance has increased dramatically with advances in process technology, while DRAM memory speed has lagged, creating a bottleneck where storage limits computation. The solution leverages data locality: programs tend to access nearby memory locations repeatedly, allowing a small, fast storage layer to serve most accesses.

for (j = 0; j < 100; j = j + 1) {
    for (i = 0; i < 5000; i = i + 1) {
        x[i][j] = 2 * x[i][j];
    }
}

The nested loops illustrate spatial locality, as the accessed array elements reside close together in memory.

1.1 Actual System Cache Hierarchy

The typical memory subsystem consists of CPU registers, L1/L2/L3 caches, DRAM, and storage. Data is first sought in registers, then L1, L2, L3, DRAM, and finally on disk. Smaller caches provide faster access but hold less data.

1.2 Cache Classification

I‑Cache vs D‑Cache: I‑Cache stores instructions (read‑only), D‑Cache stores data (read/write).

Size: <4 KB caches are considered small (often L1), >4 KB are large (L2, L3).

Location: Inner caches belong to the CPU core (L1/L2), outer caches are shared (L3, last‑level cache).

Inclusion: Inclusive caches contain all data of lower‑level caches; exclusive caches do not.

2. How Cache Works

Four key questions must be answered:

How is data placed in the cache?

How is data looked up?

How is data replaced?

How are write operations handled?

2.1 Data Placement

Assume main memory has 32 blocks and the cache has 8 lines. To store block 12, three placement policies exist:

Fully associative – any line can hold the block.

Direct‑mapped – the block maps to a specific line (e.g., 12 mod 8).

Set‑associative – the block can reside in any line of a designated set (e.g., 2‑way set‑associative gives two possible lines).

Fully associative and direct‑mapped are extremes of set‑associative designs. Larger associativity improves hit rate but increases hardware cost.

2.2 Cache Lookup

Addresses are byte‑addressed, but cache transfers occur in blocks (typically 64 bytes). An address is divided into tag, index (set identifier), and block offset. During a lookup, the index selects a set, and tags of the lines in that set are compared in parallel. A tag match indicates a cache hit; the corresponding block is then returned.

2.3 Replacement Policies

When a miss occurs and the set is full, one line must be evicted. Common policies are:

Random replacement.

Least Recently Used (LRU) – evict the line not accessed for the longest time.

First‑In‑First‑Out (FIFO) – evict the oldest line.

Choice depends on workload characteristics.

2.4 Write Handling

Writes can cause inconsistency between cache and main memory. Three strategies exist:

Write‑through: data is written to both cache and main memory simultaneously, increasing write latency.

Write‑back: data is written only to the cache; the modified line (dirty) is written back to memory when it is evicted.

Write‑buffer (write‑through + write‑back combination): writes are queued and flushed to memory in batches, reducing traffic.

3. Cache Coherence

In multi‑core systems, each core has its own cache. Without coherence, one core could read stale data after another core writes to memory, leading to errors. Coherence protocols ensure that all caches see a consistent view of memory.

3.1 Snooping‑Based Protocols

All caches monitor the bus for write operations. Two reactions are possible:

Write‑update: the writing cache broadcasts the new data to all other caches.

Write‑invalidate: the writing cache invalidates the corresponding line in other caches.

3.2 Directory‑Based Protocols

A directory stored in main memory tracks which caches hold each block. Common states include:

SI – Shared or Invalid.

MSI – Modified, Shared, Invalid.

MESI – Modified, Exclusive, Shared, Invalid (the most widely used).

MESI example:

If a line is in the Invalid state and a core reads it, the line becomes Exclusive.

If another core reads the same line, both transition to Shared.

A write by a core in Shared state upgrades the line to Modified.

When a Modified line is read by another core, it is written back to memory and the state changes to Shared.

4. Summary

Cache is a critical component of modern computer architecture, providing fast access to frequently used data by exploiting locality. Understanding cache placement, lookup, replacement, write policies, and coherence mechanisms is essential for designing efficient systems.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Cache CPU Computer Architecture Memory Hierarchy cache coherence

Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.