Understanding CPU Cache: History, Principles, and Design Strategies

This article explains the evolution of CPU cache, its underlying principles of temporal and spatial locality, various cache architectures, implementation details, and practical considerations such as cache line size and replacement policies, providing a comprehensive overview for developers and computer engineers.

Open Source Linux

Jul 30, 2024

CPU Development History and the Need for Cache

The 8086 family dates from 1978, and its 8088 variant, the microprocessor in the first IBM PC, ran at 4.77 MHz. At that clock speed the CPU was no faster than main memory, so a cache was unnecessary. From the 80386 generation onward (with clock speeds reaching 40 MHz) the CPU outpaced memory, which led to the adoption of cache: a small, fast, low‑latency storage tier between the CPU and main memory.

Early CPUs lacked internal L1 cache due to cost; later models (486) introduced 8 KB L1 cache and optional external L2 cache. The Pentium era added separate 8 KB data and instruction caches and external L2, while the Pentium Pro integrated L2 cache, establishing the modern cache hierarchy.

What Is CPU Cache?

Modern CPUs typically have three cache levels (L1, L2, L3). L1 is split into a data cache and an instruction cache and is private to each core; L2 is usually core‑private as well; L3 is shared among cores. A lookup proceeds from L1 outward, and if the data is not found in any cache level, it is fetched from main memory (or ultimately from storage).

Cache Principles

Temporal Locality: Recently accessed data is likely to be accessed again soon, so it is kept in cache.

Spatial Locality: Data near a recently accessed address is likely to be accessed soon, so the cache loads a whole block of neighboring bytes rather than only the requested word.

These principles reduce main‑memory accesses, lower latency, and improve overall processor performance.
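The effect on latency can be estimated with a simple expected-value calculation. The sketch below is only illustrative: the function name, the latencies, and the hit rates are assumptions chosen for the example, not measurements of any real processor.

```python
def average_access_time(levels, memory_latency):
    """Expected cycles per access for a multi-level cache hierarchy.

    levels: list of (hit_latency_cycles, hit_rate) pairs, from L1 outward.
    hit_rate is the fraction of accesses reaching that level that hit there.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency    # every access reaching the level pays its latency
        reach *= (1.0 - hit_rate)   # the misses continue to the next level
    return total + reach * memory_latency

# Hypothetical numbers, for illustration only.
hierarchy = [(4, 0.90), (12, 0.80), (40, 0.50)]
print(average_access_time(hierarchy, memory_latency=200))  # roughly 8 cycles
print(average_access_time([], memory_latency=200))         # no cache: 200 cycles
```

With these assumed figures, only about 1% of accesses reach main memory, which is why the average stays close to the L1 latency.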

Cache Implementation

CPU caches are built from SRAM, which is fast and can be read without disturbing the stored value. DRAM, by contrast, has destructive reads, requires periodic refresh, and has higher latency.

Why Not Use Registers as Cache?

Registers are few in number, expensive per bit, and private to each core. Caches offer far more capacity at manageable cost and complexity, and their outer levels can be shared between cores, which makes them better suited to bridging the CPU‑memory speed gap.

Cache Internals

Data is transferred between memory and cache in units called cache lines (e.g., 64 bytes). The cache line is the smallest unit the cache stores and transfers: even a one‑byte read that misses brings in the whole line containing that byte.
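As a small illustration of why line granularity matters, the sketch below counts how many distinct cache lines a set of byte addresses falls into. The helper name `lines_touched` and the two access patterns are made up for this example; the 64-byte line size is the example value quoted above.

```python
LINE_SIZE = 64  # bytes; the example line size quoted in the text

def lines_touched(addresses, line_size=LINE_SIZE):
    """Return the set of cache-line numbers covered by the given byte addresses."""
    return {addr // line_size for addr in addresses}

# 16 sequential 4-byte reads versus 16 reads spaced 64 bytes apart.
sequential = [i * 4 for i in range(16)]
strided = [i * 64 for i in range(16)]

print(len(lines_touched(sequential)))  # 1
print(len(lines_touched(strided)))     # 16
```

Both patterns perform 16 reads, but the sequential one needs a single line fill while the strided one needs sixteen.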

Cache Mapping Types

Direct‑Mapped Cache: Each memory block maps to exactly one cache line, so blocks that share a line keep evicting each other, which causes conflict misses (cache thrashing).

Set‑Associative Cache: An index selects a set of several lines (e.g., two in a two‑way cache), and any line in the set can hold the data, which reduces thrashing.

Fully Associative Cache: Any memory block can be placed in any cache line, eliminating index conflicts at a higher hardware cost.
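All three mapping types can be viewed as the same address split with a different number of sets. The following is a minimal sketch under assumed parameters (eight lines of 64 bytes, power-of-two sizes); `split_address` is a hypothetical helper, not an API from any library.

```python
def split_address(addr, line_size, num_sets):
    """Split a byte address into (tag, set index, byte offset).

    Assumes line_size and num_sets are powers of two.
    """
    offset = addr % line_size   # position of the byte inside its line
    block = addr // line_size   # which line-sized block of memory this is
    index = block % num_sets    # which set the block must go into
    tag = block // num_sets     # identifies the block within that set
    return tag, index, offset

TOTAL_LINES = 8   # assumed cache geometry
LINE_SIZE = 64

addr = 0x12F4
for name, ways in [("direct-mapped", 1), ("two-way", 2), ("fully associative", 8)]:
    sets = TOTAL_LINES // ways
    print(name, split_address(addr, LINE_SIZE, sets))
# direct-mapped (9, 3, 52)
# two-way (18, 3, 52)
# fully associative (75, 0, 52)
```

As associativity grows, fewer address bits go to the index and more go to the tag; in the fully associative case there is a single set and the whole block number becomes the tag.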

Direct‑Mapped Example

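Consider a cache with eight lines of 8 bytes each (64 bytes in total). Addresses 0x00, 0x40, and 0x80 are multiples of the cache size, so they share the same index and map to the same line. Accessing them in rotation evicts the line on every access, so every access misses.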
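The conflict can be reproduced with a tiny simulator. This is a sketch under an assumption: the lines are taken to be 8 bytes wide (a 64-byte cache), which is the geometry at which 0x00, 0x40, and 0x80 land on the same index with eight lines. The function name is illustrative.

```python
NUM_LINES = 8
LINE_SIZE = 8  # bytes; assumed so that 0x00, 0x40 and 0x80 share index 0

def simulate_direct_mapped(trace):
    """Replay a list of byte addresses; return 'hit' or 'miss' for each one."""
    tags = [None] * NUM_LINES  # None marks a line that holds no valid data
    results = []
    for addr in trace:
        block = addr // LINE_SIZE
        index = block % NUM_LINES
        tag = block // NUM_LINES
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag  # evict whatever occupied the line
    return results

print(simulate_direct_mapped([0x00, 0x40, 0x80] * 2))
# ['miss', 'miss', 'miss', 'miss', 'miss', 'miss']
print(simulate_direct_mapped([0x00, 0x08, 0x00, 0x08]))
# ['miss', 'miss', 'hit', 'hit']
```

The second trace uses two addresses with different indices, so after the initial fills both stay resident.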

Set‑Associative Example

In a two‑way set‑associative cache, each index refers to a set of two lines, so two blocks with the same index can be resident at once. An access hits if either line in the set holds the needed tag, which mitigates thrashing.

Fully Associative Example

All lines belong to a single set. The hardware compares the requested tag against the tag of every line, typically in parallel, which avoids index conflicts at the expense of more complex hardware.
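The set-associative and fully associative examples can be compared with one parameterized simulator. This is a sketch, not a model of any real processor: it assumes LRU replacement (the text does not name a policy), eight lines of 8 bytes, and a hypothetical `simulate` function.

```python
from collections import OrderedDict

def simulate(trace, total_lines=8, line_size=8, ways=2):
    """Replay byte addresses through an N-way cache with LRU replacement.

    ways=1 gives a direct-mapped cache; ways=total_lines gives a fully
    associative one. Returns the number of hits.
    """
    num_sets = total_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # tags, least recent first
    hits = 0
    for addr in trace:
        block = addr // line_size
        lines = sets[block % num_sets]
        tag = block // num_sets
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = None
    return hits

two_addresses = [0x00, 0x40] * 4
three_addresses = [0x00, 0x40, 0x80] * 4

print(simulate(two_addresses, ways=1))    # 0: direct-mapped thrashes
print(simulate(two_addresses, ways=2))    # 6: both blocks fit in one set
print(simulate(three_addresses, ways=2))  # 0: three blocks rotate through two ways
print(simulate(three_addresses, ways=8))  # 9: fully associative holds all three
```

The third case shows that associativity only mitigates thrashing: once more blocks compete for a set than it has ways, an LRU cache misses on every access of a cyclic pattern.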

Cache Line State and Size

A cache line is considered idle (invalid) when it holds no valid data; hardware typically tracks this with a valid bit stored alongside the tag. Cache line sizes are fixed at design time, commonly 64 bytes or 128 bytes, chosen to balance spatial locality benefits against cost and complexity.

How to Determine Cache Line Size on a Server

Consult the processor’s specification sheet.

Use tools such as CPU‑Z (on Windows).

Run OS‑specific commands (e.g., lscpu or getconf LEVEL1_DCACHE_LINESIZE on Linux).
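On Linux the same information is also exposed through sysfs, so it can be read from a script. The sketch below assumes the usual sysfs layout under /sys/devices/system/cpu/cpu0/cache; the function name is illustrative, and it returns an empty result on systems (or containers) where that directory is missing.

```python
from pathlib import Path

SYSFS_CACHE_DIR = Path("/sys/devices/system/cpu/cpu0/cache")

def cache_line_sizes(cache_dir=SYSFS_CACHE_DIR):
    """Map each cache (e.g. 'L1 Data') to its line size in bytes.

    Returns an empty dict when the sysfs directory is not available.
    """
    sizes = {}
    if not cache_dir.is_dir():
        return sizes
    for index_dir in sorted(cache_dir.glob("index*")):
        try:
            level = (index_dir / "level").read_text().strip()
            kind = (index_dir / "type").read_text().strip()
            line = int((index_dir / "coherency_line_size").read_text())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable on this system
        sizes[f"L{level} {kind}"] = line
    return sizes

print(cache_line_sizes())
```

The output depends on the machine, so no expected value is shown here.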
