
Why CPU Memory Access Is Far More Complex Than You Think

The article explains how CPUs read and write memory through a hierarchy of caches, virtual memory translation, and coherence protocols, revealing that the seemingly simple operation actually involves multiple hardware and software layers that programmers must understand to write high‑performance code.

Liangxu Linux

Sep 17, 2024

Instruction Generation for Memory Access

Compilers translate high‑level source code into machine instructions. In RISC ISAs the compiler emits explicit load and store instructions that move data between registers and memory, and arithmetic instructions operate only on registers. In CISC ISAs such as x86, an arithmetic instruction may itself take a memory operand, so a single instruction can read a value from memory, operate on it, and in some cases write the result back.
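As a rough illustration of the two styles, the sketch below interprets a toy instruction set in Python. The opcode names (load, addi, store, add_mem) and the run helper are hypothetical and exist only for this example. They loosely mirror a RISC‑V sequence such as lw / addi / sw and an x86 instruction such as add dword ptr [addr], 1.

```python
def run(program, memory):
    """Execute a toy program; return (memory, reads, writes)."""
    regs = {}
    reads = writes = 0
    for op, *args in program:
        if op == "load":            # RISC style: register <- memory
            reg, addr = args
            regs[reg] = memory[addr]
            reads += 1
        elif op == "addi":          # register-only arithmetic
            reg, imm = args
            regs[reg] += imm
        elif op == "store":         # RISC style: memory <- register
            reg, addr = args
            memory[addr] = regs[reg]
            writes += 1
        elif op == "add_mem":       # CISC style: read-modify-write on memory
            addr, imm = args
            memory[addr] += imm
            reads += 1
            writes += 1
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory, reads, writes


# Both programs increment the word at address 0.
risc_style = [("load", "r1", 0), ("addi", "r1", 1), ("store", "r1", 0)]
cisc_style = [("add_mem", 0, 1)]

print(run(risc_style, {0: 41}))
print(run(cisc_style, {0: 41}))
```

In this model the instruction count differs, but both programs perform one memory read and one memory write.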

Two Kinds of Memory Operations

The CPU touches memory for two reasons. The first is data access: both RISC load/store instructions and CISC memory‑operand instructions ultimately read or write bytes stored in main RAM. The second is instruction fetch: the CPU also fetches the instructions themselves from memory, following the von Neumann model in which program code and data share the same address space.

CPU‑Memory Speed Gap

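The processor clock runs at several GHz, so one cycle lasts a fraction of a nanosecond (about 0.33 ns at 3 GHz), while DRAM latency is on the order of 100 ns. A single DRAM access therefore costs a few hundred cycles, a gap of two to three orders of magnitude. Waiting for DRAM on every access would stall the pipeline, so faster intermediate storage is required.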
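The size of the gap can be checked with one line of arithmetic. The helper below is a minimal sketch. The function name stall_cycles and the 3 GHz clock are illustrative assumptions, and the 100 ns latency is the ballpark figure used in this section.

```python
def stall_cycles(clock_ghz: float, latency_ns: float) -> float:
    """Core cycles that elapse while one memory access is outstanding."""
    # A clock of f GHz completes f cycles per nanosecond.
    return clock_ghz * latency_ns


# Assumed values: a 3 GHz core waiting on a 100 ns DRAM access.
print(stall_cycles(3.0, 100.0))  # 300.0
```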

Principle of Locality and Caching

Programs exhibit temporal locality (recently accessed data is likely to be reused soon) and spatial locality (addresses near a recent access are likely to be accessed next). Caches are small SRAM structures placed between the CPU core and DRAM. They hold copies of recently used memory in fixed‑size lines (commonly 64 bytes), so a hit is served in a few cycles instead of the roughly 100 ns a DRAM access takes, and the average access latency drops accordingly.
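The effect of spatial locality can be seen in a toy simulation. The sketch below models a direct‑mapped cache and compares row‑major and column‑major traversal of the same two‑dimensional array. The cache geometry (512 lines of 64 bytes), the array shape, and the hit_rate helper are assumptions chosen for illustration. Real caches are usually set‑associative and have hardware prefetchers, so measured numbers will differ.

```python
LINE_BYTES = 64    # assumed cache-line size
NUM_LINES = 512    # assumed capacity: 512 lines * 64 B = 32 KB, direct-mapped


def hit_rate(addresses):
    """Fraction of accesses that hit in a toy direct-mapped cache."""
    tags = [None] * NUM_LINES      # which memory line each slot currently holds
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES  # memory line containing this byte
        index = line % NUM_LINES   # the single slot that line may occupy
        if tags[index] == line:
            hits += 1
        else:
            tags[index] = line     # miss: fill the slot, evicting the old line
        total += 1
    return hits / total


ROWS, COLS, ELEM = 256, 256, 8     # 256 x 256 array of 8-byte elements

# Row-major walk: consecutive accesses are 8 bytes apart.
row_major = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
# Column-major walk: consecutive accesses are 2048 bytes apart.
col_major = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

print(hit_rate(row_major))  # 0.875
print(hit_rate(col_major))  # 0.0
```

In this model the row‑major walk misses once per 64‑byte line and then hits on the next seven elements. The column‑major walk jumps 2048 bytes per access, and every line is evicted before it is touched again.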

Cache Write Policies

Write‑through updates both the cache line and the next level of the memory hierarchy on every store. This keeps memory up to date with the cache at the cost of higher write latency and more memory traffic. Write‑back updates only the cache line, marks it dirty, and postpones the write to DRAM until the line is evicted. This improves performance, but memory can hold stale data in the meantime, so the coherence protocol must ensure that other cores and devices observe the latest copy.
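The difference in memory traffic between the two policies can be sketched with a small counter. The model below is a simplification: it assumes a tiny direct‑mapped, write‑allocate cache, considers only stores, and flushes dirty lines at the end. The function name dram_writes and its parameters are hypothetical.

```python
def dram_writes(stores, policy, num_lines=4, line_bytes=64):
    """Count DRAM writes issued by a toy direct-mapped cache for a store stream."""
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    tags = [None] * num_lines
    dirty = [False] * num_lines
    writes = 0
    for addr in stores:
        line = addr // line_bytes
        idx = line % num_lines
        if tags[idx] != line:
            # Miss: the old line is evicted; a dirty line must be written back.
            if dirty[idx]:
                writes += 1
            tags[idx] = line        # write-allocate: bring the new line in
            dirty[idx] = False
        if policy == "write-through":
            writes += 1             # every store also goes to memory
        else:
            dirty[idx] = True       # defer the memory update until eviction
    writes += sum(dirty)            # flush whatever is still dirty at the end
    return writes


hot_stores = [0] * 100              # 100 stores to the same address
print(dram_writes(hot_stores, "write-through"))  # 100
print(dram_writes(hot_stores, "write-back"))     # 1
```

Repeated stores to one hot line are where write‑back pays off. When every store evicts the previous line, the two policies issue the same number of writes in this model.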

Multilevel Cache Hierarchy

Modern CPUs typically implement three levels (the figures below are representative; exact sizes and latencies vary by microarchitecture):

L1: smallest (≈32 KB per core), fastest (≈4 cycles), split into instruction and data caches.

L2: larger (≈256 KB–1 MB), slightly slower (≈10–12 cycles), usually private to a core.

L3: largest (several megabytes or more), slowest (≈30 cycles or more), typically shared among the cores.

On a miss the lookup proceeds from L1 → L2 → L3 → main memory.
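The cost of this lookup chain is often summarized as the average memory access time (AMAT). The sketch below applies the usual textbook recurrence, in which each level adds its own hit latency and passes its misses on to the next level. The miss rates and the 300‑cycle DRAM latency are made‑up illustrative inputs, and treating each level's latency as an additive cost is a simplification.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, local_miss_rate), ordered L1 first.
    memory_latency: cycles paid when every cache level misses.
    """
    cost = memory_latency
    # Work backwards from DRAM: each level pays its own latency on every
    # access that reaches it, plus the downstream cost on its misses.
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost


# Assumed example: L1 = 4 cycles, L2 = 12 cycles, L3 = 30 cycles,
# with local miss rates of 5%, 20%, and 50%.
hierarchy = [(4, 0.05), (12, 0.20), (30, 0.50)]
print(amat(hierarchy, 300))  # roughly 6.4 cycles
```

With these inputs the average stays close to the L1 latency even though DRAM is about 75 times slower, because few accesses get that far.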

Cache Coherence in Multicore Systems

When multiple cores hold private copies of the same cache line, an update by one core must be propagated to the others; otherwise stale data leads to incorrect results. Protocols such as MESI (Modified, Exclusive, Shared, Invalid) track the state of each line and issue coherence traffic on reads, writes, and evictions.
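The state changes can be sketched for a single cache line shared by several cores. The class below is a simplified model, not a full protocol implementation. It tracks only the MESI state letters, ignores the data itself and evictions, and the class name CacheLine is hypothetical.

```python
class CacheLine:
    """MESI state of one memory line, tracked per core (simplified)."""

    def __init__(self, cores):
        self.state = ["I"] * cores

    def read(self, core):
        if self.state[core] != "I":
            return                       # already M, E, or S: a local hit
        holders = [i for i, s in enumerate(self.state)
                   if i != core and s != "I"]
        for i in holders:
            self.state[i] = "S"          # M (after write-back) or E drops to S
        # Exclusive if no other core holds the line, Shared otherwise.
        self.state[core] = "S" if holders else "E"

    def write(self, core):
        for i in range(len(self.state)):
            if i != core:
                self.state[i] = "I"      # invalidate every other copy
        self.state[core] = "M"           # the writer holds the only valid copy


line = CacheLine(cores=2)
line.read(0)
print(line.state)   # ['E', 'I']
line.read(1)
print(line.state)   # ['S', 'S']
line.write(0)
print(line.state)   # ['M', 'I']
line.read(1)
print(line.state)   # ['S', 'S']
```

The write by core 0 invalidates core 1's copy, and core 1's next read has to fetch the line again. Repeating that pattern is the coherence traffic that makes cross‑core sharing of a cache line expensive.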

Virtual Memory and Address Translation

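Each process sees its own contiguous virtual address space. The Memory Management Unit (MMU) translates each virtual page number to a physical frame number using page tables maintained by the operating system. Recent translations are cached in the Translation Lookaside Buffer (TLB), so most accesses avoid a page‑table walk. A virtual page need not be resident in RAM: the operating system can back it with a file or with swap space on secondary storage. Accessing a non‑resident page raises a page fault, and the operating system loads the page, reading from disk if the data is not already in memory (for example in the page cache), before the access is retried.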
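The translation path can be sketched as follows. This is a deliberately simplified model: it assumes 4 KiB pages, a single‑level page table held in a dictionary, and a TLB that never evicts entries. The MMU class, the PageFault exception, and the counters are hypothetical names for this example. Real MMUs walk multi‑level tables in hardware.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, i.e. 12 offset bits


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the page table."""


class MMU:
    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical frame
        self.tlb = {}                  # cache of recent translations
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.tlb_hits += 1
            frame = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            if vpn not in self.page_table:
                # A real OS would load the page here and retry the access.
                raise PageFault(f"no mapping for virtual page {vpn}")
            frame = self.page_table[vpn]   # the page-table walk
            self.tlb[vpn] = frame          # remember it for next time
        # The page offset is carried over unchanged.
        return frame * PAGE_SIZE + offset


mmu = MMU({0: 5, 1: 9})
print(mmu.translate(0x0010))  # 20496: frame 5, offset 16
print(mmu.translate(0x1004))  # 36868: frame 9, offset 4
print(mmu.translate(0x0020))  # 20512: served from the TLB this time
```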

Key Takeaways

The CPU‑memory subsystem consists of a hierarchy of caches, coherence protocols, and virtual‑memory translation. Efficient software must exploit temporal and spatial locality, minimize cache‑line sharing across cores, and be aware of write‑back vs. write‑through effects to achieve high performance on modern multicore processors.
