
Why CPUs Can't Directly Read Memory – The Hidden Role of Cache and Coherence

This article explains how compilers generate load/store instructions, why modern CPUs rely on multi‑level caches instead of direct memory access, the impact of program locality, cache write policies, replacement strategies, and the challenges of cache coherence in multicore systems.

Liangxu Linux

Who directs the CPU to read/write memory?

The compiler translates high‑level source code into machine instructions. Those instructions specify which memory addresses to read or write, either directly or through registers, and the CPU simply executes them.

RISC vs. CISC memory operations

In RISC architectures, arithmetic and logic instructions operate only on registers, so separate LOAD and STORE instructions move data between memory and registers. In CISC architectures (e.g., x86), an instruction may take a memory operand, so the CPU can read or write memory as part of the same instruction.
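The difference can be sketched with a toy interpreter. This is only an illustration: the instruction names (LOAD, STORE, ADDI, ADDM), the tuple encoding, and the address 0x1000 are hypothetical and do not correspond to any real instruction set.

```python
# Toy model: memory is a dict of address -> value, registers a dict of name -> value.
# Instruction names and encodings are hypothetical, for illustration only.

def run_program(program, memory):
    """Execute a list of instruction tuples; return how many instructions ran."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":            # LOAD reg, addr  : reg <- memory[addr]
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "STORE":         # STORE reg, addr : memory[addr] <- reg
            reg, addr = args
            memory[addr] = regs[reg]
        elif op == "ADDI":          # ADDI reg, imm   : reg <- reg + imm (registers only)
            reg, imm = args
            regs[reg] += imm
        elif op == "ADDM":          # ADDM addr, imm  : memory[addr] <- memory[addr] + imm
            addr, imm = args
            memory[addr] += imm
        else:
            raise ValueError(f"unknown instruction: {op}")
    return len(program)

X = 0x1000  # hypothetical address of a variable x

# x = x + 1 in load/store (RISC) style: three instructions.
risc_program = [("LOAD", "r1", X), ("ADDI", "r1", 1), ("STORE", "r1", X)]

# The same statement in memory-operand (CISC) style: one instruction.
cisc_program = [("ADDM", X, 1)]

risc_mem = {X: 41}
cisc_mem = {X: 41}
risc_count = run_program(risc_program, risc_mem)
cisc_count = run_program(cisc_program, cisc_mem)
print(risc_mem[X], risc_count)  # 42 3
print(cisc_mem[X], cisc_count)  # 42 1
```

Both styles end with the same value in memory. The memory‑operand form still reads and writes memory internally; it just does so within a single instruction.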

Why caches are needed

Empirical experience (the 80/20 rule) suggests that a small fraction of a program's code accounts for most of its execution time. Program data also exhibits the principle of locality:

Temporal locality: recently accessed data is likely to be accessed again soon.

Spatial locality: data near a recently accessed address is likely to be accessed next.

Because main memory (DRAM) is orders of magnitude slower than the CPU, a fast intermediate storage—cache built from SRAM—is inserted between them.
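To see how much spatial locality matters, here is a minimal sketch of a direct‑mapped cache simulator. The cache geometry (64 lines of 64 bytes), the matrix size, and the function names are assumptions chosen for illustration; real caches are usually set‑associative and far larger.

```python
def hit_rate(addresses, num_lines=64, line_size=64):
    """Simulate a direct-mapped cache; return the fraction of accesses that hit."""
    tags = [None] * num_lines          # which memory block each slot currently holds
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # the single slot that block can occupy
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block        # miss: bring the whole block into the slot
        total += 1
    return hits / total

ROWS, COLS, ELEM = 256, 256, 8         # 256x256 matrix of 8-byte elements, row-major layout

def address(r, c):
    """Byte address of element (r, c), assuming the matrix starts at address 0."""
    return (r * COLS + c) * ELEM

row_major = [address(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [address(r, c) for c in range(COLS) for r in range(ROWS)]

row_rate = hit_rate(row_major)
col_rate = hit_rate(col_major)
print(f"row-major hit rate:    {row_rate:.3f}")   # 0.875
print(f"column-major hit rate: {col_rate:.3f}")   # 0.000
```

Walking the matrix row by row touches eight consecutive elements per 64‑byte line, so seven of every eight accesses hit. Walking it column by column jumps 2048 bytes per access, and in this small model every access evicts a line before it can be reused.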

Two fundamental memory accesses

Program‑generated reads/writes of data.

Instruction fetches (the CPU reads the next instruction from memory).

Both accesses are serviced first by the cache hierarchy.

Cache hierarchy

Modern CPUs provide multiple cache levels:

L1 – smallest, fastest, typically split into instruction and data caches.

L2 – larger, slightly slower, often unified.

L3 – shared among cores, even larger and slower.

The CPU probes L1 → L2 → L3; only when all levels miss does it access main memory.
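The cost of this lookup chain can be estimated with a short calculation. The sketch below assumes levels are probed one after another, and the latencies and hit rates are made‑up numbers for illustration, not measurements of any real CPU.

```python
def average_access_time(levels, memory_latency):
    """Expected cycles per access.

    levels: list of (hit_latency_cycles, local_hit_rate), ordered from L1 outward.
    """
    expected = 0.0
    reach_probability = 1.0                 # chance an access gets as far as this level
    for hit_latency, local_hit_rate in levels:
        expected += reach_probability * hit_latency   # every access reaching a level pays its lookup
        reach_probability *= 1.0 - local_hit_rate     # only misses continue outward
    expected += reach_probability * memory_latency    # accesses that miss everywhere go to DRAM
    return expected

# Hypothetical latencies (cycles) and per-level hit rates, for illustration only.
levels = [(4, 0.90), (12, 0.80), (40, 0.50)]
amat = average_access_time(levels, memory_latency=200)
print(f"average access time: {amat:.1f} cycles")
```

With these assumed numbers the average comes to about 8 cycles per access, compared with 200 cycles if every access went straight to main memory.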

Cache write policies

Write‑through: every store updates both the cache line and the backing memory immediately, keeping memory consistent with the cache but paying memory‑write latency on every store.

Write‑back (asynchronous): a store updates only the cache line and marks it dirty. The dirty line is written back to memory later, typically when it is evicted, reducing write traffic at the cost of added complexity.
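The difference in write traffic between the two policies can be sketched with a toy model. The `ToyCache` class and `run_stores` helper below are hypothetical names; the model tracks single addresses rather than real cache lines and has no capacity limit.

```python
class ToyCache:
    """Toy cache in front of a dict-based memory; counts writes that reach memory."""

    def __init__(self, memory, write_back):
        self.memory = memory
        self.write_back = write_back
        self.lines = {}            # addr -> cached value
        self.dirty = set()         # addresses modified but not yet written to memory
        self.memory_writes = 0

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)               # write-back: defer the memory update
        else:
            self._write_memory(addr, value)    # write-through: update memory now

    def evict(self, addr):
        if addr in self.dirty:                 # flush dirty data before dropping the line
            self._write_memory(addr, self.lines[addr])
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

def run_stores(write_back):
    """Store to one address 100 times, evict it, return (memory value, memory writes)."""
    memory = {0: 0}
    cache = ToyCache(memory, write_back)
    for i in range(1, 101):
        cache.store(0, i)
    cache.evict(0)
    return memory[0], cache.memory_writes

wt_result = run_stores(write_back=False)
wb_result = run_stores(write_back=True)
print(wt_result)   # (100, 100)
print(wb_result)   # (100, 1)
```

Both policies leave the same final value in memory, but write‑back reaches memory once instead of a hundred times. The price is the dirty‑tracking state and the window during which memory holds an old value.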

Cache replacement

Because a cache can hold only a limited number of lines, a replacement algorithm (e.g., LRU, FIFO, random) decides which line to evict when a new line is needed.
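As one concrete case, LRU can be sketched with Python's `collections.OrderedDict`. The `LRUCache` class here is a hypothetical software model; hardware caches typically use cheaper approximations of LRU within each set.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache model with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()     # keys ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, block):
        """Touch a block; return the block evicted to make room, or None."""
        if block in self.lines:
            self.hits += 1
            self.lines.move_to_end(block)          # mark as most recently used
            return None
        self.misses += 1
        evicted = None
        if len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)   # drop the least recently used
        self.lines[block] = True
        return evicted

cache = LRUCache(capacity=2)
evictions = [cache.access(b) for b in ["A", "B", "A", "C", "B"]]
print(evictions)                  # [None, None, None, 'B', 'A']
print(cache.hits, cache.misses)   # 1 4
```

Re‑touching A before C arrives is what saves it: B becomes the least recently used line and is evicted first.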

Multi‑core cache coherence

When each core has a private L1/L2 cache, the same memory location can exist in multiple caches simultaneously. If one core updates its copy, the others may hold stale data, leading to incorrect results.

Coherence protocols such as MESI (Modified, Exclusive, Shared, Invalid) ensure that all caches see a consistent view. Each cache line is in one of four states:

M – line is dirty and exclusive to one cache.

E – line is clean and exclusive.

S – line may be present in multiple caches, all clean.

I – line is invalid.

When a core wants to write a line that is currently S, it must first invalidate the copies in other cores, then transition the line to M.
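The state changes can be traced with a simplified model of a single cache line. This sketch tracks only states, not data, and the `LineTracker` class is a hypothetical name; a real protocol also involves bus messages, write‑backs, and data transfers that are omitted here.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

class LineTracker:
    """Tracks each core's MESI state for one cache line (data values omitted)."""

    def __init__(self, num_cores):
        self.states = [State.I] * num_cores

    def read(self, core):
        if self.states[core] is not State.I:
            return                                   # read hit: no state change
        others = [i for i, s in enumerate(self.states)
                  if i != core and s is not State.I]
        for i in others:
            self.states[i] = State.S                 # holders in M or E downgrade to S
        self.states[core] = State.S if others else State.E

    def write(self, core):
        for i in range(len(self.states)):
            if i != core:
                self.states[i] = State.I             # invalidate every other copy
        self.states[core] = State.M

    def snapshot(self):
        return [s.name for s in self.states]

line = LineTracker(num_cores=2)
history = []
line.read(0)                       # core 0 loads the line: sole holder
history.append(line.snapshot())
line.read(1)                       # core 1 loads the same line
history.append(line.snapshot())
line.write(0)                      # core 0 writes: core 1's copy is invalidated
history.append(line.snapshot())
line.read(1)                       # core 1 reads again: both end up sharing
history.append(line.snapshot())
print(history)   # [['E', 'I'], ['S', 'S'], ['M', 'I'], ['S', 'S']]
```

The third step is the S to M transition described above: core 0 cannot write until core 1's copy has been marked invalid.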

Illustrative example of incoherence

Assume a variable X = 2 resides in main memory. Two cores, C1 and C2, each load X into their private caches (both see 2). C1 adds 2, updates its cache, and writes back, so memory and C1’s cache now hold 4. C2, unaware of C1’s update, adds 4 to its stale copy (2 + 4 = 6) and writes back, leaving memory at 6. The expected result (2 + 2 + 4 = 8) is lost because the caches were not kept coherent.
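The same scenario can be replayed in a few lines. This is a sketch under simplifying assumptions: the two cores run strictly one after the other, every store is written through to memory, and the `run_updates` function and its `coherent` flag are hypothetical names for illustration.

```python
def run_updates(coherent):
    """Replay the X example: C1 adds 2, then C2 adds 4. Return the final value in memory."""
    memory = {"X": 2}
    caches = {"C1": {}, "C2": {}}

    def load(core):
        if "X" not in caches[core]:                  # miss: fetch from memory
            caches[core]["X"] = memory["X"]
        return caches[core]["X"]

    def store(core, value):
        caches[core]["X"] = value
        memory["X"] = value                          # write the result back to memory
        if coherent:
            for other in caches:
                if other != core:
                    caches[other].pop("X", None)     # invalidate stale copies elsewhere

    load("C1")                                       # both cores cache X = 2
    load("C2")
    store("C1", load("C1") + 2)                      # C1: 2 + 2 = 4
    store("C2", load("C2") + 4)                      # C2: stale 2 + 4, or fresh 4 + 4
    return memory["X"]

incoherent_result = run_updates(coherent=False)
coherent_result = run_updates(coherent=True)
print(incoherent_result)   # 6
print(coherent_result)     # 8
```

With invalidation enabled, C2's second load misses and fetches the updated value, so the result is 8. Note that coherence alone does not make a read‑modify‑write sequence atomic; truly concurrent updates still need atomic instructions or locks.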

Implications for software developers

To obtain high performance:

Write code that exhibits strong temporal and spatial locality, and keep the working set small enough to fit in the small, fast caches.

Avoid false sharing—situations where unrelated variables share the same cache line, causing unnecessary coherence traffic.

Prefer data structures and access patterns that align with cache line boundaries (typically 64 bytes).
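Whether a layout is exposed to false sharing can be checked with simple address arithmetic. The sketch below assumes a 64‑byte line and a line‑aligned base address; the helper names are hypothetical, and the actual line size should be confirmed for the target hardware.

```python
LINE_SIZE = 64  # bytes; typical, but hardware-dependent

def cache_line(offset, line_size=LINE_SIZE):
    """Index of the cache line containing a byte offset (base assumed line-aligned)."""
    return offset // line_size

def counter_offsets(num_counters, stride):
    """Byte offset of each per-thread counter when counters sit `stride` bytes apart."""
    return [i * stride for i in range(num_counters)]

def shares_line(offsets, line_size=LINE_SIZE):
    """True if any two counters land in the same cache line."""
    lines = [cache_line(o, line_size) for o in offsets]
    return len(set(lines)) < len(lines)

packed = counter_offsets(4, stride=8)            # four 8-byte counters back to back
padded = counter_offsets(4, stride=LINE_SIZE)    # each counter padded to its own line

print(shares_line(packed))   # True
print(shares_line(padded))   # False
```

In the packed layout all four counters fall in line 0, so threads updating different counters would still invalidate each other's copy of that line. Padding each counter to a full line trades 56 bytes per counter for independent lines.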

Understanding the underlying memory hierarchy and coherence mechanisms helps developers reason about performance bottlenecks and write cache‑friendly parallel code.


Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
