
Understanding CPU Caches, Coherency Protocols, and Memory Models

This article provides a concise introduction to CPU cache architecture, explains read/write policies, describes cache coherency protocols such as MESI and its variants, and discusses how different memory models affect multi‑core consistency and performance.

Qunar Tech Salon

Serious programmers need to understand how computers work, and cache behavior is essential to writing efficient, correct code.

This series consists of two parts; the first part appeared earlier, and this article covers the foundational knowledge needed to understand CPU caches.

Cache

This piece is a quick introduction to CPU caches. It assumes basic concepts are known but clarifies many details that are often overlooked.

On modern CPUs, virtually all memory accesses go through multiple levels of cache. Exceptions such as memory‑mapped I/O or write‑combined memory are rare and ignored here.

CPU read/write units cannot directly access memory; they communicate with the L1 cache, which in turn talks to L2, and possibly L3, before reaching main memory.

Caches are divided into lines (segments) of 32, 64, or 128 bytes, aligned to the cache line size. A cache line represents a block of memory that may be present in any cache level.
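Because lines are aligned to their size, the line an address belongs to can be computed with simple bit masks. The sketch below assumes 64‑byte lines (the constant and function names are illustrative, not from the article):

```python
LINE_SIZE = 64  # assumed cache line size in bytes; must be a power of two

def line_address(addr: int, line_size: int = LINE_SIZE) -> int:
    """Align an address down to the start of its cache line."""
    return addr & ~(line_size - 1)

def line_offset(addr: int, line_size: int = LINE_SIZE) -> int:
    """Byte offset of an address within its cache line."""
    return addr & (line_size - 1)

# 0x1040 and 0x107F share a 64-byte line; 0x1080 starts the next one.
print(hex(line_address(0x1040)), hex(line_address(0x107F)), hex(line_address(0x1080)))
```

Two addresses map to the same line exactly when their `line_address` values match, which is why accesses that straddle a line boundary touch two lines.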

When a load instruction is issued, the address is sent to the L1 data cache (L1D$). If the line is not present, the entire line is fetched from the next level or main memory, based on the locality assumption that nearby addresses will be accessed soon.

As long as only reads occur, every cache level obeys a simple invariant: at any moment, a valid cache line holds the same contents as the corresponding memory location.

When writes are allowed, two primary policies exist: write‑through and write‑back. Write‑through immediately propagates every write to the next level (or memory), keeping the cache line and memory identical. Write‑back marks the line as dirty and postpones the write until the line is evicted; the invariant weakens to a guarantee that a dirty line's contents will be written back to memory before the line is discarded.

Write‑back can improve performance by coalescing multiple writes to the same line, while write‑through is simpler.
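The coalescing benefit can be seen in a toy model of a single line: under write‑through every store reaches the next level, while under write‑back repeated stores to the same line cost one write at eviction. This is a simplified sketch, not a model of any specific CPU:

```python
class Line:
    """A single cache line with a dirty bit (write-back bookkeeping)."""
    def __init__(self):
        self.data = 0
        self.dirty = False

def write(line, value, policy, memory_writes):
    """Store into the line; count writes that reach the next level."""
    line.data = value
    if policy == "write-through":
        memory_writes[0] += 1   # every store propagates immediately
    else:                       # write-back: just mark the line dirty
        line.dirty = True

def evict(line, policy, memory_writes):
    """On eviction, write-back flushes a dirty line exactly once."""
    if policy == "write-back" and line.dirty:
        memory_writes[0] += 1
        line.dirty = False

# Three stores to the same line:
wt_writes, wb_writes = [0], [0]
wt_line, wb_line = Line(), Line()
for v in (1, 2, 3):
    write(wt_line, v, "write-through", wt_writes)
    write(wb_line, v, "write-back", wb_writes)
evict(wb_line, "write-back", wb_writes)
print(wt_writes[0], wb_writes[0])  # write-through: 3, write-back: 1
```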

Cache line sizes can also differ between levels; for example, L1 may use 32‑byte lines while L2 uses 128‑byte lines.

Coherency Protocols

With a single core, caches work fine. With multiple cores, each core has its own caches, leading to potential inconsistencies when one core modifies data that another core has cached.

To keep caches synchronized, systems use coherency protocols. The most common in everyday computers is the snooping protocol, where all caches monitor the shared bus for transactions.

In snooping, when a core writes, other cores are notified and invalidate or update their copies. Write‑through makes this straightforward; write‑back requires additional handling, which is addressed by the MESI protocol.

MESI and Derived Protocols

MESI defines four states for a cache line: Invalid (I), Shared (S), Exclusive (E), and Modified (M). I means the line is not present or stale. S allows read‑only sharing. E grants exclusive ownership without being dirty. M indicates a dirty line that must be written back before being shared.

Transitions between these states ensure that a core obtains exclusive ownership before writing, and that modified lines are written back before other cores can read them.
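The core transitions can be sketched as a lookup table. This is a deliberately simplified model (for instance, a local read miss with no other sharer would really yield E rather than S, and write‑backs are not modeled); the event names are illustrative:

```python
# Transitions for one core's copy of a line, keyed by (state, event).
# "local_*" = this core's access; "remote_*" = a snooped bus transaction.
MESI = {
    ("I", "local_read"):  "S",  # fill the line; simplification: assume sharers
    ("I", "local_write"): "M",  # read-for-ownership, then modify
    ("S", "local_write"): "M",  # other sharers must be invalidated first
    ("S", "remote_write"): "I",
    ("E", "local_write"): "M",  # already exclusive: no bus traffic needed
    ("E", "remote_read"): "S",
    ("M", "remote_read"): "S",  # dirty data is written back before sharing
    ("M", "remote_write"): "I", # write back, then invalidate
}

def step(state: str, event: str) -> str:
    """Apply one event; events not listed leave the state unchanged."""
    return MESI.get((state, event), state)

# A core writes a line, then another core reads it:
s = step("I", "local_write")   # -> "M"
s = step(s, "remote_read")     # -> "S" (after write-back)
print(s)
```

Note how a write always funnels the line through M, and how a snooped remote write always ends in I: these two rules are what guarantee a single writer at a time.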

Derived protocols add states such as Owned (O) in MOESI, or designate a responder for shared lines in MERSI/MESIF, reducing bus traffic while preserving the core invariants.

Memory Model

Different architectures provide different memory models. ARM and POWER have relatively weak models that allow extensive reordering, requiring memory barriers to enforce ordering. x86 has a stronger model that provides more guarantees.

Weak models permit reads of stale data and delayed write visibility due to store buffers and invalidation queues. Stronger models track pending operations (e.g., x86’s MOB) and can roll back speculative execution to maintain consistency.
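The store-buffer effect can be made concrete with a toy model: each core's stores sit in a private buffer (visible to that core via store-to-load forwarding) until a fence drains them to shared memory. This is a conceptual sketch, not a model of any real pipeline:

```python
class Core:
    """Toy core: stores are buffered privately until a fence drains them."""
    def __init__(self, memory: dict):
        self.memory = memory          # shared memory, visible to all cores
        self.store_buffer = {}        # addr -> pending value, private

    def store(self, addr, value):
        self.store_buffer[addr] = value   # not yet globally visible

    def load(self, addr):
        # Store-to-load forwarding: a core sees its own buffered stores.
        return self.store_buffer.get(addr, self.memory.get(addr, 0))

    def fence(self):
        # Memory barrier: drain the buffer so stores become globally visible.
        self.memory.update(self.store_buffer)
        self.store_buffer.clear()

memory = {}
core0, core1 = Core(memory), Core(memory)
core0.store(0x10, 1)
print(core0.load(0x10))  # 1: core0 sees its own store
print(core1.load(0x10))  # 0: core1 still sees the stale value
core0.fence()
print(core1.load(0x10))  # 1: the barrier made the store visible
```

This is exactly the kind of stale read that a weak model permits and that a barrier rules out; on a strong model like x86, the hardware constrains when such reordering is observable.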

Understanding these models is crucial for writing correct concurrent code, especially when combined with cache coherency mechanisms like MESI.

In summary, CPU caches, coherency protocols, and memory models together define how multi‑core systems maintain a consistent view of memory while optimizing performance.

Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.