Understanding CPU Caches, Coherency, and Memory Models: A Quick Guide
This article provides a concise introduction to CPU cache hierarchies, read/write policies, cache coherency protocols such as snooping and MESI, and the impact of different memory models on multi‑core systems, helping developers grasp essential hardware concepts for reliable software design.
This article offers a fast‑track overview of CPU caches for readers who know the basic concepts but may be unfamiliar with the details.
Modern CPUs access memory through multiple cache levels (L1, L2, and possibly L3). Almost all memory accesses go through these caches; the rare exceptions, such as memory‑mapped I/O and write‑combining memory, are ignored here.
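Cache lines (sometimes called segments) are fixed‑size blocks, typically 32, 64, or 128 bytes, aligned on a multiple of their own size. When a CPU reads, it sends the address to the L1 data cache, which checks whether it holds the line containing that address. On a miss, the entire line is fetched from the next cache level or from main memory; fetching whole lines pays off because of locality, since neighboring addresses are likely to be accessed soon.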
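A minimal Python sketch of the address arithmetic, assuming a 64‑byte line size (a power of two) and an idealized cache that never evicts anything; the helper names line_base, line_offset, and count_misses are hypothetical. It illustrates why fetching whole lines pays off: sequential reads cost one miss per line, while a stride equal to the line size misses on every access.

```python
LINE_SIZE = 64  # assumed line size in bytes; must be a power of two


def line_base(addr: int, line_size: int = LINE_SIZE) -> int:
    """Start address of the cache line that contains addr."""
    return addr & ~(line_size - 1)


def line_offset(addr: int, line_size: int = LINE_SIZE) -> int:
    """Position of addr within its cache line."""
    return addr & (line_size - 1)


def count_misses(addresses, line_size: int = LINE_SIZE) -> int:
    """Misses for an idealized cache that never evicts: the first access to a
    line misses and brings in the whole line; later accesses to it hit."""
    cached = set()
    misses = 0
    for addr in addresses:
        base = line_base(addr, line_size)
        if base not in cached:
            misses += 1       # fetch the entire line from the next level
            cached.add(base)
    return misses
```

With 64‑byte lines, address 0x1234 lies in the line starting at 0x1200, at offset 0x34. Reading 256 consecutive bytes with count_misses(range(256)) gives 4 misses, whereas count_misses(range(0, 256 * 64, 64)), which touches one byte per line, gives 256.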
Basic Law: At any moment, the contents of a cache line at any cache level equal the contents of the corresponding memory region.
For writes, caches follow one of two policies:
Write‑through: The write goes directly to the next cache level (or to memory); if the line is cached, the cached copy is updated or discarded, so the basic law continues to hold.
Write‑back: The write modifies only the line in the current cache, which is then marked “dirty”. A dirty line must be written back to the next level before it is evicted. This breaks the basic law and replaces it with a weaker invariant:
Write‑back Law: After all dirty lines have been written back, the contents of any cache line at any level match the corresponding memory contents.
Write‑back improves performance because repeated writes to the same line are combined into a single write‑back, reducing traffic to the next level; write‑through is simpler.
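The two write policies can be compared with a toy model in which each cache line holds a single value and the next level simply counts the writes it receives. Memory and Cache are hypothetical classes for illustration, not a model of any real cache; matches_memory() checks whether the basic law currently holds.

```python
class Memory:
    """Stand-in for the next level (L2 or main memory); counts incoming writes."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, line, value):
        self.data[line] = value
        self.writes += 1


class Cache:
    """Toy cache in which each line holds a single value (a simplification)."""

    def __init__(self, memory, write_back):
        self.memory = memory
        self.write_back = write_back
        self.lines = {}     # line address -> cached value
        self.dirty = set()  # lines modified locally but not yet written back

    def write(self, line, value):
        self.lines[line] = value
        if self.write_back:
            self.dirty.add(line)            # write-back: defer, mark dirty
        else:
            self.memory.write(line, value)  # write-through: forward at once

    def flush(self, line):
        """Write a dirty line back; it stays cached, now clean."""
        if line in self.dirty:
            self.memory.write(line, self.lines[line])
            self.dirty.discard(line)

    def evict(self, line):
        self.flush(line)                    # dirty data must not be lost
        self.lines.pop(line, None)

    def matches_memory(self):
        """True if every cached line equals the next level's copy."""
        return all(self.memory.data.get(addr) == value
                   for addr, value in self.lines.items())
```

Writing the same line 100 times sends 100 writes to the next level under write‑through, but none under write‑back until the dirty line is flushed, which then costs a single write. Before that flush, matches_memory() is False for the write‑back cache; afterwards it is True, as the write‑back law predicts.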
This article skips more advanced topics such as associativity, write‑allocate vs. no‑write‑allocate, unaligned accesses, and virtually addressed caches.
Cache Coherency Protocols
With a single core, caches work fine. With multiple cores, each core has its own caches, raising the problem of keeping them consistent when one core modifies a memory location.
One naïve solution is a single L1 cache shared by all cores, but it would become a severe bottleneck. Instead, systems use coherency protocols that make multiple private caches behave as if they were a single shared cache.
Most mainstream systems employ snooping protocols, where all caches monitor a shared bus. When a cache writes, other caches are notified and invalidate or update their copies.
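In write‑through mode this is straightforward, because every write appears on the bus immediately. In write‑back mode, writes stay local until the line is flushed, so a cache must announce its intent to modify a line and obtain exclusive access before writing; the standard protocol for this is MESI (Modified, Exclusive, Shared, Invalid).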
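A toy sketch of the coherency problem and of snooping, assuming write‑through caches and a write‑invalidate protocol (snooping caches drop their copy rather than updating it). Bus, WriteThroughCache, and scenario are hypothetical names. Without snooping, core 1 keeps reading its stale copy after core 0 writes; with snooping, the invalidation forces a fresh fetch.

```python
class Bus:
    """Shared bus: every write is broadcast to all attached caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value               # write-through to memory
        for cache in self.caches:
            if cache is not writer and cache.snooping:
                cache.lines.pop(addr, None)     # invalidate the stale copy


class WriteThroughCache:
    def __init__(self, bus, snooping=True):
        self.bus = bus
        self.snooping = snooping
        self.lines = {}                         # line address -> value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:              # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)


def scenario(snooping):
    """Core 1 caches a line, core 0 writes it, core 1 reads it again."""
    bus = Bus()
    core0 = WriteThroughCache(bus, snooping)
    core1 = WriteThroughCache(bus, snooping)
    core1.read(0x80)            # core 1 now holds the line (value 0)
    core0.write(0x80, 42)
    return core1.read(0x80)
```

scenario(False) returns the stale value 0, while scenario(True) returns 42.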
MESI and Derived Protocols
MESI defines four states for a cache line:
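Invalid (I): The line is not present in the cache, or its contents are stale.
Shared (S): A clean, read‑only copy that matches memory; several caches may hold it at once.
Exclusive (E): A clean copy held by this cache only; all other caches hold the line as Invalid.
Modified (M): A dirty copy held by this cache only; all other caches hold the line as Invalid.
To write, a core must hold the line in E or M. A line in E can be upgraded to M silently, since no other cache holds it; a line in S or I first requires a request for ownership (RFO), which invalidates every other copy. When a Modified line is evicted, it is written back to memory.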
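The MESI transitions can be simulated for a single cache line. This is a simplified sketch: it tracks one line across n caches, counts every flush of Modified data as a write‑back to memory (real implementations may instead forward the data directly to the requesting cache), and ignores bus timing. MesiLine and its methods are hypothetical; invariant_holds() checks the exclusivity property that a line in E or M in one cache is Invalid in all others.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class MesiLine:
    """MESI state of ONE cache line in each of n caches (simplified sketch)."""

    def __init__(self, n: int):
        self.state = [State.I] * n
        self.writebacks = 0  # flushes of Modified data to memory

    def _others(self, core: int) -> list:
        return [c for c in range(len(self.state)) if c != core]

    def read(self, core: int) -> None:
        if self.state[core] != State.I:
            return  # hit in M, E, or S: no bus traffic needed
        holders = [c for c in self._others(core) if self.state[c] != State.I]
        for c in holders:
            if self.state[c] == State.M:
                self.writebacks += 1  # owner flushes its dirty copy first
            self.state[c] = State.S   # every remaining copy becomes Shared
        self.state[core] = State.S if holders else State.E

    def write(self, core: int) -> None:
        if self.state[core] in (State.S, State.I):
            # Request for ownership (RFO): invalidate all other copies first.
            for c in self._others(core):
                if self.state[c] == State.M:
                    self.writebacks += 1
                self.state[c] = State.I
        # From E (or after the RFO) the upgrade to M needs no further traffic.
        self.state[core] = State.M

    def evict(self, core: int) -> None:
        if self.state[core] == State.M:
            self.writebacks += 1  # dirty line is written back on eviction
        self.state[core] = State.I

    def invariant_holds(self) -> bool:
        """A line in E or M in one cache must be Invalid in every other cache."""
        return all(
            all(self.state[c] == State.I for c in self._others(core))
            for core, st in enumerate(self.state)
            if st in (State.E, State.M)
        )
```

Starting from Invalid everywhere, a read by core 0 yields E; a read by core 1 then downgrades both copies to S; a write by core 0 invalidates core 1's copy and leaves core 0 in M. A later read by core 1 forces core 0 to flush its dirty data, and both copies end up Shared.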
Extensions such as MOESI (adding Owned), MERSI, or MESIF introduce additional states to reduce traffic, but the core invariants remain unchanged.
MESI Law: After all Modified lines have been written back, every cache line at every level matches memory. In addition, whenever a line is in the Exclusive or Modified state in one cache, no other cache holds a copy of it.
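Thus, if every core performs its memory operations strictly in program order and one at a time, MESI (and its derivatives) yield sequential consistency, the strongest guarantee in the C++11 memory model (memory_order_seq_cst).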
Memory Models
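Different architectures expose different memory models. ARM and POWER have relatively weak models that permit extensive reordering of memory operations, so software must insert explicit memory barriers wherever ordering matters. x86 offers a stronger model, usually described as total store order (TSO): the only reordering it permits is a load completing before an earlier store to a different address.
Even with a strong model, caches are only part of the picture: modern CPUs also use out‑of‑order execution, store buffers, and invalidation queues. As a result, a load can return stale data even though the caches themselves are coherent, and a store can become visible to other cores later than program order suggests.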
Consequently, developers must understand both cache coherency (e.g., MESI) and the underlying memory model to write correct concurrent code, especially on weakly ordered architectures.
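The effect of store buffers shows up in the classic store‑buffering litmus test: core 0 runs x = 1; r0 = y while core 1 runs y = 1; r1 = x, with x and y initially 0. The sketch below exhaustively explores every interleaving of a toy model in which each core has a FIFO store buffer that drains to memory at arbitrary times, and loads check their own core's buffer first (store‑to‑load forwarding). The model and the function run_litmus are hypothetical, loosely inspired by x86's TSO rather than an exact description of any CPU; the fence option stalls a core until its own buffer is empty, loosely analogous to a full barrier such as x86's MFENCE.

```python
def run_litmus(store_buffers: bool, fence: bool = False) -> set:
    """Return every reachable (r0, r1) outcome of the store-buffering test."""
    fence_op = [("fence",)] if fence else []
    programs = [
        [("store", "x", 1)] + fence_op + [("load", "y", "r0")],  # core 0
        [("store", "y", 1)] + fence_op + [("load", "x", "r1")],  # core 1
    ]
    outcomes = set()

    def explore(pcs, buffers, memory, regs):
        finished = all(pcs[c] == len(programs[c]) for c in range(2))
        if finished and not any(buffers):
            outcomes.add((regs["r0"], regs["r1"]))
            return
        for c in range(2):
            # Step A: the oldest buffered store of core c drains to memory.
            if buffers[c]:
                addr, val = buffers[c][0]
                drained = list(buffers)
                drained[c] = buffers[c][1:]
                explore(pcs, tuple(drained), {**memory, addr: val}, regs)
            # Step B: core c executes its next instruction, if any.
            if pcs[c] == len(programs[c]):
                continue
            op = programs[c][pcs[c]]
            next_pcs = tuple(p + 1 if i == c else p for i, p in enumerate(pcs))
            if op[0] == "store":
                _, addr, val = op
                if store_buffers:
                    queued = list(buffers)
                    queued[c] = buffers[c] + ((addr, val),)
                    explore(next_pcs, tuple(queued), memory, regs)
                else:  # no buffer: the store reaches memory immediately
                    explore(next_pcs, buffers, {**memory, addr: val}, regs)
            elif op[0] == "fence":
                if not buffers[c]:  # stall until own buffer has drained
                    explore(next_pcs, buffers, memory, regs)
            else:  # load
                _, addr, reg = op
                val = memory[addr]
                for b_addr, b_val in buffers[c]:  # newest matching store wins
                    if b_addr == addr:
                        val = b_val
                explore(next_pcs, buffers, memory, {**regs, reg: val})

    explore((0, 0), ((), ()), {"x": 0, "y": 0}, {"r0": None, "r1": None})
    return outcomes
```

Without store buffers, only (0, 1), (1, 0), and (1, 1) are reachable, which is exactly what sequential consistency allows. With store buffers, (0, 0) also appears, because both stores can still sit in their buffers when the loads execute; adding the fence removes that outcome again.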
This article has been distilled and summarized from source material, then republished for learning and reference.
Art of Distributed System Architecture Design
