Why CPU Cache Coherence Matters: From Volatile to MESI and Memory Barriers
This article explores the fundamentals of CPU cache hierarchy, why caches are needed, how cache inconsistency arises in multicore systems, and the mechanisms—such as cache coherence protocols, MESI, store buffers, invalidate queues, and memory barriers—that ensure correct data ordering and visibility across processors.
Introduction
A reader disagreed with a previous article, leading to a discussion that prompted this deep dive into cache coherence and related concepts.
Reader's Viewpoint
The reader claimed that the volatile keyword is implemented using the lock instruction, that the lock instruction triggers the cache coherence protocol, and that the Java Memory Model (JMM) relies on the cache coherence protocol.
Why Cache Is Needed
A CPU executes instructions far faster than main memory can deliver data, so memory access becomes the performance bottleneck. To bridge this gap, multiple levels of cache (L1, L2, L3) sit between the CPU and main memory, reducing average access latency and improving CPU utilization.
Why Cache Can Be Inconsistent
In single‑core systems, a private cache does not cause inconsistency. In multicore systems, each core has its own private caches, leading to potential data divergence when multiple cores modify the same memory location.
Cache Coherence Protocol
To present multiple caches as a single coherent view, a cache coherence protocol maintains consistency of cache lines across cores.
Directory‑Based Protocol
A directory tracks the state of each cache line. When a core wants to access a line, it consults the directory to determine ownership and ensure consistency. Various directory formats exist (full bit vector, coarse bit vector, sparse, etc.).
Bus Snooping
In bus‑snooping, all cores monitor the shared bus for memory operations. When a core writes to a line, it broadcasts an invalidate message, causing other cores to invalidate or update their copies. This approach has low latency but higher bus traffic.
MESI Protocol
MESI (Modified, Exclusive, Shared, Invalid) is a widely used bus‑snooping protocol that adds two bits to each cache line to track its state, enabling write‑back caches and efficient coherence.
M (Modified): the line is dirty; this cache holds the only valid copy, and the copy in memory is stale.
E (Exclusive): the line is clean, matches memory, and is present only in this cache.
S (Shared): the line is clean and may be present in multiple caches.
I (Invalid): the line holds no valid data.
MESI messages include Read, Read Response, Invalidate, Invalidate Acknowledge, Read Invalidate, and Writeback.
Store Buffer
A store buffer sits between the CPU and its cache, allowing writes to be queued without waiting for coherence messages. Reads can forward data from the store buffer (store forwarding), improving performance but potentially breaking global ordering.
// CPU0
void foo() {
    a = 1;   // may sit in CPU0's store buffer...
    b = 1;   // ...so CPU1 can observe b = 1 before a = 1
}
// CPU1
void bar() {
    while (b == 0) continue;
    assert(a == 1);   // can fail without memory barriers
}

Invalidate Queue
An invalidate queue speeds up processing of invalidate messages by acknowledging them immediately and deferring actual invalidation, which also introduces ordering challenges similar to the store buffer.
Memory Barriers and Lock Instruction
Memory barriers enforce ordering: on x86, lfence is a read barrier, sfence a write barrier, and mfence a full barrier. The lock prefix makes a read-modify-write instruction atomic (historically by locking the bus, on modern CPUs usually by holding the affected cache line). It also has full-barrier semantics as a side effect, but its primary purpose is atomicity, not ordering.
Recap of Questions
The reader’s claim that lock “triggers” the cache coherence protocol is inaccurate: coherence runs continuously in hardware whether or not a lock-prefixed instruction executes; the prefix provides atomicity and has barrier effects. Likewise, the JMM is an abstract model defined for Java developers; JVMs implement it using barriers and atomic instructions, so it is not directly tied to any particular hardware cache coherence protocol.
Conclusion
Cache hierarchies, coherence protocols (directory vs. bus snooping), MESI, store buffers, invalidate queues, and memory barriers together ensure correct and efficient operation of multicore CPUs. Understanding these mechanisms is essential for low‑level performance tuning and for correctly reasoning about concurrent programs.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Xiao Lou's Tech Notes
Backend technology sharing, architecture design, performance optimization, source code reading, troubleshooting, and pitfall practices