Why Does Java’s volatile Keyword Work? Deep Dive into CPU Caches and Memory Barriers
This article explains the hardware origins of CPU caches, bus locks, cache‑coherence protocols such as MESI, store buffers and memory barriers, then shows how the Java Memory Model abstracts these mechanisms and how the volatile keyword guarantees visibility and ordering while not providing atomicity.
CPU Cache Origins
Early CPUs accessed memory directly, but as processor speeds outpaced memory latency, designers introduced a small, fast storage layer, the cache, between the CPU and main memory. Modern CPUs typically have three cache levels: L1 (usually split into separate instruction and data caches) and L2 are normally private to each core, while L3 is larger, slower and shared among cores.
Cache lines are the smallest unit of data that a cache stores. When a core reads or writes, it first interacts with its private cache; later the cache synchronises with main memory.
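To make the "smallest unit" idea concrete, here is a minimal sketch of how a byte address maps onto a cache line. The class name CacheLineSketch and the 64-byte line size are assumptions for illustration; 64 bytes is common on x86-64 but is not universal.

```java
// Sketch: mapping a byte address onto a cache line.
// LINE_SIZE = 64 is an assumption (common on x86-64), not a universal constant.
final class CacheLineSketch {
    static final long LINE_SIZE = 64; // must be a power of two for the masks below

    /** Address of the first byte of the line that contains the given address. */
    static long lineBase(long address) {
        return address & ~(LINE_SIZE - 1);
    }

    /** Byte offset of the address inside its line. */
    static long offsetInLine(long address) {
        return address & (LINE_SIZE - 1);
    }

    /** True when both addresses fall in the same line and are therefore cached together. */
    static boolean sameLine(long a, long b) {
        return lineBase(a) == lineBase(b);
    }

    public static void main(String[] args) {
        System.out.println(lineBase(130));      // 128
        System.out.println(offsetInLine(130));  // 2
        System.out.println(sameLine(130, 191)); // true
        System.out.println(sameLine(130, 192)); // false
    }
}
```

Because whole lines move between caches, two unrelated variables that happen to share a line are always transferred and invalidated together.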
Cache‑Coherency and Bus Locks
In multi‑core systems each core may hold its own copy of a memory line, leading to consistency problems. CPUs solve this with two mechanisms:
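Bus locks: a core asserts the LOCK# signal, which blocks other cores from accessing memory over the bus until the locked operation completes. This is simple but expensive, because accesses to unrelated addresses are stalled as well.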
Cache‑coherency protocols (e.g., MESI) that automatically invalidate or update copies when one core writes.
MESI Protocol
The MESI protocol defines four states for a cache line:
M (Modified) : only this core has the line, and it differs from memory.
E (Exclusive) : only this core has the line, identical to memory.
S (Shared) : multiple cores hold identical copies.
I (Invalid) : the line is not valid in this core.
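State transitions ensure that a write in one core invalidates the copies held by other cores. Their next read of that line misses and fetches the current data, either from memory or from the cache that holds the modified line.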
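The four states and their transitions can be sketched as a toy state machine for a single cache line. This is a simplified model under stated assumptions, not how any real CPU implements the protocol: the class MesiSketch and its methods are hypothetical, data values and bus messages are omitted, and write-backs happen implicitly.

```java
import java.util.Arrays;

// Toy model of MESI for ONE cache line shared by several cores.
// All names here are illustrative; real hardware also tracks data and bus messages.
final class MesiSketch {
    enum State { M, E, S, I }

    private final State[] states; // state of the modelled line in each core's cache

    MesiSketch(int cores) {
        states = new State[cores];
        Arrays.fill(states, State.I); // nobody has the line yet
    }

    State stateOf(int core) {
        return states[core];
    }

    /** The given core reads the line. */
    void read(int core) {
        if (states[core] != State.I) {
            return; // hit: M, E and S can all be read locally
        }
        boolean othersHaveCopy = false;
        for (int i = 0; i < states.length; i++) {
            if (i != core && states[i] != State.I) {
                othersHaveCopy = true;
                states[i] = State.S; // an M or E holder downgrades to S (M writes back first)
            }
        }
        states[core] = othersHaveCopy ? State.S : State.E;
    }

    /** The given core writes the line. */
    void write(int core) {
        for (int i = 0; i < states.length; i++) {
            if (i != core) {
                states[i] = State.I; // invalidate every other copy
            }
        }
        states[core] = State.M;
    }

    public static void main(String[] args) {
        MesiSketch line = new MesiSketch(2);
        line.read(0);  // core 0: E
        line.read(1);  // core 0: S, core 1: S
        line.write(0); // core 0: M, core 1: I
        line.read(1);  // core 0: S, core 1: S
        System.out.println(line.stateOf(0) + " " + line.stateOf(1)); // S S
    }
}
```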
Store Buffers and Memory Barriers
To avoid stalling the pipeline while a write waits for ownership of its cache line, CPUs place the write in a store buffer first. While the buffer holds the write, the core can continue executing other instructions. Two consequences follow: other cores do not see the new value until the buffer drains into the cache, and the writing core would read a stale value from its own cache if it did not also consult the buffer.
Store buffers introduce two new phenomena:
Store Forwarding : a core checks its own store buffer for a pending write before reading from its cache.
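Invalidation Queues : asynchronous queues in which a core parks incoming invalidate messages. The core acknowledges each message at once but applies it later, so it can briefly keep reading a line that another core has already overwritten.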
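The effect of a store buffer can be sketched with a small single-threaded simulation. Everything here is a hypothetical model (the names StoreBufferSketch, Memory, Core and drain are made up for illustration): caches and invalidation queues are left out, and the buffer coalesces writes to the same address, which real hardware does not necessarily do.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy, single-threaded model of per-core store buffers. Names are illustrative only.
final class StoreBufferSketch {

    /** The globally visible memory image shared by all cores. */
    static final class Memory {
        private final Map<String, Integer> cells = new HashMap<>();

        int get(String addr) {
            return cells.getOrDefault(addr, 0); // unwritten cells read as 0
        }

        void put(String addr, int value) {
            cells.put(addr, value);
        }
    }

    static final class Core {
        private final Memory memory;
        private final Map<String, Integer> storeBuffer = new LinkedHashMap<>();

        Core(Memory memory) {
            this.memory = memory;
        }

        /** A store only reaches the private buffer; other cores cannot see it yet. */
        void store(String addr, int value) {
            storeBuffer.put(addr, value);
        }

        /** Store forwarding: check our own pending stores before shared memory. */
        int load(String addr) {
            Integer pending = storeBuffer.get(addr);
            return pending != null ? pending : memory.get(addr);
        }

        /** What a barrier forces: make all buffered stores globally visible. */
        void drain() {
            storeBuffer.forEach(memory::put);
            storeBuffer.clear();
        }
    }

    public static void main(String[] args) {
        Memory memory = new Memory();
        Core core0 = new Core(memory);
        Core core1 = new Core(memory);

        core0.store("x", 1);
        core1.store("y", 1);
        System.out.println(core0.load("x")); // 1, forwarded from core0's own buffer
        System.out.println(core0.load("y")); // 0, core1's store is still buffered
        System.out.println(core1.load("x")); // 0, core0's store is still buffered

        core0.drain();
        core1.drain();
        System.out.println(core0.load("y")); // 1
        System.out.println(core1.load("x")); // 1
    }
}
```

The middle two loads show the classic store-then-load surprise: each core has written its own variable, yet both still read 0 for the other one.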
Memory Barriers in Hardware
Barriers are commonly classified into four types, named after the pair of operations they keep in order:
Store‑Store : ensures all previous stores are globally visible before any later store.
Load‑Load : guarantees earlier loads complete before later loads.
Load‑Store : forces earlier loads to finish before later stores.
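Store‑Load : ensures earlier stores are globally visible before any later load executes. It is the strongest and most expensive of the four; on most processors it also has the effect of the other three.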
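Java exposes explicit fences through static methods on java.lang.invoke.VarHandle (Java 9 and later), including storeStoreFence(), loadLoadFence() and fullFence(). The following is a minimal sketch of a Store‑Store and a Load‑Load barrier at work in a message‑passing pattern. The class FenceSketch and its methods are hypothetical, and the flag is accessed in opaque mode so that the spin loop is guaranteed to observe the write eventually. Treat it as an illustration of where the fences go, not as a recommended replacement for volatile.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: explicit fences around plain fields. Names are illustrative only.
final class FenceSketch {
    private static final VarHandle READY;

    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(FenceSketch.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int data;      // plain field
    private boolean ready; // plain field, accessed through READY in opaque mode

    void publish(int value) {
        data = value;
        VarHandle.storeStoreFence(); // Store-Store: the data store stays before the flag store
        READY.setOpaque(this, true);
    }

    int awaitAndRead() {
        while (!(boolean) READY.getOpaque(this)) {
            Thread.onSpinWait();
        }
        VarHandle.loadLoadFence();   // Load-Load: the flag load stays before the data load
        return data;
    }

    /** Runs one writer thread and reads on the calling thread. */
    static int demo() {
        FenceSketch box = new FenceSketch();
        Thread writer = new Thread(() -> box.publish(42));
        writer.start();
        int seen = box.awaitAndRead();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

VarHandle.fullFence() is the counterpart of the Store‑Load barrier; it is not needed here because neither thread performs a store followed by a load that must stay in order.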
Java Memory Model (JMM)
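The JMM abstracts these hardware details, defining a main memory and per‑thread working memory. Working memory is a conceptual stand‑in for registers, caches and store buffers rather than a real memory region. All variables reside in main memory; each thread works on a local copy, and the model specifies when a value must be read from or written back to main memory.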
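The JVM implements the JMM's visibility and ordering guarantees by restricting compiler reordering and by emitting memory‑barrier instructions where the target CPU needs them. On strongly ordered CPUs such as x86, several of these barriers require no instruction at all. For volatile reads and writes, a conservative strategy adds: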
Before a volatile write: a Store‑Store barrier.
After a volatile write: a Store‑Load barrier.
After a volatile read: both Load‑Load and Load‑Store barriers.
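The barrier placement above can be pictured on a small message‑passing example. The bracketed comments mark where the conceptual barriers sit; they are annotations, not real instructions in the source, and the class VolatileBarrierSketch is a hypothetical name used only for this sketch.

```java
// Sketch: where the conceptual barriers sit around a volatile write and read.
final class VolatileBarrierSketch {
    private int payload;            // plain field
    private volatile boolean ready; // volatile flag

    void writer() {
        payload = 42;
        // [Store-Store] the payload store cannot move below the volatile store
        ready = true;               // volatile write
        // [Store-Load]  the volatile store cannot move below later loads
    }

    int reader() {
        while (!ready) {            // volatile read on every iteration
            Thread.onSpinWait();
        }
        // [Load-Load + Load-Store] later accesses cannot move above the volatile load
        return payload;             // guaranteed to observe 42
    }

    /** Runs one writer thread and reads on the calling thread. */
    static int demo() {
        VolatileBarrierSketch box = new VolatileBarrierSketch();
        Thread writer = new Thread(box::writer);
        writer.start();
        int seen = box.reader();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 42
    }
}
```

In JMM terms, the volatile write happens‑before the volatile read that observes it, which is why the plain payload field is also visible to the reader.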
Volatile Example
public class VolatileTest {
    private boolean initFlag = false;

    public static void main(String[] args) throws InterruptedException {
        VolatileTest sample = new VolatileTest();
        Thread threadA = new Thread(sample::refresh, "threadA");
        Thread threadB = new Thread(sample::load, "threadB");
        threadB.start();
        Thread.sleep(2000);
        threadA.start();
    }

    public void refresh() {
        this.initFlag = true;
        System.out.println("thread: " + Thread.currentThread().getName() + ": modify initFlag");
    }

    public void load() {
        while (!initFlag) {}
        System.out.println("thread: " + Thread.currentThread().getName() + " observed initFlag change");
    }
}
Without volatile, thread B may loop forever: nothing obliges the JVM to make thread A's write visible to it, and the JIT compiler may even hoist the read of initFlag out of the loop. Declaring initFlag as volatile forces every iteration to re‑read the field and makes the write visible to subsequent reads, so both messages appear.
Volatile Does Not Provide Atomicity
Operations like count++ consist of a read, an increment, and a write. Even with volatile, two threads can interleave these steps, leading to lost updates. Therefore, volatile should only be used when writes do not depend on the current value and the variable is not part of a larger invariant.
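The lost‑update problem can be observed with a short experiment. This is a sketch under stated assumptions: the class LostUpdateSketch and its run method are hypothetical, and AtomicInteger is used as one possible atomic alternative (a synchronized block would work as well).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: volatile count++ loses updates, AtomicInteger does not. Names are illustrative.
final class LostUpdateSketch {
    private volatile int volatileCount;
    private final AtomicInteger atomicCount = new AtomicInteger();

    /** Returns {volatile total, atomic total} after all workers have finished. */
    static int[] run(int threads, int incrementsPerThread) {
        LostUpdateSketch counters = new LostUpdateSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counters.volatileCount++;               // read, add, write: three steps
                    counters.atomicCount.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new int[] { counters.volatileCount, counters.atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 100_000);
        System.out.println("volatile: " + totals[0]); // at most 400000, usually less
        System.out.println("atomic:   " + totals[1]); // always 400000
    }
}
```

The volatile total varies from run to run because it depends on how the threads interleave; only the upper bound is certain.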
Typical Use‑Cases for volatile
State Flags
volatile boolean shutdownRequested;
public void shutdown() { shutdownRequested = true; }
public void doWork() { while (!shutdownRequested) { /* work */ } }
Safe Publication (e.g., Double‑Checked Locking)
private static volatile Singleton instance;

public static Singleton getInstance() {
    if (instance == null) {
        synchronized (Singleton.class) {
            if (instance == null) {
                instance = new Singleton();
            }
        }
    }
    return instance;
}
Independent Observation
Threads periodically update a volatile field (e.g., latest sensor reading) so other threads can always read the freshest value without additional synchronization.
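A minimal sketch of this pattern follows. The class LatestReading and its methods are hypothetical names; the important property is that each write replaces the value outright and never depends on the previous one, so volatile alone is sufficient.

```java
// Sketch of the "independent observation" pattern. Names are illustrative only.
final class LatestReading {
    private volatile double latest = Double.NaN; // NaN means "no reading yet"

    /** Called by the producer; the new value does not depend on the old one. */
    void record(double value) {
        latest = value;
    }

    /** Called by any observer; returns the most recently published reading. */
    double latest() {
        return latest;
    }

    /** Publishes five readings from another thread, then observes the last one. */
    static double demo() {
        LatestReading sensor = new LatestReading();
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                sensor.record(i * 1.5);
            }
        });
        producer.start();
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sensor.latest();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 7.5
    }
}
```

Declaring the field volatile also makes reads and writes of the double atomic, which the Java language does not guarantee for non‑volatile long and double fields.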
"volatile bean" Pattern
// @ThreadSafe is a documentation annotation from net.jcip.annotations, not part of the JDK.
@ThreadSafe
public class Person {
    private volatile String firstName;
    private volatile String lastName;
    private volatile int age;

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }

    public void setFirstName(String f) { firstName = f; }
    public void setLastName(String l) { lastName = l; }
    public void setAge(int a) { age = a; }
}
Low‑Cost Read‑Write Strategy
Combine a synchronized write path with a volatile read path to minimise contention when reads dominate.
@ThreadSafe
public class CheesyCounter {
    private volatile int value;

    public int getValue() { return value; }
    public synchronized int increment() { return value++; }
}
Conclusion
CPU caches and store buffers improve performance but create visibility and ordering challenges in multi‑threaded programs. Hardware memory barriers and cache‑coherency protocols address these issues, and the Java Memory Model exposes a uniform abstraction that the volatile keyword leverages to guarantee visibility and ordering, though not atomicity. Understanding these fundamentals helps developers choose the right synchronization primitive for each scenario.
Senior Brother's Insights
A public account focused on workplace, career growth, team management, and self-improvement. The author is the writer of books including 'SpringBoot Technology Insider' and 'Drools 8 Rule Engine: Core Technology and Practice'.