Understanding Java volatile, Memory Semantics, and the lock Prefix
This article explains the two core properties of Java's volatile keyword—visibility and ordering—how they are implemented via lock prefixes and memory barriers, compares volatile with synchronized and CAS, and details the underlying CPU cache‑coherency mechanisms such as MESI, store buffers, and invalidate queues.
Introduction
volatile has two main characteristics: visibility and ordering.
Visibility is implemented with a lock-prefixed instruction, which works together with cache-coherency snooping: each processor snoops the interconnect, and when it detects that another core has written to a variable it holds in its cache, it invalidates its own copy so that the next read fetches the current value.
Ordering is enforced by memory barriers, which forbid the compiler and CPU from reordering instructions across them; the barriers also drain store buffers, which reinforces visibility.
volatile Features
Individual reads and writes of a volatile variable behave as if they were guarded by the same monitor lock. For example:
```java
public class VolatileFeaturesExample {
    volatile long vl = 0L;

    public void set(long l) {
        vl = l;        // single volatile write is atomic
    }

    public void getAndIncrement() {
        vl++;          // compound volatile read-modify-write: NOT atomic
    }

    public long get() {
        return vl;     // single volatile read is atomic
    }
}
```
The equivalent synchronized version is:
```java
class VolatileFeaturesExample1 {
    long vl = 0L;

    public synchronized void set(long l) {
        vl = l;
    }

    public synchronized long get() {
        return vl;
    }

    public void getAndIncrement() {  // like vl++, this compound operation is not atomic
        long temp = get();
        temp += 1L;
        set(temp);
    }
}
```
Because the lock's happens-before rule guarantees memory visibility between the releasing and acquiring threads, a volatile read always sees the latest write, and the lock additionally provides atomicity for each individual critical section (set and get here, though getAndIncrement, which spans two critical sections, remains non-atomic).
From this we derive the following properties of volatile variables:
Visibility – a volatile read always observes the most recent write.
Atomicity – single reads/writes are atomic, but compound operations like volatile++ are not.
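The non-atomicity of volatile++ is easy to reproduce. The sketch below (class and field names are illustrative) increments a volatile counter and an AtomicLong from several threads; the atomic counter always reaches the expected total, while the volatile one may lose updates:

```java
import java.util.concurrent.atomic.AtomicLong;

public class VolatileIncrementDemo {
    static volatile long volatileCount = 0L;
    static final AtomicLong atomicCount = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        final int threads = 4, perThread = 100_000;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCount++;               // read-modify-write: updates can be lost
                    atomicCount.incrementAndGet(); // atomic CAS: never loses updates
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("atomic   = " + atomicCount.get()); // always 400000
        System.out.println("volatile = " + volatileCount);     // often less than 400000
    }
}
```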
happens‑before Relationship of volatile Write/Read
From a memory perspective, a volatile write‑read has the same effect as a lock release‑acquire pair.
```java
public class VolatileExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {
        a = 1;            // 1
        flag = true;      // 2
    }

    public void reader() {
        if (flag) {       // 3
            int i = a;    // 4
        }
    }
}
```
Program order: 1 happens‑before 2, 3 happens‑before 4.
volatile rule: 2 happens‑before 3.
Transitivity: 1 happens‑before 4.
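The chain above can be exercised with a small harness (the spin-wait and the observed field are illustrative additions): once the reader thread observes flag == true, the volatile rule plus transitivity guarantees it also observes a == 1:

```java
public class VolatileExampleDemo {
    static int a = 0;
    static volatile boolean flag = false;
    static int observed = -1;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!flag) { }   // spin until the volatile write (2) becomes visible
            observed = a;       // 2 hb 3, 1 hb 2, 3 hb 4 => guaranteed to see a == 1
        });
        reader.start();
        a = 1;                  // 1: ordinary write
        flag = true;            // 2: volatile write publishes it
        reader.join();
        System.out.println("observed = " + observed);
    }
}
```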
Implementation of volatile Memory Semantics
JMM restricts both compiler and processor reordering to preserve volatile semantics. Under its conservative strategy it inserts four kinds of memory barriers:
A StoreStore barrier before each volatile write.
A StoreLoad barrier after each volatile write.
A LoadLoad barrier after each volatile read.
A LoadStore barrier after each volatile read.
volatile Write
The StoreStore barrier placed before the volatile write ensures that all ordinary writes preceding it become visible to all processors before the volatile write itself. The StoreLoad barrier placed after the write prevents it from being reordered with any subsequent volatile read or write.
volatile Read
The LoadLoad and LoadStore barriers placed after a volatile read prevent subsequent ordinary reads and writes from being reordered before it, and ensure that the read observes the most recent value in main memory.
```java
public class VolatileBarrierExample {
    int a;
    volatile int v1 = 1;
    volatile int v2 = 2;

    void readAndWrite() {
        int i = v1;   // first volatile read
        int j = v2;   // second volatile read
        a = i + j;    // ordinary write
        v1 = i + 1;   // first volatile write
        v2 = j * 2;   // second volatile write
    }
}
```
The compiler inserts the necessary barriers when generating the instruction sequence and may elide redundant ones; the final StoreLoad barrier after the last volatile write cannot be omitted, because the compiler cannot know whether a volatile read/write will follow.
volatile vs. CAS and the lock Prefix
Both volatile and CAS use the CPU lock prefix, but they differ:
volatile does not guarantee atomicity of compound operations; on x86 its StoreLoad barrier is implemented with a lock‑prefixed instruction (e.g., lock addl $0, (%rsp)) used solely as a memory fence.
CAS guarantees atomicity by using lock cmpxchg, which performs the compare‑and‑exchange as a single atomic read‑modify‑write.
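The difference shows up in Java's atomic classes, whose compareAndSet compiles down to lock cmpxchg on x86. The retry loop below is a sketch of how an increment can be built on CAS (mirroring what AtomicInteger.incrementAndGet does internally):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLoopDemo {
    static final AtomicInteger value = new AtomicInteger(0);

    // Increment implemented as an explicit CAS retry loop.
    static int casIncrement() {
        for (;;) {
            int current = value.get();            // volatile read
            int next = current + 1;
            // lock cmpxchg: succeeds only if no other thread raced ahead
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // another thread changed the value first: retry with a fresh read
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 50_000; i++) casIncrement();
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(value.get()); // 200000: CAS never loses an update
    }
}
```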
Specific Role of the lock Prefix
Early CPUs (Pentium) locked the bus, causing high overhead. Modern CPUs (P6 and later) use cache locking, which is much cheaper.
Cache coherence is maintained by the MESI protocol (Modified, Exclusive, Shared, Invalid). When a core writes to a cache line, other cores receive invalidation messages via the ring bus, ensuring consistency.
To handle writes still sitting in store buffers and invalidations still queued in invalidate queues, CPUs provide read and write barriers. A read barrier drains the invalidate queue before later loads execute; a write barrier drains the store buffer so that earlier writes are committed to the cache and become visible to other cores.
A lock‑prefixed instruction acts as a full barrier, combining both effects and guaranteeing that all cores see a consistent view of memory around it.
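At the Java level these barriers are exposed (since Java 9) through the static fence methods of java.lang.invoke.VarHandle. The sketch below shows a rough, illustrative mapping; the class and field names are made up for the example:

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int data = 0;
    static boolean ready = false;   // deliberately NOT volatile

    static void publish() {
        data = 42;
        VarHandle.releaseFence();   // ~ StoreStore (+LoadStore): writes above
                                    //   cannot be reordered below this point
        ready = true;
        VarHandle.fullFence();      // ~ StoreLoad: drains the store buffer
    }

    static int consume() {
        boolean r = ready;
        VarHandle.acquireFence();   // ~ LoadLoad (+LoadStore): later accesses
                                    //   cannot float above the read of ready
        return r ? data : -1;
    }

    public static void main(String[] args) {
        publish();
        System.out.println(consume()); // 42 in this single-threaded run
    }
}
```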
Reference: "Deep Understanding of the Java Memory Model".