Understanding Java's volatile Keyword: CPU Cache, Memory Visibility, and the MESI Protocol
This article explains how the volatile keyword ensures visibility in Java multithreaded programs by examining CPU cache architecture, cache‑coherency mechanisms such as the MESI protocol, and the low‑level assembly effects of volatile writes on modern x86 processors.
Introduction
In Java multithreaded programming, the volatile keyword plays a crucial role as a lightweight alternative to synchronized, guaranteeing the visibility of shared variables across threads. Visibility means that when one thread modifies a shared variable, other threads can read the updated value.
When a field is declared volatile, the Java Memory Model ensures that all threads see a consistent value. Unlike synchronized, a volatile variable does not block threads or trigger context switches, offering lower execution cost and higher efficiency in suitable scenarios.
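As a concrete illustration, here is a minimal sketch of the canonical use case: one thread signaling another through a volatile boolean flag (the class and field names are invented for this example):

```java
// Sketch: signaling a worker thread via a volatile flag.
// Without volatile, the JIT could hoist the read of `stop` out of the
// loop, so the worker might spin forever on a stale cached value.
public class VolatileFlag {
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait; each iteration performs a fresh volatile read
            }
            System.out.println("worker saw stop = true");
        });
        worker.start();

        Thread.sleep(100);   // let the worker spin for a moment
        stop = true;         // volatile write: made visible to the worker
        worker.join();       // returns because the worker observed the write
    }
}
```

The worker terminates promptly because every volatile read observes the writer's update; dropping the volatile modifier makes termination no longer guaranteed.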
This article delves into how volatile achieves visibility from a hardware perspective, beginning with an overview of CPU caches.
CPU Cache
CPU caches exist to bridge the speed gap between fast CPU cores and slower main memory. Typical access times are:
Main memory access: dozens to hundreds of clock cycles.
L1 cache access: a few clock cycles.
L2 cache access: tens of clock cycles.
Because of this disparity, CPUs rarely read/write directly from/to memory; instead they use a hierarchy of caches (L1, L2, L3) that store frequently accessed data close to the core.
Each cache level holds a subset of the data from the next level, and cache hit rates are roughly 80% for L1, leaving only about 20% of accesses to fall through to lower levels or main memory.
Problems Caused by CPU Caches
The data flow can be visualized as CPU → Cache → Main Memory. When a program runs on a multi‑core system, each core maintains its own cache, which can lead to inconsistencies:
Core 0 reads a byte and, due to locality, loads neighboring bytes into its cache.
Core 3 performs the same read, so both caches contain the same data.
Core 0 modifies the byte; the change is written back only to Core 0’s cache, not to main memory.
Core 3 later reads the same byte and sees the stale value because the update was never propagated.
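The scenario above can be mirrored in Java with a plain (non-volatile) field shared between two threads. A concurrent read while the writer runs may legally return the stale value; the final read below is only reliable because Thread.join() establishes a happens-before edge (class and field names are invented for illustration):

```java
// Sketch: two threads share a plain field with no visibility guarantee.
// A read racing with the writer may see the stale value 0; only the
// happens-before edge created by join() makes the final read reliable.
public class StaleRead {
    static int shared = 0;   // plain field: no visibility guarantee

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = 42); // "Core 0 modifies the byte"
        writer.start();
        // A concurrent read here could legally return 0 (the stale value).
        writer.join();       // join() guarantees the write is now visible
        System.out.println("after join: shared = " + shared);
    }
}
```

Whether a stale read actually manifests depends on the JIT, the hardware, and timing, which is precisely why the Java Memory Model, not observed behavior, must guide correctness.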
To solve this, CPU manufacturers define cache‑coherency protocols.
Cache Coherency Protocols
Because each core’s L1 cache may hold different data, a protocol is needed to keep caches consistent. Two common approaches are Bus Locking and the MESI protocol.
4.1 Bus Lock
When a core wants to modify data in its cache, it can assert a lock signal on the memory bus. While the lock is held, no other core can access main memory through the bus; once it is released, the other cores fetch the latest data from memory.
However, bus locking incurs performance penalties, prompting the use of more efficient protocols such as MESI.
4.2 MESI
The MESI protocol maintains a state flag for each cache line with four possible states:
M : Modified – the cache line differs from memory and must be written back before other cores can read it.
E : Exclusive – the cache line matches memory and no other core holds a copy.
S : Shared – multiple cores hold identical copies that match memory.
I : Invalid – the cache line is no longer usable.
Cache‑read rules:
If the state is I , the core reads from memory.
If the state is M and another core requests the data, the owning core writes the line back to memory and transitions to S; a line in E (which already matches memory) simply transitions to S.
Only when the state is M or E may a core modify the cache line, after which the state becomes M .
Following these rules improves overall CPU efficiency.
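The read/write rules above can be sketched as a small state machine. This is a simplified model of a single cache line for illustration only; real hardware also involves bus snooping and inter-core messages, and the class and method names are invented:

```java
// Simplified model of one cache line's MESI state transitions,
// following the rules in the text. Not a faithful hardware simulation.
public class MesiLine {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    State state = State.INVALID;

    // Local read: an INVALID line must first be fetched from memory.
    void localRead(boolean otherCoreHoldsCopy) {
        if (state == State.INVALID) {
            state = otherCoreHoldsCopy ? State.SHARED : State.EXCLUSIVE;
        }
        // M, E, S: the read is served from the cache, state unchanged.
    }

    // Another core requests the line: M writes back to memory,
    // and both M and E drop to SHARED.
    void remoteRead() {
        if (state == State.MODIFIED || state == State.EXCLUSIVE) {
            state = State.SHARED;
        }
    }

    // A write is only performed from M or E and leaves the line MODIFIED;
    // from S or I the core must first gain exclusive ownership.
    void localWrite() {
        if (state == State.SHARED || state == State.INVALID) {
            state = State.EXCLUSIVE;  // simplified: other copies invalidated
        }
        state = State.MODIFIED;
    }
}
```

Tracing a sequence such as localRead on a cold line, then localWrite, then a remoteRead reproduces the I → E → M → S path described above.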
How volatile Guarantees Visibility at the Hardware Level
On x86 processors, examining the JIT‑generated assembly for a volatile write reveals an extra instruction with a lock prefix:

```java
instance = new Singleton(); // instance is a volatile variable
```

Compiled assembly (simplified):

```
0x01a3de1d: movb $0x0,0x1104800(%esi)
0x01a3de24: lock addl $0x0,(%esp)
```

The second line, lock addl $0x0,(%esp), is a locked operation that forces the processor to write the affected cache line back to main memory.
5.1 Write‑back of the Cache Line
The LOCK# signal ensures exclusive access to the affected memory region, causing the processor to flush the modified cache line to system memory. Modern CPUs typically lock the cache line rather than the entire bus, reducing overhead.
5.2 Invalidation of Other Cores' Caches
After the write‑back, other cores’ copies of the same cache line become invalid. Intel’s IA‑32 and Intel 64 architectures use the MESI protocol to propagate this invalidation, ensuring that subsequent reads fetch the updated value from memory.
Thus, a volatile write on a multi‑core system results in a cache‑line flush and invalidation, providing the “happens‑before” visibility guarantee required by the Java Memory Model.
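The `instance = new Singleton()` write examined above is exactly the volatile write at the heart of the double-checked locking idiom. A sketch of that pattern (the class body here is invented for illustration):

```java
// Double-checked locking: `instance` must be volatile so the write in
// the synchronized block happens-before any later unsynchronized read,
// preventing other threads from observing a partially constructed object.
public class Singleton {
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                 // first check: no lock taken
            synchronized (Singleton.class) {
                if (instance == null) {         // second check: under the lock
                    instance = new Singleton(); // the volatile write from the text
                }
            }
        }
        return instance;
    }
}
```

Without volatile, the JIT and CPU may reorder the object's field initialization with the publication of the reference, so a racing reader could see a non-null but half-initialized Singleton.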