
Understanding Java Memory Model: Visibility, Atomicity, Ordering, and Low‑Level Mechanisms

The article explains the Java Memory Model and its core concepts of visibility, atomicity, and ordering, detailing how volatile, synchronized, and final keywords work, the impact of CPU cache and instruction reordering, and low‑level mechanisms such as lock, cmpxchg, and false sharing.

Java Captain

The Java Memory Model (JMM) is the core low‑level specification for Java concurrency. It describes how writes to shared variables become visible between threads and main memory, and it provides the foundation for understanding the keywords volatile, synchronized, and final.

The essence of thread interaction: a data conversation through shared memory

Threads can interact in two ways:

Through shared memory : threads read and write shared variables (the approach used by Java).

Through message passing : threads exchange messages indirectly (e.g., Erlang's concurrency model).

Java chooses the shared‑memory model: all shared variables reside in main memory, while each thread works on a copy held in its CPU cache or registers. JMM provides three key keywords — volatile, synchronized, and final — to address the three core problems of concurrent programming:

Visibility : whether a modification by one thread becomes immediately visible to others (handled by volatile / synchronized / final ).

Atomicity : whether an operation is indivisible (handled by synchronized or CAS).

Ordering : whether the execution order matches the program order (handled by volatile and memory barriers).

CPU cache mechanisms (data‑copy inconsistency) and instruction reordering (compiler/processor optimizations) are the two major technical challenges that break data visibility.

Reordering: Compiler and processor performance‑optimization double‑edged sword

Both the compiler and the processor perform performance optimizations; instruction reordering is a key technique that changes the execution order without altering the final program result.

1. Compiler Reordering

Stage : compile‑time (when javac generates bytecode).

Optimization : remove redundant instructions, fold constants, move loop‑invariant calculations out of loops.

Typical case : int a = 1; int b = 2; int c = a + b; // compiler may reorder to b=2 → a=1 → c=3 (result unchanged)

2. Processor Reordering

Stage : run‑time (CPU executes instructions).

Optimization : pipeline parallel execution, store buffer asynchronous write‑back.

Typical impact : If thread A writes x then y , the CPU may flush y first, then x , causing thread B to see an intermediate state where y is updated but x is not.
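A minimal Java sketch of this hazard (the class and variable names are illustrative). Actually observing the reordered state is rare and platform‑dependent, so the reader thread only reports it if it happens; the final state after join() is always visible because join() establishes a happens‑before edge:

```java
// Sketch of the write-reordering hazard described above.
// Thread A writes x then y; without a happens-before edge, thread B
// may legally observe y == 1 while x is still 0. Declaring y volatile
// would forbid that intermediate state.
public class ReorderDemo {
    static int x = 0;
    static int y = 0;   // making this volatile restores ordering

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { x = 1; y = 1; });
        Thread b = new Thread(() -> {
            // Without synchronization, (y == 1 && x == 0) is a legal observation.
            if (y == 1 && x == 0) System.out.println("saw reordered state");
        });
        a.start(); b.start();
        a.join(); b.join();
        // join() makes both writes visible to the main thread
        System.out.println("final: x=" + x + " y=" + y);
    }
}
```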

False Sharing: Cache‑line “collateral invalidation trap”

CPU caches store data in cache lines (usually 64 bytes). The cache‑coherency protocol (e.g., MESI) keeps caches and main memory synchronized, but an invalidation operation affects the whole cache line.

Problem scenario

Assume variables A (8 bytes) and B (8 bytes) reside in the same cache line. When thread 1 modifies A, the whole line is invalidated; when thread 2 later accesses B, it must reload the line from main memory even though B itself was not changed. This performance loss is false sharing.

Solution

Space for time : let each variable occupy its own cache line.

Java implementation : use @sun.misc.Contended (requires the JVM flag -XX:-RestrictContended for non‑JDK classes) to add padding bytes around a field or class:

@sun.misc.Contended
class Counter {
    private volatile long value = 0; // padded so the field occupies a dedicated cache line
}

The assembly prefix lock : hardware‑level synchronization cornerstone

The lock prefix is an x86 instruction prefix used to solve cache‑coherency problems in multi‑CPU environments; it underlies the implementation of volatile and CAS.

Core functions

Lock range evolution : early x86 CPUs asserted a lock on the entire system bus, blocking all other CPUs (high cost). Modern CPUs lock only the target cache line (cache‑line locking) via MESI, affecting just that line (efficient).

Instruction barrier : prevents the CPU from reordering operations across the lock‑prefixed instruction, ensuring ordering.

Forced data sync : flushes the current CPU cache line to main memory and invalidates other CPUs' copies, guaranteeing visibility.

Typical application

The lock prefix is the low‑level implementation of a volatile write: on x86 the JIT emits lock addl $0, 0(%rsp) (or (%esp) on 32‑bit), a no‑op addition whose lock prefix triggers the barrier and cache‑coherency effects without performing meaningful arithmetic.

Assembly instruction cmpxchg : the atomic compare‑and‑exchange behind AQS

cmpxchg (Compare and Exchange) is a CPU atomic instruction; Java’s Unsafe.compareAndSwapXXX methods (e.g., compareAndSwapInt ) rely on it.

Instruction logic

; simplified pseudocode — the real x86 instruction takes the expected value
; implicitly in the EAX register:
; cmpxchg dest, src   =>   if (dest == EAX) { dest = src } else { EAX = dest }

In a multi‑CPU environment, cmpxchg is prefixed with lock to guarantee atomicity:

lock cmpxchg [address], ebx  ; EAX = expected value, EBX = new value;
                             ; the cache line is locked so the compare-exchange is indivisible

Role in AQS

The Java concurrency framework AbstractQueuedSynchronizer (AQS) builds its locks on cmpxchg : acquiring a lock attempts to CAS the state variable (spinning briefly on failure); releasing it writes state back and unparks the successor thread. This forms the basis of ReentrantLock , Semaphore , etc.
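A minimal sketch in the spirit of AQS's CAS‑based state acquisition (real AQS also parks waiting threads in a queue; this toy lock only spins, and the class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal spin lock built on CAS. AtomicBoolean.compareAndSet
// compiles down to lock cmpxchg on x86.
public class SpinLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        while (!state.compareAndSet(false, true)) {
            Thread.onSpinWait();   // spin-loop hint to the CPU (Java 9+)
        }
    }

    public void unlock() {
        state.set(false);          // volatile write publishes the release
    }

    public static void main(String[] args) throws InterruptedException {
        SpinLock lock = new SpinLock();
        int[] counter = {0};       // plain int, protected only by the lock
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                lock.lock();
                try { counter[0]++; } finally { lock.unlock(); }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("counter = " + counter[0]);
    }
}
```

Because every increment happens between a successful CAS and the releasing volatile write, the two threads never lose an update.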

Atomic CAS operations: efficient yet risky lock‑free programming

CAS (Compare‑And‑Swap) is the core mechanism for lock‑free concurrency, implemented via CPU spin loops without OS intervention.

Core process

Read the current value (V).

Compare with the expected value (A).

If equal, swap in the new value (B); otherwise retry (spin).

Atomicity based on cache locking

After a write, the MESI protocol invalidates the same line in other CPUs' caches.

Subsequent reads must fetch the latest value from main memory, ensuring consistency.
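The read/compare/swap loop above can be written out explicitly against AtomicInteger ; its own incrementAndGet performs essentially the same loop internally (delegating to Unsafe/VarHandle CAS, i.e. lock cmpxchg on x86):

```java
import java.util.concurrent.atomic.AtomicInteger;

// The CAS spin loop from the "Core process" steps, written out by hand.
public class CasIncrement {
    static int incrementAndGet(AtomicInteger v) {
        while (true) {
            int current = v.get();                 // 1. read current value V
            int next = current + 1;                //    compute new value B
            if (v.compareAndSet(current, next)) {  // 2+3. compare with expected A, swap
                return next;
            }
            // CAS failed: another thread won the race; spin and retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(41);
        System.out.println(incrementAndGet(v));   // 42
    }
}
```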

Three core problems

ABA problem : Scenario: a variable changes A→B→A, and CAS mistakenly concludes it never changed. Solution: add a version stamp, e.g., AtomicStampedReference (value + integer stamp).

Single‑variable limitation : Only single variables can be atomically updated; for multiple fields wrap them in an object. Solution: use AtomicReference to hold the whole object.

Spin overhead : Heavy contention leads to long spinning and high CPU usage. Optimization: adaptive spinning (JVM adjusts spin count) or fallback to a lock.
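The ABA fix can be demonstrated with AtomicStampedReference : pairing the value with a stamp means an A→B→A sequence still changes the (value, stamp) pair, so a stale CAS fails where a plain CAS would succeed:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// ABA demo: the stamp exposes the intermediate A -> B -> A history.
public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);

        int stamp = ref.getStamp();          // remember stamp 0

        // Another thread performs A -> B -> A, bumping the stamp each time
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // A plain CAS on the value alone would succeed here; the stamped
        // CAS fails because the stamp has moved from 0 to 2.
        boolean swapped = ref.compareAndSet("A", "C", stamp, stamp + 1);
        System.out.println("swapped = " + swapped);   // false
    }
}
```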

Object header: JVM’s lock‑upgrade state register

In a 64‑bit JVM, an object header occupies 16 bytes (8‑byte MarkWord + 8‑byte class pointer) when compressed class pointers are disabled; with the default -XX:+UseCompressedClassPointers the class pointer shrinks to 4 bytes. Arrays carry an extra 4‑byte length field.

Core components

MarkWord (8 bytes) : Unlocked state: stores hash code, generation age, lock bits. Biased lock: thread ID, generation age, lock bits. Lightweight lock: pointer to lock record in thread stack. Heavyweight lock: pointer to OS mutex.

Class pointer : points to class metadata for type identification.

Array length (array‑only): records array size.

Lock state changes

Biased lock (no contention): the first acquisition records the thread ID; subsequent entries by the same thread reuse it without real locking (biased locking is deprecated and disabled by default since JDK 15, per JEP 374).

Lightweight lock (light contention): CAS spin to acquire, avoiding thread blocking.

Heavyweight lock (heavy contention): after spin timeout, upgrades to OS lock; threads are parked and later unparked, incurring higher cost.

volatile: lightweight visibility and ordering guarantee

volatile is a lightweight concurrency keyword that solves visibility and instruction‑reordering problems in multithreaded environments.

Two core guarantees

Compile‑time : inserts memory barriers. Write barrier ensures all prior instructions complete before the volatile write becomes visible. Read barrier ensures subsequent instructions execute after the volatile read. Prohibits reordering of volatile reads/writes with other instructions.

Run‑time (CPU) : On x86, JIT emits lock addl $0, (%esp) to enforce a cache‑line lock, flushing the line to main memory. Locks the target cache line, broadcasts invalidation. Other CPUs must reload the value from main memory.

Typical use cases

State flag, e.g., volatile boolean running = true; (thread‑safe stop flag).

Double‑checked locking singleton: public class Singleton { private static volatile Singleton instance; // ... }

Note: volatile does not guarantee atomicity for compound actions like i++ ; use AtomicInteger instead.
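The double‑checked‑locking singleton referenced above, written out in full. The volatile modifier is what makes it safe: it forbids reordering of "allocate/initialize object" with "publish reference", so no thread can observe a half‑constructed instance:

```java
// Full double-checked-locking singleton. Without volatile, a second
// thread could see a non-null but not-yet-initialized instance.
public class Singleton {
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                    // first check: lock-free fast path
            synchronized (Singleton.class) {
                if (instance == null) {            // second check: one thread constructs
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance());   // true
    }
}
```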

synchronized: full‑path lock upgrade from bias to heavyweight

synchronized is the built‑in JVM synchronization mechanism that dynamically upgrades the lock based on contention.

1. Biased lock (no contention)

Core logic : first entry records thread ID in MarkWord; subsequent entries compare the ID and skip real locking.

Upgrade trigger : another thread attempts the lock, CAS fails, bias upgrades to lightweight.

Visibility guarantee : exiting the synchronized block issues an implicit memory barrier, flushing data to main memory.

2. Lightweight lock (light contention)

Core logic : Thread creates a lock record on its stack, copying the object's MarkWord. CAS replaces MarkWord with a pointer to the lock record (acquire). If CAS fails, the thread spins (default 10 spins, JVM may adjust).

Advantage : avoids costly thread suspension, suitable for short‑lived contention.

3. Heavyweight lock (heavy contention)

Core logic : after spin timeout, the thread calls park() to block and enters the OS wait queue.

Release logic : unlocking calls unpark() to wake a waiting thread, causing a context switch.

Applicable scenario : high contention or long‑duration critical sections.

Lock optimization best practices

Reduce synchronized scope and granularity: synchronize only the code that needs protection, and prefer an instance‑level lock ( synchronized(this) ) over a class‑wide lock ( synchronized(SomeClass.class) ) when per‑instance protection suffices.

Combine with explicit locks: use ReentrantLock.tryLock() to avoid permanent blocking.
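A small sketch of the tryLock advice: bound the wait instead of blocking indefinitely (the class and method names are illustrative):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// tryLock with a timeout: give up instead of blocking forever.
public class TryLockDemo {
    private final ReentrantLock lock = new ReentrantLock();

    public boolean doWork() throws InterruptedException {
        if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {  // bounded wait
            try {
                return true;            // critical section ran
            } finally {
                lock.unlock();          // always release in finally
            }
        }
        return false;                   // lock busy: back off, retry, or fail fast
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new TryLockDemo().doWork());   // true (uncontended)
    }
}
```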

final: constructor‑level reordering seal

The final keyword not only makes a variable immutable but also provides JMM guarantees that other threads will never see a partially initialized final field.

Two underlying mechanisms

Compiler constraints : a write to a final field inside the constructor may not be reordered with the later publication of the object reference; ordinary (non‑final) fields get no such guarantee.

Processor constraints : a StoreStore barrier is emitted before the constructor returns, ensuring all final field assignments complete before this can become visible to other threads.

Anti‑pattern warning

public class FinalProblem {
    final int x;
    static FinalProblem instance;
    public FinalProblem() {
        x = 10;
        instance = this; // dangerous! other threads could see a half‑initialized object
    }
}

JMM guarantees that as long as the object is properly constructed, any thread reading the final field will see the fully assigned value, preventing the “half‑initialized object” issue.
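A safe variant of the anti‑pattern above (class name illustrative): assign the final fields inside the constructor, and let the reference escape only after construction completes:

```java
// Safe construction: "this" never escapes from the constructor, so the
// final-field guarantee applies to every reader of `instance`.
public class FinalSafe {
    final int x;
    static FinalSafe instance;

    private FinalSafe() {
        x = 10;   // final field assigned inside the constructor
        // no "instance = this" here: the reference must not escape yet
    }

    public static void publish() {
        instance = new FinalSafe();   // publish only after the constructor returns
    }

    public static void main(String[] args) {
        publish();
        System.out.println(instance.x);   // 10
    }
}
```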

Summary: JMM core technology map

JMM’s three core problems:

Visibility → volatile (memory barrier + lock ), synchronized (lock acquire/release semantics), final (constructor initialization).

Atomicity → synchronized (mutual exclusion), CAS ( cmpxchg + spin).

Ordering → volatile (prevents reordering), synchronized (ordered entry/exit).

Hardware mechanisms:

CPU cache → false sharing (mitigated by @Contended padding).

Instruction reordering → compiler reordering + processor reordering.

Assembly instructions → lock (cache‑line lock, ensures visibility & ordering), cmpxchg (CAS atomic operation).

Concurrency tool internals:

AQS → CAS ( cmpxchg ) + volatile state + doubly‑linked wait queue.

Atomic classes → Unsafe.compareAndSwapXXX built on cmpxchg .

| Keyword | Visibility | Atomicity | Ordering | Lock mechanism | Applicable scenario |
|---|---|---|---|---|---|
| volatile | Yes | No | Yes | None | State flag, lightweight sync |
| synchronized | Yes | Yes | Yes | Lock upgrade | Critical section protection |
| final | Yes (once constructed) | No | Yes (in constructor) | None | Immutable variables |

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
