
How False Sharing Slows Java Programs and How to Eliminate It

This article explains the concept of false sharing in CPU caches, demonstrates its performance impact with Java code, analyzes the results, and shows how to prevent it using the @Contended annotation and appropriate JVM flags.

Xiaokun's Architecture Exploration Notes

CPU Cache and False Sharing

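CPU caches are organized into levels (L1 to L3), and each level stores data in fixed-size cache lines, typically 64 bytes on modern x86-64 processors. When a core writes to a cache line, the cache coherence protocol (for example MESI) invalidates that line in every other core's cache, so the next access from those cores must fetch the line again, either from a shared cache level or from main memory.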
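Because coherence works on whole lines, what matters is whether two addresses fall into the same line. The hypothetical helper below is a minimal sketch that assumes 64-byte lines and a region that starts on a line boundary; real object addresses are chosen by the JVM and are not visible to Java code.

```java
/** Hypothetical helper: do two byte offsets fall on the same cache line? */
final class CacheLineMath {
    // Assumption: 64-byte lines, the common size on current x86-64 processors.
    static final int LINE_SIZE = 64;

    /** Index of the line holding a byte offset, assuming offset 0 is line-aligned. */
    static long lineOf(long offset) {
        return offset / LINE_SIZE;
    }

    static boolean sameLine(long a, long b) {
        return lineOf(a) == lineOf(b);
    }

    public static void main(String[] args) {
        // Two adjacent long values at byte offsets 0 and 8 share one line
        System.out.println(sameLine(0, 8));   // true
        // Offsets 63 and 64 are neighbours in memory but sit on different lines
        System.out.println(sameLine(63, 64)); // false
    }
}
```

A write to offset 0 therefore invalidates the copy of offset 8 held by other cores, even if no thread ever reads offset 0.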

Definition and Causes of False Sharing

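False sharing occurs when threads on different cores write to different variables that happen to reside on the same cache line. The variables are logically unrelated, but because coherence works at cache-line granularity, each write invalidates the whole line in the other cores' caches. The line keeps bouncing between cores and the other threads repeatedly miss, which can slow a program down dramatically even though no data is actually shared.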
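A minimal sketch of false sharing, assuming 64-byte cache lines: two threads each update only their own slot of a shared `AtomicLongArray`. With a gap of 1 the slots are 8 bytes apart and usually share a line; with a gap of 16 they are 128 bytes apart and cannot. The class and method names are illustrative, and the timings you see will depend on the CPU and JVM.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class FalseSharingDemo {
    // Assumption: 64-byte lines; 16 longs = 128 bytes also clears adjacent-line prefetching.
    static final int SPACED = 16;

    /**
     * Two threads each write their own slot: slot 0 and slot `gap`.
     * gap = 1  -> slots 8 bytes apart, usually on the same line (false sharing).
     * gap = 16 -> slots 128 bytes apart, never on the same 64-byte line.
     * Returns the sum of both slots, which should be 2 * iterations.
     */
    static long run(int gap, long iterations) {
        AtomicLongArray slots = new AtomicLongArray(gap + 1);
        Thread t1 = new Thread(() -> bump(slots, 0, iterations));
        Thread t2 = new Thread(() -> bump(slots, gap, iterations));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return slots.get(0) + slots.get(gap);
    }

    // Each slot has a single writer, so get-then-set is race-free.
    // set() is a volatile write, so the JIT cannot collapse the loop into one store.
    private static void bump(AtomicLongArray slots, int index, long iterations) {
        for (long i = 0; i < iterations; i++) {
            slots.set(index, slots.get(index) + 1);
        }
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        for (int gap : new int[] {1, SPACED}) {
            long start = System.nanoTime();
            run(gap, n);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("gap=" + gap + ": " + ms + " ms");
        }
    }
}
```

The two runs do identical work; any difference in elapsed time comes from whether the two written slots share a cache line.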

Code Demonstration

<code>public class FalseShared {
    private static final int N = 1024;
    private static final int[][] arr2continuous = new int[N][N];
    private static final int[][] arr2notcontinuous = new int[N][N];
    static {
        // Seed one array along row 0 and the other along column 0
        // (the traversals below read every element regardless)
        for (int i = 0; i < N; i++) {
            arr2continuous[0][i] = i * 2 + 1;
            arr2notcontinuous[i][0] = i * 2 + 1;
        }
    }
    // Row-by-row traversal: consecutive reads hit consecutive ints of one row array
    private static void readByContinuous() {
        long start = System.nanoTime();
        long sum = 0; // accumulated and printed so the JIT cannot discard the loop as dead code
        for (int row = 0; row < N; row++) {
            for (int col = 0; col < N; col++) {
                sum += arr2continuous[row][col];
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read arr by continuous with time : " + elapsedMs + " ms (sum " + sum + ")");
    }
    // Column-by-column traversal: every read jumps to a different row array
    private static void readByNotContinuous() {
        long start = System.nanoTime();
        long sum = 0;
        for (int row = 0; row < N; row++) {
            for (int col = 0; col < N; col++) {
                sum += arr2notcontinuous[col][row];
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read arr by not continuous with time : " + elapsedMs + " ms (sum " + sum + ")");
    }
    public static void main(String[] args) throws Exception {
        Thread continuous = new Thread(FalseShared::readByContinuous);
        Thread notContinuous = new Thread(FalseShared::readByNotContinuous);
        continuous.start();
        notContinuous.start();
        continuous.join();
        notContinuous.join();
    }
}
</code>

Running the program repeatedly shows the row-by-row traversal completing roughly twice as fast as the column-by-column traversal, confirming that the order in which memory is accessed affects cache efficiency.

Result Analysis

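Both arrays have the same layout: a Java int[][] is an array of row arrays, and the ints of each row are stored contiguously. What differs is the traversal order.

Reading row by row (readByContinuous) walks consecutive ints, so each cache line fetched (16 ints with a 64-byte line) serves several subsequent reads, and the hardware prefetcher can predict the next line.

Reading column by column (readByNotContinuous) jumps to a different row array on every access, so almost every read lands on a different cache line, causing many more misses.

Accessing data in the order it is stored therefore reduces cache-miss penalties and improves execution speed.

Note that each method runs on its own thread and reads its own array, so this experiment measures spatial locality rather than false sharing. False sharing is the multithreaded counterpart: independent variables written by different threads land on the same cache line, so every write forces a miss on the other cores.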
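To make the analysis concrete, the hypothetical model below counts how often consecutive reads of the 1024x1024 traversal in FalseShared move to a different cache line. It assumes 64-byte lines, a 16-byte int[] header, and row arrays allocated back to back (HotSpot often does this, but it is not guaranteed). It counts line transitions, not actual cache misses, and ignores prefetching and cache capacity.

```java
/** Hypothetical model: line transitions for row-wise vs column-wise traversal. */
final class AccessOrderModel {
    static final int N = 1024;
    static final int LINE = 64;   // assumed cache-line size in bytes
    static final int HEADER = 16; // assumed int[] header size in bytes
    static final int ROW_BYTES = HEADER + N * Integer.BYTES; // assumed back-to-back rows

    /** Modelled byte address of element [row][col]. */
    static long address(int row, int col) {
        return (long) row * ROW_BYTES + HEADER + (long) col * Integer.BYTES;
    }

    /** Counts how often the next read lands on a different line than the previous read. */
    static long lineChanges(boolean rowMajor) {
        long changes = 0;
        long prevLine = -1;
        for (int outer = 0; outer < N; outer++) {
            for (int inner = 0; inner < N; inner++) {
                // rowMajor mirrors arr[row][col]; otherwise mirrors arr[col][row]
                long addr = rowMajor ? address(outer, inner) : address(inner, outer);
                long line = addr / LINE;
                if (line != prevLine) {
                    changes++;
                    prevLine = line;
                }
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        System.out.println("row-wise line changes:    " + lineChanges(true));
        System.out.println("column-wise line changes: " + lineChanges(false));
    }
}
```

Under these assumptions the row-wise walk changes line about once every 16 reads, while the column-wise walk changes line on every one of its 1,048,576 reads.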

Java Solutions to False Sharing

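The JVM provides the @sun.misc.Contended annotation (JDK 8; since JDK 9 it is jdk.internal.vm.annotation.Contended, which application code can only use with --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED) to place fields on separate cache lines. It can be applied to individual fields or to whole classes. For classes outside the JDK the annotation is ignored unless the VM is started with -XX:-RestrictContended. The padding width, 128 bytes by default, can be adjusted with -XX:ContendedPaddingWidth=256.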

<code>// Example of using @Contended on a field (JDK 8; start the JVM with -XX:-RestrictContended)
import sun.misc.Contended;

public class ContendedTest1 {
    @Contended // padded so it does not share a cache line with the other fields
    private Object contendedField1;
    private Object plainField1;
    private Object plainField2;
    private Object plainField3;
    private Object plainField4;
}
// JVM output shows contendedField1 placed far from the other fields (offset 156 vs. 12-24).
</code>

With the restriction lifted, the annotation makes the JVM insert padding around the contended field so that it does not share a cache line with the other fields, eliminating false sharing between them.
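The offset 156 reported in the comment above can be reproduced with simple arithmetic. The hypothetical model below assumes a 64-bit HotSpot JVM with a 12-byte object header (compressed class pointers), 4-byte compressed references, the plain fields packed first, and one padding block of the default 128 bytes before the contended field. It is a back-of-envelope check, not a description of the full layout algorithm.

```java
/** Hypothetical model of where a single @Contended field lands in ContendedTest1. */
final class ContendedOffsetModel {
    static int contendedFieldOffset(int headerBytes, int plainRefs, int refBytes, int paddingBytes) {
        // Plain fields occupy offsets 12, 16, 20, 24 and end at byte 28
        int endOfPlainFields = headerBytes + plainRefs * refBytes;
        // The contended field follows one padding block: 28 + 128 = 156
        return endOfPlainFields + paddingBytes;
    }

    public static void main(String[] args) {
        System.out.println(contendedFieldOffset(12, 4, 4, 128)); // 156
    }
}
```

The 128 bytes of padding between the plain fields and contendedField1 span at least one full 64-byte line, so the two groups cannot end up on the same line.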

Usage Guidelines

Add @Contended to fields or classes that are frequently written by different threads.

Run the application with -XX:-RestrictContended (and optionally -XX:ContendedPaddingWidth=256) so that the annotation takes effect on your own classes.

Consider separating hot and cold data based on access patterns to maximize cache utilization.
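For the common case of a counter written by many threads, the JDK already applies these guidelines internally: the cells behind java.util.concurrent.atomic.LongAdder are annotated with @Contended, and JDK classes are exempt from RestrictContended, so no flags are needed. The HotCounter class below is a minimal illustrative wrapper, not part of any library.

```java
import java.util.concurrent.atomic.LongAdder;

/** Illustrative hot counter; LongAdder spreads contended updates over padded cells. */
final class HotCounter {
    private final LongAdder hits = new LongAdder();

    void record() {
        hits.increment();
    }

    long total() {
        return hits.sum();
    }

    /** Starts `threads` writers that each call record() `perThread` times, then returns the total. */
    static long hammer(int threads, int perThread) {
        HotCounter counter = new HotCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.record();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // All writers have finished, so sum() is exact here
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1_000_000)); // 4000000
    }
}
```

LongAdder trades a slower sum() for cheap, mostly uncontended increment() calls, which suits write-heavy statistics that are read only occasionally.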

Written by

Xiaokun's Architecture Exploration Notes

10 years of backend architecture design | AI engineering infrastructure, storage architecture design, and performance optimization | Former senior developer at NetEase, Douyu, Inke, etc.
