How False Sharing Slows Your Java Apps and How to Eliminate It
This article explains the cache architecture behind false sharing, demonstrates its dramatic performance impact with benchmark code, and shows how Java's @Contended annotation and Caffeine's memory‑padding techniques can effectively eliminate the issue for high‑concurrency applications.
In high‑concurrency multi‑core scenarios, false sharing is an invisible performance killer. When different threads frequently modify independent variables that reside on the same cache line, the CPU's cache‑coherency protocol forces the whole line to be synchronized, causing an invalidation storm that dramatically slows overall throughput. This article explains the cache architecture, reproduces the false‑sharing problem with benchmark code, shows how padding cuts execution time from 3709 ms to 473 ms, and analyses the underlying mechanism. It also examines how the high‑performance cache library Caffeine uses memory padding (120‑byte placeholder classes) and how the JDK 8 @Contended annotation solves false sharing by "trading space for time".
False Sharing
False sharing occurs when multiple threads repeatedly write to different variables that happen to share the same cache line. Although the variables are logically independent, the cache‑coherency protocol (e.g., MESI) forces the entire line to be invalidated, leading to frequent memory accesses and performance degradation. The issue can be mitigated by padding or isolating fields so each occupies its own cache line.
CPU cache typically consists of three levels (L1, L2, L3). A cache line is usually 64 bytes (or 128 bytes) and is the basic unit of data transfer between memory and the CPU. When a long[] array element is loaded, seven neighboring elements are also brought into the same line. If two cores modify different values (X and Y) within the same line, each modification invalidates the other core’s copy, causing slowdown.
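The long[] claim can be demonstrated directly. The sketch below (my own illustration, not from the original article) has two threads hammering either adjacent slots of an AtomicLongArray, which sit 8 bytes apart on the same 64‑byte line, or slots 16 apart (128 bytes), which cannot share a line. The adjacent case is typically markedly slower; absolute timings are machine‑dependent.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class ArrayFalseSharingDemo {
    static final int ITERS = 10_000_000; // kept small so the demo runs quickly

    // Two threads, each repeatedly incrementing its own slot; returns elapsed ms.
    static long time(int slotA, int slotB) throws InterruptedException {
        AtomicLongArray a = new AtomicLongArray(64);
        Thread t1 = new Thread(() -> { for (int i = 0; i < ITERS; i++) a.incrementAndGet(slotA); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < ITERS; i++) a.incrementAndGet(slotB); });
        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Sanity check: no updates were lost
        if (a.get(slotA) + a.get(slotB) != 2L * ITERS) {
            throw new IllegalStateException("lost updates");
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Slots 0 and 1 are 8 bytes apart: same cache line, heavy contention.
        System.out.println("adjacent slots: " + time(0, 1) + " ms");
        // Slots 0 and 16 are 128 bytes apart: different lines, no false sharing.
        System.out.println("distant slots:  " + time(0, 16) + " ms");
    }
}
```

AtomicLongArray is used here because its elements are not padded, which is exactly what makes adjacent slots contend.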
```java
import org.junit.Test;

public class TestFalseSharing {

    static class Pointer {
        // Two volatile fields; volatile guarantees cross-thread visibility
        volatile long x;
        volatile long y;

        @Override
        public String toString() {
            return "x=" + x + ", y=" + y;
        }
    }

    @Test
    public void testFalseSharing() throws InterruptedException {
        Pointer pointer = new Pointer();
        long start = System.currentTimeMillis();
        // Each thread increments only its own field 100 million times
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100_000_000; i++) {
                pointer.x++;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100_000_000; i++) {
                pointer.y++;
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(System.currentTimeMillis() - start);
        System.out.println(pointer);
    }
}
```

Running this code takes about 3709 ms because x and y share a cache line. Inserting seven long padding fields between them pushes the two variables onto separate cache lines and cuts the execution time to 473 ms.
```java
public class TestFalseSharing {

    static class Pointer {
        volatile long x;
        // 7 longs (56 bytes) of padding: together with x they fill a 64-byte
        // cache line, so y is forced onto a different line
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long y;

        @Override
        public String toString() {
            return "x=" + x + ", y=" + y;
        }
    }

    // test method omitted for brevity
}
```

In Caffeine, the WriterBuffer class hierarchy uses 120‑byte padding classes (e.g., BaseMpscLinkedArrayQueuePad1) to ensure that hot fields land on separate cache lines, thereby avoiding false sharing.
The @Contended annotation introduced in JDK 8 works similarly: it instructs the JVM to pad annotated fields or classes so that they land on their own cache lines. By default the annotation is honored only inside JDK‑internal classes; for it to take effect in application code, the JVM must be run with -XX:-RestrictContended.
```java
public class ConcurrentHashMap<K,V> extends AbstractMap<K,V>
        implements ConcurrentMap<K,V>, Serializable {

    // JDK 8's internal annotation; counter cells are updated by different
    // threads, so padding each cell keeps adjacent cells off the same line
    @sun.misc.Contended
    static final class CounterCell {
        volatile long value;
        CounterCell(long x) { value = x; }
    }
}
```

Applying @Contended to the two volatile fields in the earlier benchmark yields a runtime of about 520 ms, comparable to the manual‑padding approach.
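CounterCell is the same striped‑counter idea that the JDK exposes publicly as java.util.concurrent.atomic.LongAdder, whose internal Cell class is likewise @Contended‑padded. The comparison below (the harness class and constants are my own illustration) pits a single shared AtomicLong, where every thread hammers one memory word, against a LongAdder, which spreads updates across padded cells and merges them on read.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

public class StripedCounterDemo {
    static final int THREADS = 4, ITERS = 1_000_000;

    // Runs THREADS threads, each performing ITERS increments, then reads the total.
    static long run(Runnable inc, LongSupplier result) throws InterruptedException {
        Thread[] ts = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            ts[t] = new Thread(() -> { for (int i = 0; i < ITERS; i++) inc.run(); });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return result.getAsLong();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();
        // All threads contend on the same cache line holding the AtomicLong's value.
        System.out.println(run(atomic::incrementAndGet, atomic::get)); // 4000000
        // LongAdder stripes updates across @Contended cells; sum() merges them.
        System.out.println(run(adder::increment, adder::sum));         // 4000000
    }
}
```

Both counters end at the same exact total; the difference is in update throughput under contention, which favors LongAdder as thread count grows.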
More About False Sharing
Avoiding false sharing mainly requires code inspection, because the problem only appears when distinct shared variables happen to be adjacent in memory. Local variables and thread‑local storage are not sources of false sharing. The fundamental solution, trading space for time, should be applied judiciously to avoid excessive memory consumption.
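The observation that local variables are immune to false sharing suggests a padding‑free alternative: let each thread accumulate into a stack local and publish the subtotal only once at the end. A minimal sketch (class and method names are my own):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LocalAccumulation {
    // Each worker counts in a stack-local variable (never shared, so never
    // falsely shared) and publishes its subtotal with a single atomic add.
    static long sum(int threads, int itersPerThread) throws InterruptedException {
        AtomicLong total = new AtomicLong();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < itersPerThread; i++) {
                    local++; // purely thread-private work
                }
                total.addAndGet(local); // one shared write per thread
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sum(4, 1_000_000)); // prints 4000000
    }
}
```

This pattern trades neither space nor time: it simply removes the shared hot writes from the loop entirely, at the cost of results only becoming visible when each thread finishes.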
JD Cloud Developers
JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among AI, cloud computing, IoT, and related developers. It publishes JD product technical information, industry content, and tech‑event news.