Understanding CPU Cache False Sharing and How to Eliminate It

This article explains the concept of CPU cache false sharing, how it degrades performance on multi‑core systems, and provides practical techniques—including cache‑line alignment macros and padding strategies—to prevent it and improve multithreaded application efficiency.

Su San Talks Tech

How the CPU Reads and Writes Data

Modern CPUs contain multiple cores. Each core typically has its own L1 and L2 caches, while the L3 cache is shared among cores. Below the caches sit RAM and disk, forming a pyramid in which capacity increases and access speed decreases as you move down.

The CPU moves data between memory and cache in fixed-size blocks called cache lines (typically 64 bytes). When a core reads a variable, the entire cache line containing it is loaded into that core's cache, so variables that reside in the same line are fetched together.
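
As a rough illustration of this granularity, the sketch below maps an address to the index of the cache line that contains it. The names CACHE_LINE_SIZE, cache_line_index, and same_cache_line are hypothetical, and the 64-byte line size is an assumption that should be checked against the actual hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumed line size */

/* Index of the cache line that contains address p. */
static uintptr_t cache_line_index(const void *p)
{
    return (uintptr_t)p / CACHE_LINE_SIZE;
}

/* True when both addresses fall inside the same cache line. */
static bool same_cache_line(const void *a, const void *b)
{
    return cache_line_index(a) == cache_line_index(b);
}
```

With a buffer that starts on a 64-byte boundary, bytes 0 and 63 share a line, while bytes 0 and 64 do not.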

Because of this, accessing arrays sequentially yields high cache-hit rates. The same mechanism has a downside: when different cores write to unrelated variables that happen to share a cache line, performance suffers. This problem is known as false sharing.
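
To make the benefit of sequential access concrete, the following sketch counts how many distinct cache lines an access pattern touches. It is a simplified model, not a measurement: lines_touched is a hypothetical helper, the buffer is assumed to start on a line boundary, lines are assumed to be 64 bytes, and each element is assumed to fit within one line.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64u /* assumed line size */

/*
 * Count the distinct cache lines touched when reading `count` elements
 * of `elem_size` bytes, advancing `stride` elements per access, from a
 * buffer that starts on a cache-line boundary.
 * Offsets never decrease, so a new line is seen whenever the line
 * index differs from the previous one.
 */
static size_t lines_touched(size_t count, size_t elem_size, size_t stride)
{
    size_t lines = 0;
    size_t last_line = (size_t)-1; /* sentinel: no line seen yet */

    for (size_t i = 0; i < count; i++) {
        size_t offset = i * stride * elem_size;
        size_t line = offset / CACHE_LINE_SIZE;
        if (line != last_line) {
            lines++;
            last_line = line;
        }
    }
    return lines;
}
```

Under these assumptions, sixteen consecutive 8-byte values touch 2 lines, whereas sixteen values read with a stride of 8 elements touch 16 lines, one per access.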

Analyzing the False Sharing Problem

Consider two threads running on a dual-core CPU. Each thread modifies its own long variable, A or B, and the two variables are placed consecutively in memory and therefore reside in the same cache line. When one core modifies its variable, the copy of the line in the other core's cache must be invalidated. Under a coherence protocol such as MESI, this produces a series of coherence transactions that repeatedly invalidate the line and transfer it between the cores.

This ping-pong between the modified, exclusive, shared, and invalid states cancels out the benefit of the cache: the variables are logically independent, yet each write forces the other core to fetch the line again because they share it.
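
The ping-pong effect can be sketched with a toy model that counts invalidations. This is not a faithful MESI implementation: it models only writes (so only the invalid and modified states actually occur), ignores reads, write-backs, and timing, and assumes 64-byte lines. All names (toy_system, toy_write, ping_pong) are hypothetical.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64u /* assumed line size */
#define NUM_CORES 2
#define NUM_LINES 4         /* toy cache tracks lines 0..3 only */

enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };

struct toy_system {
    enum mesi state[NUM_CORES][NUM_LINES];
    unsigned long invalidations; /* copies knocked out of other caches */
};

/* Core `core` writes the byte at `offset` (expected to be below 256). */
static void toy_write(struct toy_system *sys, int core, size_t offset)
{
    /* The modulo only guards the array index in this toy model. */
    size_t line = (offset / CACHE_LINE_SIZE) % NUM_LINES;

    /* A write requires every other copy of the line to be invalidated. */
    for (int other = 0; other < NUM_CORES; other++) {
        if (other != core && sys->state[other][line] != INVALID) {
            sys->state[other][line] = INVALID;
            sys->invalidations++;
        }
    }
    sys->state[core][line] = MODIFIED;
}

/* Core 0 writes offset_a and core 1 writes offset_b, alternating. */
static unsigned long ping_pong(size_t offset_a, size_t offset_b, int rounds)
{
    struct toy_system sys = {0}; /* every line starts INVALID */

    for (int i = 0; i < rounds; i++) {
        toy_write(&sys, 0, offset_a);
        toy_write(&sys, 1, offset_b);
    }
    return sys.invalidations;
}
```

In this model, ping_pong(0, 8, 1000) places both variables in line 0 and reports 1999 invalidations (every write except the very first evicts the other core's copy), while ping_pong(0, 64, 1000) places them in different lines and reports 0.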

Methods to Avoid False Sharing

To prevent false sharing, ensure that frequently modified data written by different cores does not occupy the same cache line. In Linux kernel code, the macro __cacheline_aligned_in_smp serves this purpose.

On SMP builds, the macro expands to __cacheline_aligned, which aligns the variable to the cache-line size.

On single-core (uniprocessor) builds, the macro expands to nothing, since there is no other core to contend with.
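
Outside the kernel, a similar effect can be approximated with the C11 _Alignas specifier. The sketch below is a user-space stand-in, not the kernel's definition: the CACHELINE_ALIGNED macro, both struct names, and the 64-byte line size are assumptions made for illustration.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64u /* assumed line size */

/* Hypothetical user-space stand-in for a cache-line alignment macro. */
#define CACHELINE_ALIGNED _Alignas(CACHE_LINE_SIZE)

/* Two counters, each written by a different thread. */
struct adjacent_counters {
    long a;
    long b; /* directly after a, so it shares a's cache line */
};

struct aligned_counters {
    CACHELINE_ALIGNED long a; /* starts a 64-byte line */
    CACHELINE_ALIGNED long b; /* starts the next 64-byte line */
};
```

The aligned version trades memory for isolation: offsetof(struct aligned_counters, b) is 64 and the struct grows to 128 bytes, whereas in adjacent_counters both members sit within the first 64 bytes.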

Aligning or padding structures places the variables on separate cache lines, which eliminates the false-sharing effect.

In user space, the Disruptor library (Java) demonstrates a similar technique with its RingBufferPad class: seven unused long fields are placed before and after the actual data fields, providing front and rear padding so that the critical fields do not share a cache line with unrelated data. The padding fields are never read or written, so no write to a neighboring field can invalidate the cache line that holds the hot data, which avoids false sharing.
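
The front-and-rear padding idea can be sketched in C as well. This is an analogue of the layout described above, not Disruptor code: padded_value and hot_line_is_private are hypothetical names, lines are assumed to be 64 bytes, and the struct is assumed to start at an 8-byte-aligned address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumed line size */

/* Seven unused 8-byte slots on each side of the hot field. */
struct padded_value {
    int64_t front_pad[7]; /* never read or written */
    int64_t value;        /* the hot field */
    int64_t rear_pad[7];  /* never read or written */
};

/*
 * For a hypothetical 8-byte-aligned start address, report whether the
 * cache line holding `value` lies entirely inside the struct, so that
 * no neighbouring object can share that line.
 */
static bool hot_line_is_private(uintptr_t struct_start)
{
    uintptr_t value_addr = struct_start + offsetof(struct padded_value, value);
    uintptr_t line_start = value_addr - (value_addr % CACHE_LINE_SIZE);
    uintptr_t line_end = line_start + CACHE_LINE_SIZE; /* exclusive */

    return line_start >= struct_start &&
           line_end <= struct_start + sizeof(struct padded_value);
}
```

The 56 bytes of padding on each side are what make the result independent of where the struct lands: wherever the 8-byte hot field falls within its 64-byte line, the rest of that line is covered by padding.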

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
