Why CPUs Fight Even Without Shared Variables: Understanding False Sharing

The article explains that in multithreaded programs, even when threads operate on independent data, the CPU may still suffer severe performance loss due to false sharing of cache lines, and shows how cache‑line alignment and C++17 hardware‑aware constants can eliminate the problem.

IT Services Circle

In high‑performance multithreaded programming, developers often assume that if each thread works on its own data without locks or atomic operations, scalability will be linear. In reality, CPUs can still contend heavily because the hardware cache‑coherency protocol works at the granularity of cache lines, not individual variables.

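Modern CPUs move data between memory and cache in fixed-size blocks called cache lines (typically 64 bytes on Intel/AMD, 128 bytes on Apple M-series). When a core writes to any byte within a cache line, the coherency protocol invalidates the entire line in the caches of all other cores, so their next access to that line misses and must fetch the updated copy from another cache or from memory. When the cores are actually touching different variables that merely happen to share a line, this effect is known as false sharing, and it leads to cache thrashing and stalls.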
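To make the line granularity concrete, the sketch below maps an address to the index of the line that would hold it. The helper names `cache_line_index` and `same_cache_line` and the 64-byte default are assumptions chosen for illustration, not part of the original article; converting a pointer to an integer this way is implementation-defined but behaves as expected on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed line size for illustration; 64 bytes is typical on x86-64.
constexpr std::size_t kAssumedLineSize = 64;

// Index of the cache line that contains address p (hypothetical helper).
inline std::uintptr_t cache_line_index(const void* p,
                                       std::size_t line = kAssumedLineSize) {
    return reinterpret_cast<std::uintptr_t>(p) / line;
}

// True when both addresses fall inside the same line, which is the
// precondition for false sharing between two independent variables.
inline bool same_cache_line(const void* a, const void* b,
                            std::size_t line = kAssumedLineSize) {
    return cache_line_index(a, line) == cache_line_index(b, line);
}
```

Passing the addresses of two variables to `same_cache_line` gives a quick indication of whether writes to one could invalidate the other under the assumed line size.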

The article illustrates false sharing with two independent variables A and B that happen to reside in the same 64‑byte cache line. Thread 1 on core 1 repeatedly updates A, causing core 2’s copy of the line (containing B) to become invalid, and vice‑versa. The resulting ping‑pong effect dramatically slows down parallel execution.
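The A/B situation can be pinned down at compile time. The following is a minimal sketch, assuming a 64-byte line; the struct name `SharedLine` and its members are hypothetical stand-ins for the article's A and B.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: a and b are logically independent but adjacent.
struct alignas(64) SharedLine {
    std::uint64_t a;  // written only by thread 1
    std::uint64_t b;  // written only by thread 2
};

// The struct starts on a 64-byte boundary and both members end before
// byte 64, so a and b always occupy the same 64-byte line.
static_assert(offsetof(SharedLine, a) == 0, "a starts the line");
static_assert(offsetof(SharedLine, b) + sizeof(std::uint64_t) <= 64,
              "b ends inside the first 64 bytes");
```

Because the two members are guaranteed to sit in one line here, any write to `a` invalidates the line that also holds `b`, and vice versa.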

A concrete benchmark demonstrates the impact. The sequential (single-thread) run takes about 2.9 seconds, while the parallel version that suffers false sharing takes roughly 5.0 seconds on the same hardware, about 1.7 times slower despite using more cores.
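The article's benchmark source is not reproduced here, so the following is only a sketch of how such a comparison might be set up. The names `AdjacentCounters`, `bump`, `time_sequential`, and `time_parallel` are hypothetical, and the timings it produces will differ from the figures quoted above depending on hardware and compiler flags.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Two independent counters declared back to back. They will very likely
// land on the same cache line, although the language does not guarantee it.
struct AdjacentCounters {
    std::atomic<std::uint64_t> first{0};
    std::atomic<std::uint64_t> second{0};
};

// Increment one counter iters times.
inline void bump(std::atomic<std::uint64_t>& c, std::uint64_t iters) {
    for (std::uint64_t i = 0; i < iters; ++i)
        c.fetch_add(1, std::memory_order_relaxed);
}

// One thread updates both counters in turn; returns elapsed milliseconds.
inline double time_sequential(AdjacentCounters& c, std::uint64_t iters) {
    auto t0 = std::chrono::steady_clock::now();
    bump(c.first, iters);
    bump(c.second, iters);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Two threads each update their own counter; returns elapsed milliseconds.
// No variable is shared, but the cache line probably is.
inline double time_parallel(AdjacentCounters& c, std::uint64_t iters) {
    auto t0 = std::chrono::steady_clock::now();
    std::thread a([&c, iters] { bump(c.first, iters); });
    std::thread b([&c, iters] { bump(c.second, iters); });
    a.join();
    b.join();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling both functions with the same iteration count (for example 100'000'000) on a fresh `AdjacentCounters` and comparing the returned times is one way to observe the slowdown.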

To break the contention, the article presents two solutions. The first uses explicit alignment:

#include <iostream>
#include <thread>
#include <chrono>
#include <atomic>
#include <cstdint>  // std::uint64_t

// Each counter starts on its own 64-byte boundary.
alignas(64) std::atomic<std::uint64_t> counter1{0};
alignas(64) std::atomic<std::uint64_t> counter2{0};

// thread_work, run_no_threads, and main omitted for brevity

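This places each counter at its own 64-byte-aligned address, which eliminates false sharing on CPUs with 64-byte cache lines such as current Intel and AMD parts. Hard-coding 64 is not portable, however: on platforms with larger lines (e.g., 128 bytes on Apple M-series), two 64-byte-aligned counters can still end up in the same line.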
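The effect of `alignas(64)` on layout can be checked directly. The sketch below groups the two counters in a hypothetical struct, `PaddedCounters`, purely so the distance between them is easy to measure; the article itself uses two globals.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: each member is forced onto its own 64-byte boundary.
struct PaddedCounters {
    alignas(64) std::atomic<std::uint64_t> first{0};
    alignas(64) std::atomic<std::uint64_t> second{0};
};

// The struct inherits the strictest member alignment and spans two blocks.
static_assert(alignof(PaddedCounters) == 64, "struct is 64-byte aligned");
static_assert(sizeof(PaddedCounters) == 128, "one 64-byte block per counter");

// Byte distance between the two counters inside one instance.
inline std::ptrdiff_t member_distance(const PaddedCounters& c) {
    return reinterpret_cast<const char*>(&c.second) -
           reinterpret_cast<const char*>(&c.first);
}
```

A distance of 64 bytes separates the counters on a 64-byte-line CPU, but both still fit inside a single 128-byte line, which is the portability gap described above.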

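C++17 introduces two hardware-aware constants in <new>. std::hardware_destructive_interference_size is the minimum offset between two objects needed to avoid false sharing. std::hardware_constructive_interference_size is the maximum size of contiguous memory that can be expected to share a single cache line, which is useful when data that is accessed together should stay together. Both are compile-time constants chosen by the implementation, so they reflect the compiler's target settings rather than the CPU the program eventually runs on.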
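Not every standard library shipped these constants at the same time, so code that must build on older toolchains may want a guard. The sketch below uses the standard feature-test macro `__cpp_lib_hardware_interference_size`; the names `kDestructiveSize` and `PaddedSlot` and the fallback value of 64 are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
// The library provides the constant: use it.
inline constexpr std::size_t kDestructiveSize =
    std::hardware_destructive_interference_size;
#else
// Assumed fallback when the library lacks the constant; 64 is a guess
// that suits most x86-64 targets but is too small for 128-byte lines.
inline constexpr std::size_t kDestructiveSize = 64;
#endif

// Hypothetical per-thread slot padded out to the destructive size.
struct alignas(kDestructiveSize) PaddedSlot {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedSlot) % kDestructiveSize == 0,
              "adjacent slots in an array never overlap a block");
```

An array or `std::vector` of `PaddedSlot` then gives each element its own block of `kDestructiveSize` bytes.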

Using these, the article defines a cache‑line‑aligned structure:

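#include <iostream>
#include <new>      // std::hardware_destructive_interference_size
#include <vector>
#include <thread>
#include <chrono>
#include <cstdint>  // std::uint64_t

// Padded so that neighbouring counters never share a cache line.
struct alignas(std::hardware_destructive_interference_size) ThreadCounter {
    std::uint64_t count = 0; // payload (8 bytes)
};

// Compact layout: neighbouring counters are only 8 bytes apart.
struct NormalCounter { std::uint64_t count = 0; };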

// run_benchmark template runs a parallel increment loop for each counter type.
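The article does not show the body of `run_benchmark`, so the following is a hypothetical reconstruction rather than the author's code. It assumes one counter per thread stored in a `std::vector`, a `volatile` access to keep the compiler from collapsing the loop, and a small `BenchResult` struct invented here to return the total and the elapsed time; the 64-byte fallback is likewise an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <new>
#include <thread>
#include <vector>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLine = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLine = 64;  // assumed fallback for older toolchains
#endif

struct alignas(kLine) ThreadCounter { std::uint64_t count = 0; };
struct NormalCounter { std::uint64_t count = 0; };

// Hypothetical result type: sum of all counters and elapsed milliseconds.
struct BenchResult {
    std::uint64_t total;
    double ms;
};

template <typename Counter>
BenchResult run_benchmark(std::size_t num_threads, std::uint64_t iters) {
    std::vector<Counter> counters(num_threads);  // one slot per thread
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iters] {
            // volatile forces a real load and store on every iteration.
            volatile std::uint64_t& slot = counters[t].count;
            for (std::uint64_t i = 0; i < iters; ++i) slot = slot + 1;
        });
    }
    for (auto& w : workers) w.join();
    auto t1 = std::chrono::steady_clock::now();

    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.count;
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    return {total, ms};
}
```

Calling `run_benchmark<NormalCounter>` and `run_benchmark<ThreadCounter>` with the same arguments and comparing the `ms` fields is one way to reproduce the kind of comparison the article reports; absolute numbers depend on the machine.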

Running the benchmarks shows the normal (compact) structure taking ~5557 ms, while the cache-line-aligned version finishes in ~4148 ms, roughly 25% less time, confirming the performance benefit.

The article concludes that, contrary to common belief, the dominant cost in modern multithreading is often cache-coherency traffic rather than lock contention. Proper cache-line alignment with alignas, using either a hard-coded line size or, more portably, the C++17 hardware interference constants, is essential for achieving the expected scalability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
