
Unlocking High‑Performance C++ Concurrency: Memory Model, Atomics, and Lock‑Free Techniques

This article explains C++11’s memory model and atomic types, demonstrating how lock‑free concurrency, memory ordering, and synchronization primitives such as fences can be used to achieve high‑performance, race‑free multithreaded code for demanding backend systems like game servers.

Alibaba Cloud Developer

Dec 23, 2024


Introduction

In game backend development, C++ remains dominant, and performance targets such as 120 Hz gameplay call for aggressive optimization. Lock-free concurrency is one of the main tools for meeting those targets, and using it correctly depends on understanding the C++11 memory model and atomic types.

1. Memory Model Basics

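Objects in C++ are regions of storage, and every object occupies at least one memory location. A variable of a fundamental type (an int, a char, a pointer) occupies exactly one memory location regardless of its size, while adjacent bit-fields share a single memory location. The example class

class zoo {
public:
    int m_number;        // fundamental type: exactly one memory location
    Pig m_onePig;        // class-type member: an object made up of its own sub-objects
    PigHome* p_pigHome;  // pointer: exactly one memory location
};

illustrates these points; Pig and PigHome are assumed to be defined elsewhere.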
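As a rough illustration of why memory locations matter, the sketch below lets two threads write to two different members of the same object at the same time. The names `Stats` and `write_separate_members` are hypothetical and not from the original article. Because the two members are separate memory locations, the writers need no lock; the same pattern applied to two adjacent bit-fields would touch one shared memory location and would be a data race.

```cpp
#include <cassert>
#include <thread>

// Hypothetical example type: two ordinary members, two distinct memory locations.
struct Stats {
    int hits = 0;
    int misses = 0;
};

// Each thread writes only its own member, so the threads never touch
// the same memory location and no lock or atomic is required.
Stats write_separate_members(int n) {
    assert(n >= 0);
    Stats s;
    std::thread t1([&s, n] { for (int i = 0; i < n; ++i) ++s.hits; });
    std::thread t2([&s, n] { for (int i = 0; i < n; ++i) ++s.misses; });
    t1.join();  // thread completion synchronizes with join(),
    t2.join();  // so reading s afterwards is safe
    return s;
}
```

Calling `write_separate_members(1000)` from `main` should return an object with both members equal to 1000.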

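Concurrency problems come down to memory locations. A data race occurs when two threads access the same memory location, at least one of the accesses is a write, and nothing orders one access before the other. A mutex enforces such an ordering, and atomic operations keep the accesses well defined even when they are unordered; with neither in place, the behavior of the program is undefined.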
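The two remedies can be compared side by side. The following is a minimal sketch, not code from the original article: the function names `count_with_mutex` and `count_with_atomic` are made up for illustration. Both let two threads increment a shared counter without a data race.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Two threads increment a shared int; the mutex orders every access.
int count_with_mutex(int per_thread) {
    assert(per_thread >= 0);
    int counter = 0;
    std::mutex m;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> lock(m);
            ++counter;
        }
    };
    std::thread a(work), b(work);
    a.join(); b.join();
    return counter;
}

// Same workload, but each increment is one atomic read-modify-write.
int count_with_atomic(int per_thread) {
    assert(per_thread >= 0);
    std::atomic<int> counter{0};
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) counter.fetch_add(1);
    };
    std::thread a(work), b(work);
    a.join(); b.join();
    return counter.load();
}
```

With a plain `int` and no mutex, the same loop would be a data race and the result would be unpredictable.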

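Modification order: every object has a modification order, a single total order of all writes to it from all threads, and every thread must agree on that order. For non-atomic objects the programmer must supply enough synchronization to make this true; for atomic objects the compiler and hardware guarantee it. The order is per object and says nothing about how writes to different objects are ordered relative to each other.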
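One observable consequence of a per-object modification order is that a reader can never see a single atomic variable go backwards in that order, even with relaxed loads. The sketch below is an illustration under that assumption; `reader_sees_monotonic_values` is a hypothetical name. A writer stores increasing values and a reader checks that what it observes never decreases.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A writer stores 1..n into v; a reader polls v with relaxed loads.
// Returns true if the reader never saw the value move backwards.
bool reader_sees_monotonic_values(int n) {
    assert(n > 0);
    std::atomic<int> v{0};
    bool monotonic = true;  // written only by the reader, read after join()
    std::thread writer([&] {
        for (int i = 1; i <= n; ++i) v.store(i, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        int last = 0;
        while (last < n) {
            int cur = v.load(std::memory_order_relaxed);
            if (cur < last) monotonic = false;  // would contradict the modification order
            last = cur;
        }
    });
    writer.join(); reader.join();
    return monotonic;
}
```

The reader may skip values, since it is not guaranteed to observe every write, but it should never observe an older value after a newer one.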

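Compilers may reorder instructions for efficiency, and multi-core CPUs add caches, store buffers, and out-of-order execution, so a write made by one thread does not automatically become visible to other threads in program order. Developers must choose an appropriate memory ordering to get the visibility they need.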

2. Standard Library Atomic Types

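Atomic operations are indivisible: a concurrent read sees either the value before the operation or the value after it, never a half-updated state. The <atomic> header provides std::atomic<T> specializations for bool, integral, and pointer types, plus std::atomic_flag. Key points:

std::atomic objects cannot be copy-constructed or copy-assigned, but they can be assigned from and converted to the underlying value type.

Many are lock-free; is_lock_free() reports whether a given object is. Only std::atomic_flag is guaranteed to be lock-free on every platform.

Compare-and-exchange (CAS) methods: compare_exchange_weak() may fail spuriously even when the stored value equals the expected one, so it is normally called in a loop; compare_exchange_strong() fails only when the values differ. On failure, both copy the current value into the expected argument.

Prefer compare_exchange_weak() when the operation already sits in a retry loop and the new value is cheap to compute; prefer compare_exchange_strong() when recomputing the new value is expensive or no loop is otherwise needed.

Be cautious with std::atomic<T> for user-defined types: T must be trivially copyable, and larger types are usually not lock-free. For pointers, std::atomic<T*> works out of the box and supports pointer arithmetic through fetch_add() and fetch_sub().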
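The points above can be tried out in a few lines. This is a minimal single-threaded sketch meant only to show what each call returns; the function name `atomic_basics` is hypothetical.

```cpp
#include <atomic>
#include <cassert>

int atomic_basics() {
    std::atomic<int> n{5};
    bool lock_free = n.is_lock_free();  // platform dependent, so not asserted
    (void)lock_free;

    int old = n.exchange(7);            // stores 7, returns the previous value
    assert(old == 5);

    int expected = 7;
    bool swapped = n.compare_exchange_strong(expected, 9);  // n was 7, so this succeeds
    assert(swapped);

    expected = 7;                       // stale guess: n is 9 now
    swapped = n.compare_exchange_strong(expected, 11);
    assert(!swapped && expected == 9);  // failure copies the current value into expected

    int arr[4] = {10, 20, 30, 40};
    std::atomic<int*> p{arr};
    int* before = p.fetch_add(2);       // atomic pointer arithmetic, returns the old pointer
    assert(before == arr && *p.load() == 30);

    return n.load();
}
```

The function returns the final value of `n`, which is 9 because the second compare-exchange fails and leaves it unchanged.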

Example CAS loop:

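bool expected = false;
extern std::atomic<bool> b;  // defined and set elsewhere
// Retry only on spurious failure: a genuine failure sets expected to true and ends the loop.
while (!b.compare_exchange_weak(expected, true) && !expected);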
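A CAS loop is also the usual way to build read-modify-write operations that std::atomic does not offer directly. As one possible sketch, the hypothetical helper `update_max` below raises an atomic integer to at least a given value, and `max_from_threads` (also a made-up name) exercises it from several threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Raise `target` to at least `value` using a CAS loop.
void update_max(std::atomic<int>& target, int value) {
    int current = target.load();
    // On failure, compare_exchange_weak reloads `current` with the latest value,
    // so the loop re-checks whether an update is still needed.
    while (current < value && !target.compare_exchange_weak(current, value)) {
    }
}

// Thread number id proposes id * 10; the largest proposal should win.
int max_from_threads(int thread_count) {
    assert(thread_count > 0);
    std::atomic<int> best{0};
    std::vector<std::thread> threads;
    for (int id = 1; id <= thread_count; ++id)
        threads.emplace_back([&best, id] { update_max(best, id * 10); });
    for (auto& t : threads) t.join();
    return best.load();
}
```

The weak variant fits here because the loop already retries and recomputing the condition is cheap.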

3. Synchronization and Memory Ordering

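The six std::memory_order values fall into three ordering models: sequentially consistent, acquire-release (including consume), and relaxed. The subsections below demonstrate each model and then cover fences.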

3.1 Sequentially Consistent Ordering

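The default ordering, std::memory_order_seq_cst, guarantees that all threads agree on a single total order of all sequentially consistent operations, and that this order is consistent with the program order of each thread. In the example below, both readers must therefore agree on which store came first, so at least one of "ok1" and "ok2" is always printed: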

#include <atomic>
#include <thread>
#include <iostream>
std::atomic<bool> x, y;
void write_x(){ x.store(true, std::memory_order_seq_cst); }
void write_y(){ y.store(true, std::memory_order_seq_cst); }
void read_x_then_y(){
    while (!x.load(std::memory_order_seq_cst));
    if (y.load(std::memory_order_seq_cst)) std::cout<<"ok1\n";
}
void read_y_then_x(){
    while (!y.load(std::memory_order_seq_cst));
    if (x.load(std::memory_order_seq_cst)) std::cout<<"ok2\n";
}
int main(){
    x = y = false;
    std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
}
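Another common way to see what sequential consistency buys is the store-buffering pattern: each thread stores to one variable and then loads the other. The sketch below is an illustration, not part of the original article, and `count_both_zero` is a hypothetical name. Under seq_cst, one of the two stores must come first in the single total order, so the two loads cannot both return 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering pattern `rounds` times and counts how often
// both threads read 0. With seq_cst this count should stay at zero.
int count_both_zero(int rounds) {
    assert(rounds > 0);
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join(); t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With weaker orderings such as relaxed or acquire-release, the outcome where both loads return 0 is permitted by the standard, which is why algorithms built on this pattern need seq_cst.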

3.2 Relaxed Ordering

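Relaxed atomics impose no ordering between operations on different variables; the only guarantees are atomicity and a consistent modification order for each individual variable. In the example below, the two stores may become visible to the other thread in either order, so the final assert can fire.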

#include <atomic>
#include <thread>
#include <assert.h>
std::atomic<bool> x,y;
std::atomic<int> z;
void write_x_then_y(){
    x.store(true, std::memory_order_relaxed);
    y.store(true, std::memory_order_relaxed);
}
void read_y_then_x(){
    while(!y.load(std::memory_order_relaxed));
    if (x.load(std::memory_order_relaxed)) ++z;
}
int main(){
    x = y = false; z = 0;
    std::thread a(write_x_then_y), b(read_y_then_x);
    a.join(); b.join();
    assert(z.load()!=0); // may fire: the reader can see y == true before x == true
}
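Relaxed ordering is still useful when only the atomicity of each operation matters. A typical case is a statistics counter whose final total is read after the worker threads have finished. The sketch below assumes that scenario; `count_events` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical event counter: only the final total matters, so relaxed
// increments are enough. Atomicity still guarantees no lost updates.
long count_events(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<long> events{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&events, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                events.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();  // join() makes the final value visible here
    return events.load(std::memory_order_relaxed);
}
```

The counter must not be used to publish other data, because a relaxed increment orders nothing except the counter itself.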

3.3 Acquire‑Release Ordering

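An acquire load that reads the value written by a release store synchronizes with that store, which establishes a pairwise ordering between the two threads rather than a single global order. In the example, the store in write_x happens-before the point where read_x_then_y leaves its loop, so everything write_x did before the store is visible to the reader. The store to y comes from a different thread, so the reader has no guarantee about y and "ok1" may or may not be printed.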

#include <atomic>
#include <iostream>
std::atomic<bool> x,y;
void write_x(){ x.store(true, std::memory_order_release); }
void write_y(){ y.store(true, std::memory_order_release); }
void read_x_then_y(){
    while(!x.load(std::memory_order_acquire));
    if (y.load(std::memory_order_acquire)) std::cout<<"ok1\n";
}
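The most common use of acquire-release is publishing ordinary, non-atomic data through an atomic flag. The following is a minimal sketch of that pattern; the function name `pass_message` and its variables are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Publishes a non-atomic payload through a release store and reads it
// after an acquire load has observed the flag.
std::string pass_message() {
    std::string payload;               // plain, non-atomic data
    std::atomic<bool> ready{false};
    std::string received;

    std::thread producer([&] {
        payload = "hello";                              // 1: write the data
        ready.store(true, std::memory_order_release);  // 2: publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {}  // 3: wait for the publish
        assert(payload == "hello");  // 4: guaranteed by the release/acquire pair
        received = payload;
    });
    producer.join(); consumer.join();
    return received;
}
```

If both operations on `ready` were relaxed, the read of `payload` would be a data race, because nothing would order it after the write.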

3.4 Data Dependency (consume)

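Consume ordering is weaker than acquire: it only orders operations that carry a data dependency on the loaded value. In the example, loading p with memory_order_consume guarantees that reads through the pointer (x->i_ and x->s_) see the writes made before the release store. The load of a does not depend on p, so the last assert may fire. In practice compilers treat consume as acquire, and since C++17 the standard discourages its use.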

struct X{ int i_; std::string s_; };
std::atomic<int> a;
std::atomic<X*> p;
void create_x(){
    X* x = new X; x->i_ = 42; x->s_ = "hello";
    a.store(99, std::memory_order_relaxed);
    p.store(x, std::memory_order_release);
}
void use_x(){
    X* x;
    while(!(x = p.load(std::memory_order_consume)));
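    assert(x->i_==42);      // holds: depends on the loaded pointer
    assert(x->s_=="hello"); // holds: depends on the loaded pointer
    assert(a.load(std::memory_order_relaxed)==99); // may fire: no dependency on p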
}
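If the independent variable also needs to be visible, one option is to load the pointer with acquire instead of consume. The sketch below is a variant of the example above written under that assumption; `Node` stands in for `X`, and `publish_with_acquire` is a hypothetical name. With acquire, everything sequenced before the release store is ordered, so all three checks hold.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

struct Node { int i_; std::string s_; };  // stands in for X from the example above

// Returns true if the reader observed all three writes.
bool publish_with_acquire() {
    std::atomic<int> a{0};
    std::atomic<Node*> p{nullptr};
    bool ok = false;  // written only by the reader, read after join()

    std::thread writer([&] {
        Node* n = new Node{42, "hello"};
        a.store(99, std::memory_order_relaxed);
        p.store(n, std::memory_order_release);
    });
    std::thread reader([&] {
        Node* n;
        while (!(n = p.load(std::memory_order_acquire))) {}
        // Acquire orders everything sequenced before the release store,
        // including the independent relaxed store to a.
        assert(a.load(std::memory_order_relaxed) == 99);
        ok = n->i_ == 42 && n->s_ == "hello" &&
             a.load(std::memory_order_relaxed) == 99;
        delete n;
    });
    writer.join(); reader.join();
    return ok;
}
```

The cost is that acquire may be more expensive than a true dependency-ordered load on weakly ordered hardware, which is the trade-off consume was designed to avoid.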

3.5 Fences

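Memory fences impose ordering without modifying any data themselves, which lets relaxed atomic operations, and the non-atomic operations around them, be ordered. In the example, the release fence sits between the stores to x and y, and the acquire fence sits between the loads of y and x. Once the reader sees y == true, the release fence synchronizes with the acquire fence, so the store to x happens-before the load of x and "ok" is always printed.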

std::atomic<bool> x,y;
void write_x_then_y(){
    x.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    y.store(true, std::memory_order_relaxed);
}
void read_y_then_x(){
    while(!y.load(std::memory_order_relaxed));
    std::atomic_thread_fence(std::memory_order_acquire);
    if (x.load(std::memory_order_relaxed)) std::cout<<"ok\n";
}
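Fences order non-atomic accesses as well. As a sketch of that, the hypothetical function `pass_value_with_fences` below hands a plain int from one thread to another using only relaxed operations on the flag plus a pair of fences.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Passes a plain int between threads using relaxed atomics plus fences.
int pass_value_with_fences() {
    int data = 0;                      // non-atomic payload
    std::atomic<bool> flag{false};
    int seen = -1;

    std::thread producer([&] {
        data = 42;
        // Orders the write to data before the flag store that follows.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (!flag.load(std::memory_order_relaxed)) {}
        // Orders the flag load above before the read of data below.
        std::atomic_thread_fence(std::memory_order_acquire);
        assert(data == 42);
        seen = data;
    });
    producer.join(); consumer.join();
    return seen;
}
```

The position of each fence matters: moving the release fence after the flag store, or the acquire fence before the loop, would remove the ordering and turn the read of `data` into a data race.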

Conclusion

This article is based on “C++ Concurrency in Action” and summarizes key concepts of the C++11 memory model, atomic types, and lock-free programming techniques useful for high-performance backend systems.
