Fundamentals 15 min read

How to Build a Lock‑Free Queue Using CAS: Step‑by‑Step Guide

This article explains the fundamentals of lock‑free queues, covering CAS and other atomic operations, detailed enqueue and dequeue implementations, the ABA problem and its solutions, as well as array‑based lock‑free queue designs, providing code examples and practical insights.

21CTO
21CTO
21CTO
How to Build a Lock‑Free Queue Using CAS: Step‑by‑Step Guide

About CAS and Other Atomic Operations

Before discussing lock‑free queues, it is essential to understand the Compare‑and‑Set (CAS) operation, which is supported by almost all CPU instruction sets (e.g., CMPXCHG on x86) and enables the creation of lock‑free data structures.

int compare_and_swap(int* reg, int oldval, int newval) {
    int old_reg_val = *reg;
    if (old_reg_val == oldval) {
        *reg = newval;
    }
    return old_reg_val;
}

CAS can also be expressed as a boolean‑returning function to indicate success:

bool compare_and_swap(int* addr, int oldval, int newval) {
    if (*addr != oldval) {
        return false;
    }
    *addr = newval;
    return true;
}

Related atomic primitives include Fetch‑And‑Add, Test‑and‑Set, and Test‑and‑Test‑and‑Set.

1) GCC CAS

bool __sync_bool_compare_and_swap(type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap(type *ptr, type oldval, type newval, ...)

2) Windows CAS

InterlockedCompareExchange(LONG volatile *Target, LONG Exchange, LONG Comperand);

3) C++11 CAS

template<class T>
bool atomic_compare_exchange_weak(std::atomic<T>* obj, T* expected, T desired);

Lock‑Free Queue Linked‑List Implementation

Initialization creates a dummy node to simplify boundary checks:

InitQueue(Q) {
    node = new node();
    node->next = NULL;
    Q->head = Q->tail = node;
}

Enqueue uses a two‑step CAS process: first link the new node to tail->next, then swing tail to the new node.

EnQueue(Q, data) {
    n = new node();
    n->value = data;
    n->next = NULL;
    do {
        p = Q->tail;
    } while (CAS(p->next, NULL, n) != TRUE);
    CAS(Q->tail, p, n);
}

A refined version reduces redundant fetches by sharing the global tail pointer among threads.

EnQueue(Q, data) {
    n = new node();
    n->value = data;
    n->next = NULL;
    while (TRUE) {
        tail = Q->tail;
        next = tail->next;
        if (tail != Q->tail) continue;
        if (next != NULL) {
            CAS(Q->tail, tail, next);
            continue;
        }
        if (CAS(tail->next, next, n) == TRUE) break;
    }
    CAS(Q->tail, tail, n);
}

Dequeue operates on head->next to avoid the empty‑queue edge case, using a CAS loop to swing the head pointer.

DeQueue(Q) {
    do {
        p = Q->head;
        if (p->next == NULL) return ERR_EMPTY_QUEUE;
    } while (CAS(Q->head, p, p->next) != TRUE);
    return p->next->value;
}

An improved Dequeue adds extra checks to handle the situation where head and tail point to the same dummy node while another thread is midway through an enqueue.

DeQueue(Q) {
    while (TRUE) {
        head = Q->head;
        tail = Q->tail;
        next = head->next;
        if (head != Q->head) continue;
        if (head == tail && next == NULL) return ERR_EMPTY_QUEUE;
        if (head == tail && next == NULL) {
            CAS(Q->tail, tail, next);
            continue;
        }
        if (CAS(Q->head, head, next) == TRUE) {
            value = next->value;
            break;
        }
    }
    free(head);
    return value;
}

CAS ABA Problem

The ABA problem occurs when a location is changed from value A to B and back to A, making a CAS operation believe nothing changed, which can lead to subtle bugs in lock‑free algorithms.

Solving the ABA Problem

One solution is double‑CAS, checking both the value and a counter (or version tag) atomically, ensuring that even if the value returns to A, the counter differs.

Another approach used in lock‑free queues is reference counting on nodes, preventing reclaimed memory from being reused while still in use.

SafeRead(q) {
    loop:
        p = q->next;
        if (p == NULL) return p;
        Fetch&Add(p->refcnt, 1);
        if (p == q->next) return p;
        else Release(p);
    goto loop;
}

Array‑Based Lock‑Free Queue

An array (ring buffer) implementation avoids dynamic memory allocation. Elements can be in states HEAD, TAIL, EMPTY, or hold actual data. Enqueue uses double‑CAS to replace (TAIL, EMPTY) with (data, TAIL); dequeue swaps (HEAD, data) with (EMPTY, HEAD).

Two counters, incremented atomically with Fetch‑And‑Add, track the number of enqueues and dequeues, and their modulo with the array size yields the current head and tail positions.

Conclusion

Lock‑free queues rely on CAS, FAA, and retry loops to achieve concurrency without traditional locks. Understanding the ABA problem and its mitigations, as well as alternative designs such as array‑based queues, equips developers to build efficient concurrent data structures.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

CASlock‑freeatomic operationsABA problemconcurrent queue
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.