
Lock-Free & Wait-Free Programming: Core Techniques, Primitives, Java Stack Example

This article explains the challenges of multithreaded programming, compares blocking, lock‑free and wait‑free approaches, details essential atomic primitives, and provides a practical Java lock‑free stack implementation with best‑practice guidelines.

Cognitive Technology Team

Handling multithreading is one of the most complex problems developers face. While many immediately resort to blocking approaches such as the synchronized keyword or ReentrantLock in Java, lock‑free programming offers an alternative way to achieve concurrency.

This article introduces common issues, techniques, and best practices of lock‑free programming, presents a practical lock‑free stack implementation in Java, and outlines the transition to wait‑free programming.

Lock-Free Programming Basics

Guarantee : Lock‑free algorithms ensure system‑wide progress – at least one thread completes an operation within a finite number of steps.

Typical techniques : Atomic primitives such as Compare‑and‑Swap (CAS), Load‑Link/Store‑Conditional (LL/SC), and Fetch‑and‑Add (FAA).

Advantages :

Scales well under high contention with no kernel blocking or context‑switch overhead.

Eliminates deadlock and livelock problems.

Disadvantages :

Higher design and comprehension difficulty.

Higher cost than simple mutexes under low contention.

Potential starvation: a thread may repeatedly fail while others succeed, though overall progress continues.

What Is Wait‑Free Programming?

Guarantee : Wait‑free algorithms provide a per‑thread progress bound; each thread finishes its operation in a finite number of its own steps, regardless of other threads.

Typical techniques : Per‑thread operation descriptors and bounded‑step helper functions.

Advantages :

All benefits of lock‑free programming.

Suitable for real‑time systems.

Eliminates starvation.

Disadvantages :

Requires significantly more memory and bookkeeping.

Often slower on the uncontended “fast path”, since the extra bookkeeping runs even when no other thread interferes.

Creating wait‑free implementations for complex data structures can be extremely difficult or impossible.

Blocking vs. Lock‑Free vs. Wait‑Free: Progress Guarantees Comparison

Deadlock : Impossible in both lock‑free and wait‑free algorithms because no thread holds a lock while waiting.

Livelock : Prevented in lock‑free and wait‑free algorithms; at least one operation always completes in a finite number of steps.

Starvation : Possible in lock‑free algorithms (a thread may be repeatedly overtaken), but impossible in wait‑free algorithms due to per‑thread bounds.

Atomic Primitives Supporting Non‑Blocking Algorithms

Compare‑and‑Swap (CAS)

CAS is the most famous atomic instruction. In a single atomic step it:

Reads a memory location.

Checks that the location still holds the expected value.

If the comparison succeeds, writes a new value.

Most non‑blocking data structures rely on CAS loops. The classic lock‑free stack implementation uses CAS and suffers from the ABA problem under high contention.
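The CAS‑loop pattern can be sketched in Java with AtomicInteger. This example implements an atomic “maximum” update, an operation FAA alone cannot express; the class name CasMax is illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    /** Atomically raises the stored maximum to {@code candidate} if it is larger. */
    public void update(int candidate) {
        int current;
        do {
            current = max.get();
            if (candidate <= current) return;         // nothing to do
        } while (!max.compareAndSet(current, candidate)); // retry if another thread won the race
    }

    public int get() {
        return max.get();
    }
}
```

Each iteration re‑reads the current value, so a failed CAS simply means another thread updated the maximum first and the loop tries again against the fresh value.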

Load‑Link / Store‑Conditional (LL/SC)

LL/SC avoids the ABA problem because the store‑conditional fails if the location has been written at all since the load‑link, even if the original value was later restored. Hardware typically tracks this by monitoring the loaded cache line.

In Java the closest analogue is AtomicStampedReference.
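A short sketch of how AtomicStampedReference pairs a reference with a version stamp, so that an A → B → A sequence elsewhere (which bumps the stamp) is detected:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> top =
                new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String current = top.get(stampHolder);   // reads value "A" and stamp 0 together

        // The CAS succeeds only if BOTH the reference and the stamp still match.
        boolean ok = top.compareAndSet(current, "B",
                                       stampHolder[0], stampHolder[0] + 1);

        System.out.println(ok);             // true
        System.out.println(top.getStamp()); // 1
    }
}
```

Note that compareAndSet compares the expected reference by identity, so the value read via get must be reused as the expected argument.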

Fetch‑and‑Add (FAA)

FAA atomically reads a value, adds an increment k, stores the result, and returns the old value (or new value in some variants).

It is useful for counters, ticket locks, and reference counting, but only supports simple arithmetic old + k. More complex updates still require CAS or LL/SC.
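A minimal ticket‑counter sketch using Java's FAA equivalent, AtomicLong.getAndAdd; the class name TicketCounter is illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

public class TicketCounter {
    private final AtomicLong next = new AtomicLong();

    /** Fetch‑and‑add: returns the old value and advances the counter by one. */
    public long takeTicket() {
        return next.getAndAdd(1); // equivalent to getAndIncrement()
    }
}
```

Because FAA always succeeds in one atomic step, every caller receives a distinct ticket with no retry loop.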

Double‑Width CAS (DW‑CAS)

DW‑CAS extends CAS to operate on two adjacent machine words (e.g., a pointer and a tag) as a single 128‑bit unit, preventing ABA when both parts must match.

Its main drawback is higher micro‑architectural cost, since the processor must atomically read and update a 16‑byte region.
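Java exposes no DW‑CAS instruction, but the pointer‑plus‑tag idea can be approximated by packing both into one immutable object and CASing the reference. A sketch under that assumption (TaggedRef is a hypothetical name; the value check uses identity comparison for brevity):

```java
import java.util.concurrent.atomic.AtomicReference;

public class TaggedRef<T> {
    // No hardware DW-CAS here: value and tag live in one immutable object,
    // so a single reference CAS updates both together.
    private record Tagged<T>(T value, long tag) {}

    private final AtomicReference<Tagged<T>> cell;

    public TaggedRef(T initial) {
        this.cell = new AtomicReference<>(new Tagged<>(initial, 0L));
    }

    /** Succeeds only if both the value and the tag are unchanged. */
    public boolean compareAndSet(T expected, T update) {
        Tagged<T> cur = cell.get();
        if (cur.value() != expected) return false;   // identity check for brevity
        return cell.compareAndSet(cur, new Tagged<>(update, cur.tag() + 1));
    }

    public T get() {
        return cell.get().value();
    }
}
```

This trades the 16‑byte hardware operation for an extra allocation per update, which the JVM's garbage collector makes safe against ABA on the wrapper object itself.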

Primitives Summary

CAS : General form old ⇒ new; used as a universal lock‑free building block; pitfalls include the ABA problem and high retry cost under contention.

DW‑CAS : Form (old_lo, old_hi) ⇒ (new_lo, new_hi); enables pointer‑plus‑tag pairs for ABA defense; drawback is higher cache‑line lock cost.

LL/SC : Form LL; …; SC; portable lock‑free alternative when CAS is unavailable; pitfall is loss of reservation leading to extra retries.

FAA : Form new = old + k; used for counters and tickets; pitfalls include cache‑line ping‑pong effects and limitation to arithmetic operations.

Lock‑Free Programming in Java: Lock‑Free Stack Implementation

Basics

The simplest lock‑free stack uses a single atomic top pointer. Push and pop operations perform CAS loops on this pointer. The node is an immutable record holding the value and a next reference.

import java.util.concurrent.atomic.AtomicReference;

private record Node<E>(E value, Node<E> next) {}

private final AtomicReference<Node<E>> top = new AtomicReference<>();

Lock‑Free Push

Push uses a CAS loop around the top pointer:

/* ---------------- Push ----------------------- */
public void push(E item) {
    Node<E> newNode;
    Node<E> oldTop;
    do {
        oldTop = top.get();
        newNode = new Node<>(item, oldTop);
    } while (!top.compareAndSet(oldTop, newNode));
}

Steps:

Read the current head pointer.

Create a new node (safe because other threads cannot see it yet).

Attempt CAS from oldTop to newNode.

If another thread modified the top, the CAS fails and the loop retries.

Lock‑Free Pop

public E pop() {
    Node<E> oldTop;
    Node<E> newTop;
    do {
        oldTop = top.get();
        if (oldTop == null) return null;
        newTop = oldTop.next();
    } while (!top.compareAndSet(oldTop, newTop));
    return oldTop.value();
}

Steps:

Read the current head pointer.

If it is null, the stack is empty.

Compute the new top (oldTop.next()).

CAS from oldTop to newTop; on failure, retry.

Return the popped value; the old node becomes eligible for garbage collection.
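Assembled into one class (the name LockFreeStack is assumed; the push and pop bodies are exactly the fragments above), the stack can be exercised like this:

```java
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeStack<E> {
    private record Node<E>(E value, Node<E> next) {}

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    public void push(E item) {
        Node<E> newNode;
        Node<E> oldTop;
        do {
            oldTop = top.get();
            newNode = new Node<>(item, oldTop);
        } while (!top.compareAndSet(oldTop, newNode));
    }

    public E pop() {
        Node<E> oldTop;
        Node<E> newTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;
            newTop = oldTop.next();
        } while (!top.compareAndSet(oldTop, newTop));
        return oldTop.value();
    }

    public static void main(String[] args) {
        LockFreeStack<Integer> s = new LockFreeStack<>();
        s.push(1);
        s.push(2);
        System.out.println(s.pop()); // 2 (LIFO order)
        System.out.println(s.pop()); // 1
        System.out.println(s.pop()); // null (empty)
    }
}
```

In Java the ABA problem is largely defused here because the garbage collector guarantees a popped node's address cannot be reused while any thread still holds a reference to it.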

From Lock‑Free to Wait‑Free Programming

Full wait‑free implementations are lengthy and complex, but the following patterns are common:

Operation log: each thread writes a small record describing its intended operation.

Per‑thread slot reservation: threads obtain a unique index via FAA to write into the log.

Phase (ticket) numbers: each request receives a monotonically increasing phase identifier.

Bounded help loops: a helper thread assists at most k external operations before returning to its own work.
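The slot‑reservation and bounded‑helping patterns above can be sketched as follows; OperationLog and Descriptor are hypothetical names, and the actual “help” step is elided:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

public class OperationLog {
    /** Hypothetical record describing a thread's intended operation. */
    record Descriptor(long threadId, String operation) {}

    private final AtomicReferenceArray<Descriptor> log;
    private final AtomicInteger nextSlot = new AtomicInteger();

    public OperationLog(int capacity) {
        this.log = new AtomicReferenceArray<>(capacity);
    }

    /** FAA hands each caller a unique slot, so publishing never needs a retry loop. */
    public int publish(Descriptor d) {
        int slot = nextSlot.getAndIncrement(); // per-thread slot reservation
        log.set(slot, d);
        return slot;
    }

    /** Bounded helping: scan at most k slots and assist any pending entries. */
    public void helpUpTo(int k) {
        int limit = Math.min(nextSlot.get(), k);
        for (int i = 0; i < limit; i++) {
            Descriptor d = log.get(i);
            if (d != null) {
                // ...apply d's operation on the publishing thread's behalf...
            }
        }
    }
}
```

The key property is that publish completes in a constant number of the caller's own steps, and helpUpTo is bounded by k, which is what makes a per‑thread progress guarantee possible.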

Useful resources include scalable synchronous queues, wait‑free queue papers, wait‑free stack designs, and general wait‑free algorithm surveys.

Conclusion

Both lock‑free and wait‑free programming are advanced techniques, but with the right tools and design patterns they become manageable.

Key Takeaways

Definitions

Lock‑free programming: in all active threads, at least one makes progress in a finite number of steps.

Wait‑free programming: every thread’s operation completes within a bounded number of its own steps.

Atomic Primitives

CAS

LL/SC

FAA

Double‑Width CAS

Best Practices for Non‑Blocking Structures

Design first, code later – draw state diagrams and identify all concurrent mutation points.

One atomic word per logical invariant – ensure each CAS/LL‑SC covers the full state you need to test.

Prefer DW‑CAS (pointer plus tag in one unit) over splitting an invariant across multiple independently updated variables.

Write a fast path that tries a single CAS; after N failures, publish a descriptor and switch to a helping (slow) path.

Bounded helping for wait‑free designs – limit helpers to completing at most k external operations.
