How x86 and ARM Achieve Atomic Operations: LOCK Prefix and Exclusive Access

This article explains which instruction sets support atomic operations, detailing the principles behind x86’s LOCK prefix and ARM’s exclusive load/store mechanisms, illustrating how single‑processor and SMP systems handle atomicity, and showing Linux kernel implementations for atomic increment and compare‑and‑swap.

MaGe Linux Operations

Jun 30, 2022

Preface

This was a question encountered in an interview that the author could not answer at the time; after researching, they documented the findings.

What instruction sets support atomic operations? (x86 and ARM)

Atomic operations are indivisible; on a uniprocessor any single instruction can be considered atomic because interrupts occur only between instructions.

For example, a simple C increment such as count++ can cause concurrency problems when multiple processes execute it simultaneously, because the compiler may translate it into separate load, add, and store steps that can interleave.
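The lost update can be reproduced deterministically with a small sketch. It assumes count++ breaks down into three steps and interleaves them by hand for two execution contexts; the names `struct cpu`, `step_load`, `step_add`, and `step_store` are illustrative and not taken from any real code base.

```c
/* Shared counter, standing in for the variable behind count++. */
static int count = 0;

/* One hypothetical CPU's private register. */
struct cpu { int reg; };

/* The three steps a non-atomic count++ is assumed to break down into. */
static void step_load(struct cpu *c)  { c->reg = count; }
static void step_add(struct cpu *c)   { c->reg += 1; }
static void step_store(struct cpu *c) { count = c->reg; }

/* Interleave two increments so that both contexts load before either stores. */
static int lost_update_demo(void)
{
    struct cpu a, b;
    count = 0;
    step_load(&a);   /* A reads 0 */
    step_load(&b);   /* B reads 0 */
    step_add(&a);
    step_add(&b);
    step_store(&a);  /* count becomes 1 */
    step_store(&b);  /* count is 1 again: A's update is lost */
    return count;
}

/* Run the same two increments back to back, with no interleaving. */
static int serialized_demo(void)
{
    struct cpu a, b;
    count = 0;
    step_load(&a); step_add(&a); step_store(&a);
    step_load(&b); step_add(&b); step_store(&b);
    return count;
}
```

With the interleaved order the counter ends at 1 although two increments ran; the serialized order gives 2. An atomic increment guarantees the second behaviour.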

In a uniprocessor system the solution is to translate the count++ statement into a single‑instruction atomic operation.

x86 Architecture

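Intel x86 provides the LOCK prefix, which makes the accompanying read-modify-write instruction atomic with respect to other processors in a multiprocessor environment. Early processors did this by asserting the LOCK# signal and locking the bus (the front-side bus, FSB, on older systems) for the duration of the instruction. On newer processors, when the operand lies in cacheable memory and fits within a single cache line, the processor typically locks only that cache line and relies on the cache coherency protocol instead of locking the bus.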

A typical use is to place the prefix in front of an instruction whose destination is a memory operand, for example lock incl applied to a counter in memory.
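As a minimal sketch of the prefix in use, the function below performs a fetch-and-add with lock xaddl through GCC/Clang inline assembly. The name `fetch_add_sketch` is hypothetical, and the non-x86 branch is only a fallback so that the example still builds on other machines.

```c
#include <stdint.h>

/*
 * Fetch-and-add sketch: atomically adds `delta` to *ptr and returns the
 * value *ptr held before the addition.
 */
static inline int32_t fetch_add_sketch(int32_t *ptr, int32_t delta)
{
#if defined(__x86_64__) || defined(__i386__)
    /* lock xaddl: swap register and memory, then store their sum to memory. */
    __asm__ volatile("lock xaddl %0, %1"
                     : "+r"(delta), "+m"(*ptr)
                     :
                     : "memory", "cc");
    return delta;   /* xadd leaves the old memory value in the register */
#else
    /* Not x86: fall back to the compiler builtin so the sketch still runs. */
    return __atomic_fetch_add(ptr, delta, __ATOMIC_SEQ_CST);
#endif
}
```

XADD is one of the instructions on the permitted list, and its destination here is a memory operand, which is the form the prefix requires.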

The LOCK prefix can be applied only to specific instructions (ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, XCHG). The Intel manual describes it as follows:

Description: Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.

The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not in the above list. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix.

The LOCK prefix is typically used with the BTS instruction to perform a read‑modify‑write operation on a memory location in a shared memory environment. The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields.

Operating System Implementation (Linux)

The Linux kernel defines its x86 atomic increment as a single incl instruction on the counter, emitted through inline assembly and preceded by the LOCK_PREFIX macro.

The definition of LOCK_PREFIX shows that on symmetric multiprocessor (SMP) systems it expands to the lock prefix, while on a uniprocessor it expands to nothing.
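The idea can be sketched in user space as follows. `SKETCH_SMP`, `LOCK_PREFIX_SKETCH`, `atomic_sketch_t`, and `atomic_inc_sketch` are hypothetical names chosen for this illustration. The kernel's real macro does more than this, notably recording where each prefix sits so it can be patched at boot, and that part is left out here.

```c
/* Hypothetical stand-in for the kernel's build-time SMP switch. */
#ifndef SKETCH_SMP
#define SKETCH_SMP 1
#endif

#if SKETCH_SMP
#define LOCK_PREFIX_SKETCH "lock "   /* multiprocessor build: emit the prefix */
#else
#define LOCK_PREFIX_SKETCH ""        /* uniprocessor build: plain instruction */
#endif

/* Simplified counterpart of the kernel's atomic_t. */
typedef struct { int counter; } atomic_sketch_t;

/* Atomically increment v->counter by one. */
static inline void atomic_inc_sketch(atomic_sketch_t *v)
{
#if defined(__x86_64__) || defined(__i386__)
    /* String concatenation yields "lock incl %0" or just "incl %0". */
    __asm__ volatile(LOCK_PREFIX_SKETCH "incl %0"
                     : "+m"(v->counter)
                     :
                     : "memory", "cc");
#else
    /* Not x86: fall back to the compiler builtin so the sketch still runs. */
    __atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
#endif
}
```

Dropping the prefix on a uniprocessor build is safe because a single incl on memory cannot be split by an interrupt, and it avoids the cost of the locked access.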

Implementation of compare‑and‑swap (cmpxchg) in Linux:

static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new) {
    return cmpxchg(&v->counter, old, new);
}
#define cmpxchg(ptr, old, new) \
    __cmpxchg(ptr, old, new, sizeof(*(ptr)))
#define __cmpxchg(ptr, old, new, size) \
    __raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
#define __raw_cmpxchg(ptr, old, new, size, lock) \
({ \
    __typeof__(*(ptr)) __ret; \
    __typeof__(*(ptr)) __old = (old); \
    __typeof__(*(ptr)) __new = (new); \
    switch (size) { \
    case __X86_CASE_B: { \
        volatile u8 *__ptr = (volatile u8 *)(ptr); \
        asm volatile(lock "cmpxchgb %2,%1" \
        : "=a" (__ret), "+m" (*__ptr) \
        : "q" (__new), "0" (__old) \
        : "memory"); \
        break; \
    } \
    case __X86_CASE_W: { \
        volatile u16 *__ptr = (volatile u16 *)(ptr); \
        asm volatile(lock "cmpxchgw %2,%1" \
        : "=a" (__ret), "+m" (*__ptr) \
        : "r" (__new), "0" (__old) \
        : "memory"); \
        break; \
    } \
    case __X86_CASE_L: { \
        volatile u32 *__ptr = (volatile u32 *)(ptr); \
        asm volatile(lock "cmpxchgl %2,%1" \
        : "=a" (__ret), "+m" (*__ptr) \
        : "r" (__new), "0" (__old) \
        : "memory"); \
        break; \
    } \
    case __X86_CASE_Q: { \
        volatile u64 *__ptr = (volatile u64 *)(ptr); \
        asm volatile(lock "cmpxchgq %2,%1" \
        : "=a" (__ret), "+m" (*__ptr) \
        : "r" (__new), "0" (__old) \
        : "memory"); \
        break; \
    } \
    default: \
        __cmpxchg_wrong_size(); \
    } \
    __ret; \
})
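Compare-and-swap is normally used inside a retry loop. The sketch below shows that pattern in user space, using the GCC/Clang builtin `__atomic_compare_exchange_n` in place of the kernel macro. The names `cmpxchg_sketch` and `add_unless_sketch` are hypothetical; like the kernel's cmpxchg, the first returns the value found in memory, so the caller detects success by comparing it with the expected value.

```c
#include <stdbool.h>

/*
 * User-space stand-in for cmpxchg(): if *ptr == old, store new_val.
 * Returns the value *ptr held before the call.
 */
static inline int cmpxchg_sketch(int *ptr, int old, int new_val)
{
    /* On failure the builtin writes the current value of *ptr into `old`. */
    __atomic_compare_exchange_n(ptr, &old, new_val, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return old;
}

/* Add `a` to *ptr unless it currently equals `u`; returns true if added. */
static inline bool add_unless_sketch(int *ptr, int a, int u)
{
    int cur = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    for (;;) {
        if (cur == u)
            return false;
        int prev = cmpxchg_sketch(ptr, cur, cur + a);
        if (prev == cur)
            return true;   /* nobody changed *ptr in between */
        cur = prev;        /* lost the race: retry with the fresh value */
    }
}
```

The loop never blocks: a failed attempt only means another context updated the value first, and the next iteration starts from the value that update left behind.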

ARM Architecture

Older ARM architectures (pre‑ARMv6) did not support SMP; atomicity was achieved by disabling local interrupts.

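On these processors, the Linux ARM source implements the atomic operations as ordinary C statements on v->counter, wrapped between saving and disabling local interrupts and restoring them afterwards.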

For compare‑and‑swap, the operation on v->counter forms a critical section that must not be interrupted.
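A rough user-space sketch of that critical section follows. Interrupts cannot be disabled from an ordinary program, so `irq_save_stub` and `irq_restore_stub` are hypothetical stand-ins that only track nesting; in the kernel their place is taken by the real local interrupt save and restore primitives. `uni_atomic_t` and `cmpxchg_uniprocessor_sketch` are likewise names made up for this example.

```c
/* Hypothetical stubs standing in for saving/disabling and restoring
 * local interrupts; here they only track nesting so the sketch runs. */
static int irq_disabled_depth = 0;

static unsigned long irq_save_stub(void)
{
    return (unsigned long)irq_disabled_depth++;
}

static void irq_restore_stub(unsigned long flags)
{
    irq_disabled_depth = (int)flags;
}

/* Simplified counterpart of the kernel's atomic_t. */
typedef struct { int counter; } uni_atomic_t;

/*
 * Compare-and-swap for a uniprocessor: with interrupts off, nothing can
 * run between the read and the conditional write.
 * Returns the value the counter held before the call.
 */
static int cmpxchg_uniprocessor_sketch(uni_atomic_t *v, int old, int new_val)
{
    unsigned long flags = irq_save_stub();   /* enter critical section */
    int ret = v->counter;
    if (ret == old)
        v->counter = new_val;
    irq_restore_stub(flags);                 /* leave critical section */
    return ret;
}
```

This approach protects only against preemption on the local processor, which is why it is sufficient for pre-ARMv6 uniprocessor systems but not for SMP.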

Before ARMv6, the SWP instruction atomically swapped a register with a word in memory. It was later deprecated because it locks the bus for the whole operation, blocking all other processors from accessing memory and harming performance on multi‑core systems.

Since ARMv6, the exclusive load/store primitives LDREX and STREX replace SWP, providing flexible atomic updates. They are available in the ARM instruction set from ARMv6 and in the Thumb instruction set from ARMv6T2 onward.

LDREX reads a word exclusively; STREX attempts to store a word, succeeding only if the exclusive monitor still permits it, returning 0 on success and 1 on failure.
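The usual way to combine the two instructions is a retry loop: load exclusively, compute the new value, attempt the exclusive store, and start over if the status is non-zero. The sketch below shows an atomic add written that way with GCC/Clang inline assembly for 32-bit ARM. `atomic_add_sketch` is a hypothetical name, the ARMv7 guard is a conservative assumption, and the other branch is a compare-and-swap fallback with the same retry shape so the example also builds on non-ARM machines.

```c
/* Atomically add `i` to *counter. */
static inline void atomic_add_sketch(int i, int *counter)
{
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    int result;
    unsigned long status;
    __asm__ volatile(
        "1: ldrex  %0, [%3]\n"      /* exclusive load of *counter */
        "   add    %0, %0, %4\n"    /* compute the new value */
        "   strex  %1, %0, [%3]\n"  /* try to store; %1 is 0 on success */
        "   teq    %1, #0\n"
        "   bne    1b"              /* store failed: retry from the load */
        : "=&r"(result), "=&r"(status), "+Qo"(*counter)
        : "r"(counter), "Ir"(i)
        : "cc");
#else
    /* Fallback for other targets: a weak compare-and-swap retry loop. */
    int cur = __atomic_load_n(counter, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(counter, &cur, cur + i, 1 /* weak */,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        ;  /* `cur` was refreshed by the failed attempt; retry */
#endif
}
```

Unlike SWP, a failed STREX costs only the retrying processor a few instructions; other processors are not stalled while the update is in progress.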

Conclusion

The article ends by asking the reader whether they now know the answer to the original interview question.
