Why LongAdder Beats AtomicLong in High Contention: Deep Dive into Java’s Concurrent Counter

This article explains how java.util.concurrent.atomic.LongAdder provides a more efficient counting mechanism than AtomicLong under high contention, discusses its trade‑offs in space and precision, presents JMH benchmark results, and walks through the internal implementation details such as cells, CAS loops, and hash‑based distribution.

Ziru Technology

Introduction

In single‑machine concurrent scenarios we often need to count or accumulate values safely. The usual approaches are synchronized blocks or atomic classes like AtomicLong. The class java.util.concurrent.atomic.LongAdder offers a more efficient solution for these cases.

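LongAdder achieves higher throughput at the cost of more memory and a weaker read guarantee: sum() is not an atomic snapshot, so a value read while updates are still in flight may miss some of them. Once all updates have completed, sum() returns the exact total.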

Using LongAdder

Counting is performed with the add() method and the accumulated result is obtained with sum():

LongAdder longAdder = new LongAdder();
longAdder.add(10);
longAdder.add(5);
System.out.println(longAdder.sum()); // prints "15"
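
The same two calls work unchanged when several threads update the counter. The following is a minimal sketch, not taken from the original article: the class name LongAdderUsage and the method countConcurrently are made up for illustration, and increment() is simply the library shorthand for add(1).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper class, used only for this illustration.
class LongAdderUsage {
    // Starts `threads` workers that each call increment() `perThread` times,
    // waits for all of them, and returns the accumulated total.
    static long countConcurrently(int threads, int perThread) {
        LongAdder hits = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    hits.increment(); // shorthand for add(1)
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // All workers have finished, so no update is in flight and sum() is exact.
        return hits.sum();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 100_000)); // prints "400000"
    }
}
```

Reading sum() only after the workers have finished is what makes the result exact here; a read taken while they are still running could miss some increments.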

Performance Comparison: AtomicLong vs LongAdder

The JMH benchmark below compares the throughput of LongAdder and AtomicLong using ten concurrent threads:

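import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Threads;

/**
 * Benchmark                         Mode  Cnt          Score          Error  Units
 * MyBenchmark.concurrentAtomicLong  thrpt   5   44616374.867 ±  9345004.297  ops/s
 * MyBenchmark.concurrentLongAdder   thrpt   5  234633786.245 ± 49509534.208  ops/s
 */
public class MyBenchmark {
    // Static fields are shared by all benchmark threads, so every call
    // contends on the same counter instance.
    static LongAdder adder = new LongAdder();
    static AtomicLong atomicLong = new AtomicLong();

    @Benchmark @Threads(10)
    public void concurrentLongAdder() {
        adder.add(1);
    }

    @Benchmark @Threads(10)
    public void concurrentAtomicLong() {
        atomicLong.incrementAndGet();
    }
}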

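In this run, with ten threads updating one shared counter, LongAdder's write throughput is about five times that of AtomicLong (roughly 235 million versus 45 million ops/s). The error margins are wide, so the ratio should be read as an order of magnitude rather than a precise figure.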

Problems with AtomicLong

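AtomicLong updates its value through sun.misc.Unsafe.getAndAddLong(), which is written as a compare-and-swap (CAS) loop: read the current value, compute the new one, and retry if another thread changed the value in between. (Some JVMs compile this call down to a single atomic add instruction where the hardware offers one, but every thread still contends for the same memory location.) As the number of competing threads grows, failed CAS attempts and retries become more likely, consuming more CPU and eventually becoming a bottleneck. Under extreme contention, a CAS loop can be slower than a simple synchronized block.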
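
To make the retry behaviour visible, the sketch below writes the CAS loop by hand with AtomicLong.compareAndSet and counts the failed attempts. It is an illustration only: CasRetryDemo, addWithCas and run are hypothetical names, and the real AtomicLong methods do not expose a retry count.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; it mirrors the shape of a CAS loop, not the JDK source.
class CasRetryDemo {
    // Adds delta with a hand-written CAS loop and returns how many attempts failed.
    static int addWithCas(AtomicLong counter, long delta) {
        int failed = 0;
        for (;;) {
            long current = counter.get();
            if (counter.compareAndSet(current, current + delta)) {
                return failed;
            }
            failed++; // another thread won the race; re-read and retry
        }
    }

    // Returns {final counter value, total failed CAS attempts}.
    static long[] run(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        AtomicLong failures = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long localFailures = 0;
                for (int i = 0; i < perThread; i++) {
                    localFailures += addWithCas(counter, 1);
                }
                failures.addAndGet(localFailures);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { counter.get(), failures.get() };
    }

    public static void main(String[] args) {
        long[] r = run(8, 200_000);
        System.out.println("value=" + r[0] + " failedCas=" + r[1]);
    }
}
```

The final value is always threads × perThread, because every failed attempt is retried. The number of failed attempts varies from run to run and from machine to machine, and it is that wasted work which grows with contention.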

Optimization Idea of LongAdder

LongAdder reduces contention by splitting the accumulated value into multiple Cell objects. Each thread hashes to a specific cell, thus spreading the updates. Reading the total requires summing the base value and all cell values.

[Figure: LongAdder cell distribution diagram]
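
The splitting idea can be sketched in a few lines. The class below is a deliberately simplified stand-in, not how LongAdder is implemented: StripedCounter is a hypothetical name, it uses a fixed-size AtomicLongArray instead of padded Cell objects, it has no base field, it never resizes, and it picks a stripe from the thread ID rather than from a per-thread probe.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Simplified illustration of striping; assumptions are listed in the text above.
class StripedCounter {
    private final AtomicLongArray stripes;
    private final int mask;

    StripedCounter(int stripeCount) {
        // A power-of-two length lets us pick a stripe with a cheap bit mask.
        if (stripeCount <= 0 || Integer.bitCount(stripeCount) != 1) {
            throw new IllegalArgumentException("stripeCount must be a power of two");
        }
        this.stripes = new AtomicLongArray(stripeCount);
        this.mask = stripeCount - 1;
    }

    // Each thread updates "its" stripe, so different threads usually
    // touch different array slots.
    void add(long x) {
        int h = Long.hashCode(Thread.currentThread().getId());
        stripes.addAndGet(h & mask, x);
    }

    // The total is the sum over all stripes; it is not an atomic snapshot.
    long sum() {
        long total = 0;
        for (int i = 0; i < stripes.length(); i++) {
            total += stripes.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        StripedCounter c = new StripedCounter(4);
        c.add(10);
        c.add(5);
        System.out.println(c.sum()); // prints "15"
    }
}
```

Neighbouring slots of an AtomicLongArray share cache lines, so this sketch would still suffer from false sharing; the real class avoids that by padding each cell.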

Implementation Details

Key questions when reading the source:

What members does the class have and what are their roles?

What is the overall flow of the add operation?

How is the target cell for each thread chosen?

What mechanisms ensure thread safety?

When does the cell array expand?

Member Variables

volatile long base – the value updated when there is no contention (similar to AtomicLong's value).

volatile Cell[] cells – lazily initialized array of cells; its length is always a power of two, and it stops growing once the length reaches or exceeds the number of CPUs.

volatile int cellsBusy – a simple spin lock (0 = free, 1 = held) that serializes initialization and resizing of the cells array, as well as the creation of new cells.
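
The growth rule for the cells array can be made concrete with a little arithmetic. The sketch below assumes, based on the JDK 8 Striped64 source rather than on anything stated in this article, that the array starts at length 2 and doubles only while its length is below the CPU count. CellTableGrowth and maxLength are hypothetical names.

```java
// Hypothetical helper that only models the sizing rule; it is not JDK code.
class CellTableGrowth {
    // Largest length the cells array can reach on a machine with `ncpu` CPUs,
    // assuming it starts at 2 and doubles while length < ncpu.
    static int maxLength(int ncpu) {
        int n = 2;
        while (n < ncpu) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(maxLength(6)); // prints "8"
        System.out.println(maxLength(8)); // prints "8"
        System.out.println(maxLength(Runtime.getRuntime().availableProcessors()));
    }
}
```

Under that assumption the upper bound is the smallest power of two at or above the CPU count, which can exceed the core count when that count is not itself a power of two.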

Cell Class

@sun.misc.Contended static final class Cell {
    volatile long value;
    Cell(long x) { value = x; }
    final boolean cas(long cmp, long val) {
        return UNSAFE.compareAndSwapLong(this, valueOffset, cmp, val);
    }
}

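The @Contended annotation asks the JVM to pad each Cell so that two cells do not share a cache line. Without the padding, updates to neighbouring cells would invalidate each other's cache line (false sharing) and bring back the contention that the cells are meant to remove.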

longAccumulate Method (in Striped64)

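The core logic of LongAdder resides in its parent class Striped64. longAccumulate loops until one update succeeds. If the thread's slot in the cells array is empty, it tries to create the cell; otherwise it tries a CAS on that cell. After a failed CAS it re-hashes the thread's probe to move to another slot, and after repeated collisions it doubles the array, provided the array is still shorter than the CPU count. If the array does not exist yet and another thread is busy initializing it, the update falls back to a CAS on base.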

final void longAccumulate(long x, LongBinaryOperator fn, boolean wasUncontended) {
    int h;
    if ((h = getProbe()) == 0) {
        ThreadLocalRandom.current();
        h = getProbe();
        wasUncontended = true;
    }
    boolean collide = false;
    for (;;) {
        Cell[] as; Cell a; int n; long v;
        if ((as = cells) != null && (n = as.length) > 0) {
            if ((a = as[(n - 1) & h]) == null) {
                // try to create a new Cell
                ...
            } else if (!wasUncontended) {
                wasUncontended = true;
            } else if (a.cas(v = a.value, (fn == null) ? v + x : fn.applyAsLong(v, x))) {
                break;
            } else if (n >= NCPU || cells != as) {
                collide = false;
            } else if (!collide) {
                collide = true;
            } else if (cellsBusy == 0 && casCellsBusy()) {
                // expand cells array
                ...
                collide = false;
                continue;
            }
            h = advanceProbe(h);
        } else if (cellsBusy == 0 && casCellsBusy()) {
            // initialize cells array
            ...
            continue;
        } else if (casBase(v = base, (fn == null) ? v + x : fn.applyAsLong(v, x))) {
            break;
        }
    }
}
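
The cell selection in the listing above comes down to two small operations: masking the thread's probe with (n - 1), and re-hashing the probe after a collision. The sketch below reproduces them in isolation. The three xorshift steps are the ones used by advanceProbe in the JDK 8 Striped64 source as far as I can tell; the class ProbeSketch and its method names are hypothetical, and the real method also writes the new probe back into the current Thread object.

```java
// Stand-alone illustration of probe hashing; not the JDK class itself.
class ProbeSketch {
    // Maps a probe to a slot. Because length is a power of two,
    // (length - 1) & probe equals Math.floorMod(probe, length).
    static int index(int probe, int length) {
        return (length - 1) & probe;
    }

    // Xorshift re-hash applied after a failed CAS so that the thread
    // tries a different slot on the next loop iteration.
    static int advance(int probe) {
        probe ^= probe << 13;
        probe ^= probe >>> 17;
        probe ^= probe << 5;
        return probe;
    }

    public static void main(String[] args) {
        int probe = 13;
        System.out.println(index(probe, 8)); // prints "5"
        probe = advance(probe);
        System.out.println(index(probe, 8));
    }
}
```

A probe of 0 means "not initialized yet", which is why the listing calls ThreadLocalRandom.current() first; xorshift never maps a nonzero value to zero, so an initialized probe stays initialized.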

cellsBusy CAS

final boolean casCellsBusy() {
    return UNSAFE.compareAndSwapInt(this, CELLSBUSY, 0, 1);
}
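
The same 0/1 flag pattern can be written with the public AtomicInteger API. This is a minimal sketch of the idea, not the Striped64 code: BusyFlag and its methods are hypothetical names, and the real class uses Unsafe on a plain int field instead of an AtomicInteger.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical try-lock built on a single CAS, mirroring the cellsBusy pattern.
class BusyFlag {
    private final AtomicInteger busy = new AtomicInteger(); // 0 = free, 1 = held

    // Cheap read first, then a CAS from 0 to 1; never blocks.
    boolean tryAcquire() {
        return busy.get() == 0 && busy.compareAndSet(0, 1);
    }

    void release() {
        busy.set(0);
    }

    // Runs the action only if the flag was free; returns whether it ran.
    boolean runExclusively(Runnable action) {
        if (!tryAcquire()) {
            return false;
        }
        try {
            action.run();
        } finally {
            release(); // always release, even if the action throws
        }
        return true;
    }

    public static void main(String[] args) {
        BusyFlag flag = new BusyFlag();
        System.out.println(flag.tryAcquire()); // prints "true"
        System.out.println(flag.tryAcquire()); // prints "false"
        flag.release();
        System.out.println(flag.runExclusively(() -> { })); // prints "true"
    }
}
```

A thread that loses the race does not wait for the flag. In longAccumulate it simply goes around the loop again and tries another path, which is why a non-blocking try-lock is sufficient.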

add() and sum() Methods

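public void add(long x) {
    Cell[] as; long b, v; int m; Cell a;
    // Fast path: while no cells exist, a single CAS on base is enough.
    if ((as = cells) != null || !casBase(b = base, b + x)) {
        boolean uncontended = true;
        // Otherwise try one CAS on the cell this thread hashes to.
        if (as == null || (m = as.length - 1) < 0 ||
            (a = as[getProbe() & m]) == null ||
            !(uncontended = a.cas(v = a.value, v + x)))
            longAccumulate(x, null, uncontended); // slow path
    }
}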

public long sum() {
    Cell[] as = cells;
    long sum = base;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            Cell a = as[i];
            if (a != null) sum += a.value;
        }
    }
    return sum;
}

Both base and Cell.value are volatile, guaranteeing visibility across threads.

Limitations of LongAdder

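It has no atomic read-modify-write operations such as incrementAndGet() or compareAndSet(). The value is spread across base and several cells, so there is no single location that can be updated and read back atomically.

sum() is not an atomic snapshot. It reads base and each cell in turn, so updates that interleave with the summation may or may not be included, and the result may be a value that the counter never held at any single point in time.

Reads are more expensive than AtomicLong.get() because sum() has to walk the whole cells array. LongAdder is optimized for write-heavy use, such as statistics counters that are updated often and read rarely.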
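
In practice these limitations suggest a simple division of labour, sketched below under assumed requirements: use AtomicLong when each caller needs the value produced by its own update (for example a unique ID), and LongAdder when the value is only read occasionally as a statistic. CounterChoice and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical service fragment showing which counter fits which job.
class CounterChoice {
    // Each caller needs its own distinct value, so an atomic
    // read-modify-write is required: AtomicLong.
    private final AtomicLong nextId = new AtomicLong();

    // Only the running total matters and it is read rarely: LongAdder.
    private final LongAdder requests = new LongAdder();

    long handleRequest() {
        requests.increment();
        return nextId.incrementAndGet(); // unique per call
    }

    long requestsSoFar() {
        return requests.sum(); // may lag behind in-flight updates
    }

    public static void main(String[] args) {
        CounterChoice c = new CounterChoice();
        System.out.println(c.handleRequest());  // prints "1"
        System.out.println(c.handleRequest());  // prints "2"
        System.out.println(c.requestsSoFar());  // prints "2"
    }
}
```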

Despite these drawbacks, LongAdder is intentionally designed for high‑contention counting where occasional inexact reads are acceptable.
