How to Build an Ultra‑Fast Ring Buffer for Producer‑Consumer in Java
This article explains a high‑performance ring buffer implementation for a multi‑threaded producer‑consumer model in Java, covering design choices, atomic index handling, benchmark results, and further optimizations such as cache‑line padding and multi‑buffer sharding.
Background
In a multi‑threaded producer‑consumer model, the requirements are:
Very high performance for producers delivering data.
Multiple producers and one or more consumers.
When the consumer cannot keep up, a small amount of data loss is tolerable.
Producers generate data one item at a time.
Log collection is a typical example: logs are produced on many threads at a rate far exceeding what the consumer can handle. Dropping some logs is acceptable as long as logging overhead stays minimal, which calls for an ultra‑fast buffering queue.
Implementation Details
Multiple producers submit messages to a bounded buffer. Skipping thread safety altogether would give the highest performance, but concurrent producers would then overwrite each other's entries even when the consumer is keeping up. Data loss is acceptable only when the consumer lags, so index acquisition has to be thread‑safe. The buffer must also be bounded; when it is full, the typical strategies are:
Block until consumption.
Overwrite old data.
Discard new data.
To minimize producer overhead, overwriting is usually chosen.
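The three full‑buffer strategies can be compared on the JDK's ArrayBlockingQueue. This is only a sketch for illustration, not the article's ring buffer: the class FullBufferStrategies and its method names are made up here, and the blocking variant uses a bounded wait so the demo terminates.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class FullBufferStrategies {

    // Strategy 1: block until the consumer frees a slot (bounded wait for the demo).
    static <T> boolean blockingOffer(BlockingQueue<T> q, T item, long timeoutMillis) {
        try {
            return q.offer(item, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Strategy 2: overwrite old data by evicting the oldest element first.
    // poll-then-offer is not atomic; this is a single-threaded illustration only.
    static <T> void overwritingOffer(BlockingQueue<T> q, T item) {
        while (!q.offer(item)) {
            q.poll();
        }
    }

    // Strategy 3: discard the new element when the queue is full.
    static <T> boolean discardingOffer(BlockingQueue<T> q, T item) {
        return q.offer(item);
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(2);
        q.offer("a");
        q.offer("b");
        System.out.println(blockingOffer(q, "c", 10)); // false: no consumer freed a slot
        System.out.println(discardingOffer(q, "c"));   // false: "c" is dropped
        overwritingOffer(q, "c");                      // evicts "a"
        System.out.println(q);                         // [b, c]
    }
}
```

Only the overwrite strategy lets the producer finish without waiting and without checking where the consumer is, which is why it suits the low‑overhead goal.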
Ring Buffer
A ring buffer (circular queue) implemented with an array solves both the bounded‑buffer and the overwrite‑strategy problems. The array is pre‑allocated, contiguous memory, so writes are cheap; the only thing that has to be made thread‑safe is how each producer acquires its index.
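As a minimal single‑threaded sketch of the idea (the class name SimpleRing is hypothetical, and thread safety is deliberately left out), a write index that wraps around the array gives both the bound and the overwrite behavior:

```java
public class SimpleRing {
    private final Object[] slots;
    private int writeIndex = 0; // next slot to write; NOT thread-safe

    public SimpleRing(int capacity) {
        slots = new Object[capacity];
    }

    // Writes into the next slot; once the array is full, the oldest slot is overwritten.
    public void offer(Object item) {
        slots[writeIndex] = item;
        writeIndex = (writeIndex + 1) % slots.length;
    }

    public Object get(int i) {
        return slots[i];
    }

    public static void main(String[] args) {
        SimpleRing ring = new SimpleRing(3);
        for (int i = 1; i <= 4; i++) {
            ring.offer(i);
        }
        // Slot 0 held 1 and was overwritten by 4; slots 1 and 2 still hold 2 and 3.
        System.out.println(ring.get(0) + " " + ring.get(1) + " " + ring.get(2)); // 4 2 3
    }
}
```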
AtomicInteger
To obtain a thread‑safe index in the ring buffer, AtomicInteger is used, leading to the following helper class:
import java.util.concurrent.atomic.AtomicInteger;

// A counter that cycles through the inclusive range [startValue, endValue].
public class AtomicRangeInteger extends Number {
    private final AtomicInteger value;
    private final int startValue;
    private final int endValue;

    public AtomicRangeInteger(int startValue, int endValue) {
        this.startValue = startValue;
        this.endValue = endValue;
        this.value = new AtomicInteger(startValue);
    }

    public final int incrementAndGet() {
        int next;
        do {
            next = value.incrementAndGet();
            // Past the end: the thread whose CAS succeeds resets the counter and takes startValue.
            if (next > endValue && value.compareAndSet(next, startValue)) {
                return startValue;
            }
            // A thread that overshot but lost the CAS loops and increments again.
        } while (next > endValue);
        return next;
    }

    public final int get() { return value.intValue(); }

    @Override public int intValue() { return value.intValue(); }
    @Override public long longValue() { return value.intValue(); }
    @Override public float floatValue() { return value.intValue(); }
    @Override public double doubleValue() { return value.intValue(); }
}
The core ring buffer implementation is:
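public final class RingBuffer<T> {
    private final int bufferSize;
    private final AtomicRangeInteger index;
    private final T[] buffer;

    @SuppressWarnings("unchecked")
    public RingBuffer(int bufferSize) {
        this.bufferSize = bufferSize;
        // The counter's range is inclusive, so the last valid slot is bufferSize - 1;
        // an upper bound of bufferSize would index one past the end of the array.
        this.index = new AtomicRangeInteger(0, bufferSize - 1);
        this.buffer = (T[]) new Object[bufferSize];
    }

    // Producer side: claim a slot atomically and overwrite whatever it holds.
    public final void offer(final T data) {
        buffer[index.incrementAndGet()] = data;
    }

    // Consumer side: take the element at the given slot and clear the slot.
    public final T poll(int index) {
        T tmp = buffer[index];
        buffer[index] = null;
        return tmp;
    }

    public int getBufferSize() { return bufferSize; }
}
The essential method for index acquisition is: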
    public final int incrementAndGet() {
        int next;
        do {
            next = value.incrementAndGet();
            if (next > endValue && value.compareAndSet(next, startValue)) {
                return startValue;
            }
        } while (next > endValue);
        return next;
    }
The producer obtains the next free index atomically via incrementAndGet.
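If the incremented value exceeds endValue, the thread tries to reset the counter to startValue with compareAndSet. The thread whose compareAndSet succeeds returns startValue; any thread that loses that race loops and increments again.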
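To watch the wrap‑around under contention, the following sketch runs several producer threads against a compact copy of the counter logic. The class RangeCounterDemo, the thread count, and the iteration count are choices made for this illustration only. The check is that every index handed out lands inside the inclusive range [startValue, endValue].

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class RangeCounterDemo {
    private final AtomicInteger value;
    private final int startValue;
    private final int endValue;

    RangeCounterDemo(int startValue, int endValue) {
        this.startValue = startValue;
        this.endValue = endValue;
        this.value = new AtomicInteger(startValue);
    }

    // Same logic as incrementAndGet above.
    int incrementAndGet() {
        int next;
        do {
            next = value.incrementAndGet();
            if (next > endValue && value.compareAndSet(next, startValue)) {
                return startValue;
            }
        } while (next > endValue);
        return next;
    }

    // Runs `threads` producers, each taking `perThread` indices from a counter
    // over [0, endValue]; returns how often each index was handed out.
    static AtomicIntegerArray run(int endValue, int threads, int perThread) {
        RangeCounterDemo counter = new RangeCounterDemo(0, endValue);
        AtomicIntegerArray hits = new AtomicIntegerArray(endValue + 1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // An out-of-range index would throw here and end the worker early.
                    hits.incrementAndGet(counter.incrementAndGet());
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AtomicIntegerArray hits = run(7, 4, 100_000);
        int total = 0;
        for (int i = 0; i < hits.length(); i++) {
            total += hits.get(i);
        }
        // Equals threads * perThread when every index stayed inside [0, 7].
        System.out.println("indices handed out: " + total);
    }
}
```

The per‑index counts need not be perfectly even, because threads that overshoot endValue and lose the compareAndSet simply take a later index.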
Why It’s Ultra‑Fast
Disruptor, an open‑source ring buffer implementation, uses batch insertion and a compare‑and‑set strategy, blocks when the buffer is full, and requires the capacity to be a power of two. Our implementation acquires indices with incrementAndGet instead of a compareAndSet loop, which yields about 2.6 times the throughput in benchmarks (≈40 M ops/s for testV0 vs ≈15.5 M ops/s for testV1).
Benchmark results:
Benchmark                    Mode  Cnt         Score  Error  Units
RingBufferBenchmark.testV0  thrpt    2  39969002.156         ops/s
RingBufferBenchmark.testV1  thrpt    2  15533576.961         ops/s
The difference stems from the underlying implementation of incrementAndGet. In JDK 8+, it may use a native fetch‑and‑add CPU instruction when available, which is far faster than a Java‑level CAS loop.
The JVM treats Unsafe.getAndAddInt specially: if the platform supports fetch‑and‑add, the call is compiled to a single native instruction (lock xadd on x86); otherwise it falls back to a CAS‑based loop.
On JDK 7 the performance gap disappears because incrementAndGet is itself implemented as a CAS loop there, so fetch‑and‑add is never used.
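The two index‑acquisition styles can be put side by side. This is a sketch, not the benchmarked code: the class IndexAcquisition and both method names are invented here, and which machine instruction the JIT actually emits depends on the JDK version and the CPU.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IndexAcquisition {

    // CAS-loop style: read, compute, try to swap, retry on contention.
    static int nextWithCasLoop(AtomicInteger counter) {
        for (;;) {
            int current = counter.get();
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the race; this attempt was wasted work.
        }
    }

    // Fetch-and-add style: one atomic increment, no retry loop in Java code.
    static int nextWithFetchAndAdd(AtomicInteger counter) {
        return counter.incrementAndGet();
    }

    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger();
        AtomicInteger b = new AtomicInteger();
        for (int i = 0; i < 5; i++) {
            nextWithCasLoop(a);
            nextWithFetchAndAdd(b);
        }
        System.out.println(a.get() + " " + b.get()); // 5 5
    }
}
```

Both methods return the same values; the difference is that under heavy contention the CAS loop can fail and retry many times, while a hardware fetch‑and‑add always completes in one step.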
Further Optimization Opportunities
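Cache‑line padding: The three fields in AtomicRangeInteger are laid out close together and can share a cache line, which causes false sharing. Adding the @Contended annotation pads the frequently updated value field, improving throughput. Note that for application classes the annotation only takes effect when the JVM is started with -XX:-RestrictContended.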
public class AtomicRangeIntegerV2 extends Number {
    @Contended
    protected final AtomicInteger value;
    protected final int startValue;
    protected final int endValue;
    ...
}
Benchmark with @Contended (v2) shows a further increase (≈72 M ops/s vs 44 M ops/s).
Benchmark                    Mode  Cnt         Score  Error  Units
RingBufferBenchmark.testV2  thrpt    2  72095754.040         ops/s
RingBufferBenchmark.testV0  thrpt    2  44360926.943         ops/s
Multiple ring buffers to reduce contention: Distribute producers across several buffers, ideally one buffer per thread, similar to techniques used in high‑performance counters.
Details on sharding strategies can be found in the author’s earlier article about building a faster counter than LongAdder.
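A minimal sketch of the sharding idea, assuming producers are spread over buffers by thread ID. The class ShardedBuffers, the shard‑selection rule, the drain method, and the use of AtomicReferenceArray for the slots are all assumptions made for illustration; they are not taken from SkyWalking or from the earlier article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

public class ShardedBuffers<T> {
    private final AtomicReferenceArray<T>[] shards; // one small ring per shard
    private final AtomicInteger[] cursors;          // one index counter per shard
    private final int shardSize;

    @SuppressWarnings("unchecked")
    public ShardedBuffers(int shardCount, int shardSize) {
        this.shardSize = shardSize;
        this.shards = new AtomicReferenceArray[shardCount];
        this.cursors = new AtomicInteger[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new AtomicReferenceArray<>(shardSize);
            cursors[i] = new AtomicInteger();
        }
    }

    // Same wrap-around logic as incrementAndGet above, over [0, lastIndex].
    private static int claim(AtomicInteger cursor, int lastIndex) {
        int next;
        do {
            next = cursor.incrementAndGet();
            if (next > lastIndex && cursor.compareAndSet(next, 0)) {
                return 0;
            }
        } while (next > lastIndex);
        return next;
    }

    // A given thread always maps to the same shard, so threads on
    // different shards never contend on the same counter.
    public void offer(T item) {
        int s = (int) (Thread.currentThread().getId() % shards.length);
        shards[s].set(claim(cursors[s], shardSize - 1), item);
    }

    // Consumer side: sweep every slot of every shard and clear what is taken.
    public List<T> drain() {
        List<T> out = new ArrayList<>();
        for (AtomicReferenceArray<T> shard : shards) {
            for (int i = 0; i < shard.length(); i++) {
                T item = shard.getAndSet(i, null);
                if (item != null) {
                    out.add(item);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        ShardedBuffers<String> buffers = new ShardedBuffers<>(4, 8);
        Thread[] producers = new Thread[4];
        for (int p = 0; p < producers.length; p++) {
            final int id = p;
            producers[p] = new Thread(() -> {
                for (int i = 0; i < 2; i++) {
                    buffers.offer("p" + id + "-" + i);
                }
            });
            producers[p].start();
        }
        for (Thread t : producers) {
            t.join();
        }
        // 8 items in total and 8 slots per shard, so nothing is overwritten.
        System.out.println(buffers.drain().size()); // 8
    }
}
```

With one shard per producer thread the counters are effectively uncontended; the cost moves to the consumer, which has to sweep more slots on every pass.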
Easter Egg
The ring buffer implementation was inspired by SkyWalking’s version, which originally used CAS and did not meet performance expectations. After applying the optimizations above, the author contributed the improved code to SkyWalking (see GitHub pull requests 2874 and 2930), and it is now part of the project.
Xiao Lou's Tech Notes
