Why Go’s lock‑free PoolDequeue outperforms channels by 10×
This article examines Go’s internal lock‑free single‑producer multi‑consumer queues—PoolDequeue and its dynamic extension PoolChain—detailing their design, atomic operations, and benchmark comparisons that show they can be up to ten times faster than standard channels in a producer‑consumer workload.
Producer‑Consumer Modes
The classic producer‑consumer pattern can be classified into four modes:
SPSC – single‑producer single‑consumer
SPMC – single‑producer multiple‑consumer
MPSC – multiple‑producer single‑consumer
MPMC – multiple‑producer multiple‑consumer
Go's channel implements the MPMC mode, while sync.Pool uses a lock‑free structure optimized for the SPMC case.
PoolDequeue
PoolDequeueis a fixed‑size lock‑free queue built on a ring buffer. It requires exactly one producer; any number of consumers may operate concurrently.
Internal representation:
type poolDequeue struct {
headTail atomic.Uint64 // high 32 bits = head, low 32 bits = tail
vals []eface // length must be a power of two
}The headTail field packs the head index in the high 32 bits and the tail index in the low 32 bits. Because len(vals) is a power of two, modulo operations are performed with a bitwise & instead of the slower %.
Producer operation – pushHead
func (d *poolDequeue) pushHead(val any) bool {
ptrs := d.headTail.Load()
head, tail := d.unpack(ptrs)
// full when advancing head would meet tail
if (tail+uint32(len(d.vals)))&(1<<dequeueBits-1) == head {
return false // queue full
}
slot := &d.vals[head&uint32(len(d.vals)-1)]
// slot must be empty (type pointer nil) before we write
if typ := atomic.LoadPointer(&slot.typ); typ != nil {
return false // another goroutine is cleaning tail
}
if val == nil {
val = dequeueNil(nil) // special sentinel for nil
}
*(*any)(unsafe.Pointer(slot)) = val // ① write value
d.headTail.Add(1 << dequeueBits) // store barrier, advance head
return true
}Only the single producer writes to a slot, so the write at ① is safe without additional synchronization.
Consumer operation – popTail
func (d *poolDequeue) popTail() (any, bool) {
var slot *eface
for {
ptrs := d.headTail.Load()
head, tail := d.unpack(ptrs)
if tail == head {
return nil, false // empty queue
}
// attempt to claim the next tail position
ptrs2 := d.pack(head, tail+1)
if d.headTail.CompareAndSwap(ptrs, ptrs2) {
slot = &d.vals[tail&uint32(len(d.vals)-1)]
break
}
// spin‑loop (②) – may become a bottleneck under heavy contention
}
val := *(*any)(unsafe.Pointer(slot))
if val == dequeueNil(nil) {
val = nil
}
slot.val = nil // ③ release value
atomic.StorePointer(&slot.typ, nil) // ④ release type pointer
return val, true
}The CAS in the loop atomically increments the tail; after a successful claim the consumer reads the value and clears the slot (③, ④) so the producer can reuse it.
Producer‑only operation – popHead
popHeadmirrors popTail but decrements the head index, allowing the producer to retrieve an element it has pushed but that has not yet been consumed.
PoolChain
PoolChainbuilds an unbounded queue by chaining multiple PoolDequeue instances. The head side is accessed only by the producer, the tail side only by consumers, and the linking pointers are updated atomically.
type poolChain struct {
head *poolChainElt // producer‑only, no synchronization
tail atomic.Pointer[poolChainElt] // consumer‑only, atomic ops
}
type poolChainElt struct {
poolDequeue // embedded fixed‑size deque
next, prev atomic.Pointer[poolChainElt] // next written by producer, read by consumers; prev written by consumers, read by producer
}When the consumer's current poolDequeue becomes empty, it moves to the next element in the chain. When the producer's head deque fills, a new poolDequeue is allocated and linked, giving the overall structure unlimited capacity.
Full implementation can be inspected at:
https://github.com/golang/go/blob/master/src/sync/poolqueue.go#L220-L302
Benchmark comparison with channel
Benchmarks compare a single producer against ten concurrent consumers for three queue types: PoolDequeue, PoolChain, and a buffered channel. The benchmark code:
func BenchmarkPoolDequeue(b *testing.B) {
const size = 1024
pd := NewPoolDequeue(size)
var wg sync.WaitGroup
wg.Add(11) // 1 producer + 10 consumers
go func() {
for i := 0; i < b.N; i++ {
pd.PushHead(i)
}
wg.Done()
}()
for i := 0; i < 10; i++ {
go func() {
for {
if _, ok := pd.PopTail(); !ok {
break
}
}
wg.Done()
}()
}
wg.Wait()
}
func BenchmarkPoolChain(b *testing.B) {
pc := NewPoolChain()
var wg sync.WaitGroup
wg.Add(11)
go func() {
for i := 0; i < b.N; i++ {
pc.PushHead(i)
}
wg.Done()
}()
for i := 0; i < 10; i++ {
go func() {
for {
if _, ok := pc.PopTail(); !ok {
break
}
}
wg.Done()
}()
}
wg.Wait()
}
func BenchmarkChannel(b *testing.B) {
ch := make(chan interface{}, 1024)
var wg sync.WaitGroup
wg.Add(11)
go func() {
for i := 0; i < b.N; i++ {
ch <- i
}
close(ch)
wg.Done()
}()
for i := 0; i < 10; i++ {
go func() {
for range ch {
}
wg.Done()
}()
}
wg.Wait()
}Results (averaged over multiple runs) show: PoolDequeue ≈ 10× faster than the buffered channel. PoolChain ≈ 5× faster than the channel but about half the throughput of PoolDequeue because of the overhead of managing the linked list of deques.
The accompanying chart visualises the speedup:
These measurements confirm that Go's internal lock‑free queues provide a substantial latency and throughput advantage for the single‑producer multi‑consumer pattern, and that PoolChain offers a scalable alternative when unbounded capacity is required.
For experimentation, the author duplicated the implementation in a separate repository with unit tests and additional benchmarks:
https://github.com/smallnest/exp/blob/master/gods/poolqueue.go
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
BirdNest Tech Talk
Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
