
Designing a High‑Performance bytes.Buffer Pool for GWS in Go

This article explores the internal structure and growth mechanism of Go's bytes.Buffer, demonstrates how the GWS library replaces gorilla/websocket, and presents a high‑performance bytes.Buffer pool implementation using sync.Pool with power‑of‑two sizing to reduce allocations and improve concurrency in backend services.

Rare Earth Juejin Tech Community

I'm Lee, a 17-year veteran of the IT industry. After a long hiatus, I'd like to share my recent thoughts on the GWS project, a lightweight alternative to gorilla/websocket with lower code complexity and better performance.

GWS relies heavily on bytes.Buffer for handling byte streams. Understanding bytes.Buffer is essential for optimizing GWS: its zero-value semantics, its internal fields (buf, off, lastRead), and how it grows.
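The zero-value semantics are worth seeing once before diving into the internals: a bytes.Buffer needs no constructor, as this minimal sketch shows.

```go
package main

import (
	"bytes"
	"fmt"
)

// greet builds a string using a zero-value Buffer: no bytes.NewBuffer,
// no initialization, the zero value is ready to use.
func greet() string {
	var b bytes.Buffer
	b.WriteString("hello, ")
	b.WriteString("gws")
	return b.String()
}

func main() {
	fmt.Println(greet()) // hello, gws
}
```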

bytes.Buffer definition

type Buffer struct {
    buf      []byte // contents are the bytes buf[off : len(buf)]
    off      int    // read at &buf[off], write at &buf[len(buf)]
    lastRead readOp // last read operation, so that Unread* can work correctly.
}

The Write method appends data, using tryGrowByReslice to attempt a cheap reslice and falling back to grow when necessary:

func (b *Buffer) Write(p []byte) (n int, err error) {
    b.lastRead = opInvalid
    m, ok := b.tryGrowByReslice(len(p))
    if !ok {
        m = b.grow(len(p))
    }
    return copy(b.buf[m:], p), nil
}

tryGrowByReslice checks whether the existing capacity can accommodate the new data; if not, grow allocates a larger slice, possibly sliding data or allocating a fresh buffer.

func (b *Buffer) tryGrowByReslice(n int) (int, bool) {
    if l := len(b.buf); n <= cap(b.buf)-l {
        b.buf = b.buf[:l+n]
        return l, true
    }
    return 0, false
}

func (b *Buffer) grow(n int) int {
    m := b.Len()
    if m == 0 && b.off != 0 {
        b.Reset()
    }
    if i, ok := b.tryGrowByReslice(n); ok {
        return i
    }
    // allocate new slice, copy, etc.
    // (omitted for brevity)
    b.off = 0
    b.buf = b.buf[:m+n]
    return m
}

Frequent growth can cause memory allocations, data copies, and GC pressure, especially in high‑concurrency scenarios. Using sync.Pool to recycle bytes.Buffer instances mitigates these costs.
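To make the growth cost concrete, here is a small sketch that writes data in fixed-size chunks and records the buffer's capacity after every reallocation. The exact capacities depend on the runtime's growth policy, so only the general pattern (several reallocations, roughly geometric growth) should be read from the output.

```go
package main

import (
	"bytes"
	"fmt"
)

// growthCaps writes total bytes in chunk-sized pieces and records the
// buffer capacity after every reallocation.
func growthCaps(total, chunk int) []int {
	var b bytes.Buffer
	var caps []int
	prev := b.Cap()
	p := make([]byte, chunk)
	for written := 0; written < total; written += chunk {
		b.Write(p)
		if c := b.Cap(); c != prev {
			caps = append(caps, c)
			prev = c
		}
	}
	return caps
}

func main() {
	// 4 KB written 64 bytes at a time triggers several reallocations.
	fmt.Println(growthCaps(4096, 64))
}
```

Every entry in the printed slice is a reallocation: a fresh allocation plus a copy of all bytes written so far, which is exactly the work a pool lets us avoid.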

Simple sync.Pool buffer pool

var bufferPool = sync.Pool{
    New: func() interface{} {
        // bytes.Buffer's fields are unexported, so a composite literal
        // won't compile; use bytes.NewBuffer instead.
        return bytes.NewBuffer(make([]byte, 0, 1024))
    },
}

func GetBuffer() *bytes.Buffer { return bufferPool.Get().(*bytes.Buffer) }
func PutBuffer(b *bytes.Buffer) { b.Reset(); bufferPool.Put(b) }
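A typical call site for this pool, sketched as a self-contained program (encodeFrame and the 0x81 header byte are illustrative, not real GWS code):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufferPool = sync.Pool{
	New: func() interface{} {
		return bytes.NewBuffer(make([]byte, 0, 1024))
	},
}

func GetBuffer() *bytes.Buffer  { return bufferPool.Get().(*bytes.Buffer) }
func PutBuffer(b *bytes.Buffer) { b.Reset(); bufferPool.Put(b) }

// encodeFrame is a hypothetical call site: borrow a buffer, build a
// payload, and copy the bytes out before returning the buffer.
func encodeFrame(payload []byte) []byte {
	buf := GetBuffer()
	defer PutBuffer(buf)
	buf.WriteByte(0x81) // illustrative header byte, not a real GWS frame
	buf.Write(payload)
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes()) // the pooled buffer will be reused after Put
	return out
}

func main() {
	fmt.Printf("% x\n", encodeFrame([]byte("hi"))) // 81 68 69
}
```

Note the copy before returning: once a buffer goes back to the pool, another goroutine may reuse it, so its backing array must never escape the borrowing function.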

While adequate for many cases, this naïve pool may still allocate sub‑optimal capacities under heavy load. The GWS project introduces a more sophisticated pool that shards buffers by power‑of‑two capacities.

GWS buffer pool design

type BufferPool struct {
    begin  int
    end    int
    shards map[int]*sync.Pool // key = capacity (power of 2)
}

func NewBufferPool(left, right uint32) *BufferPool {
    begin, end := int(binaryCeil(left)), int(binaryCeil(right))
    p := &BufferPool{begin: begin, end: end, shards: make(map[int]*sync.Pool)}
    for i := begin; i <= end; i *= 2 {
        capacity := i
        p.shards[i] = &sync.Pool{New: func() interface{} { return bytes.NewBuffer(make([]byte, 0, capacity)) }}
    }
    return p
}

func (p *BufferPool) Get(n int) *bytes.Buffer {
    // Max is an integer max helper; round the request up to a shard size,
    // but never below the smallest configured shard.
    size := Max(int(binaryCeil(uint32(n))), p.begin)
    if pool, ok := p.shards[size]; ok {
        b := pool.Get().(*bytes.Buffer)
        b.Reset() // reset first so Grow sees an empty buffer
        if b.Cap() < size {
            b.Grow(size)
        }
        return b
    }
    // Requests above the configured range bypass the pool entirely.
    return bytes.NewBuffer(make([]byte, 0, n))
}

func (p *BufferPool) Put(b *bytes.Buffer) {
    if b == nil {
        return
    }
    // Only capacities that exactly match a shard are pooled; oversized or
    // odd-sized buffers are left for the GC.
    if pool, ok := p.shards[b.Cap()]; ok {
        pool.Put(b)
    }
}

func binaryCeil(v uint32) uint32 {
    v--
    v |= v >> 1
    v |= v >> 2
    v |= v >> 4
    v |= v >> 8
    v |= v >> 16
    v++
    return v
}
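A quick standalone check of binaryCeil's rounding behavior: values are bumped to the next power of two, and exact powers of two are left unchanged.

```go
package main

import "fmt"

// binaryCeil rounds v up to the nearest power of two by smearing the
// highest set bit of v-1 rightward, then adding one.
func binaryCeil(v uint32) uint32 {
	v--
	v |= v >> 1
	v |= v >> 2
	v |= v >> 4
	v |= v >> 8
	v |= v >> 16
	v++
	return v
}

func main() {
	for _, n := range []uint32{1, 3, 1000, 1024, 4097} {
		fmt.Printf("binaryCeil(%d) = %d\n", n, binaryCeil(n))
	}
	// binaryCeil(1000) = 1024, binaryCeil(1024) = 1024, binaryCeil(4097) = 8192
}
```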

The pool works as follows:

Capacities are rounded up to the nearest power of two using binaryCeil, ensuring alignment and simplifying shard lookup.

Each shard holds buffers of a specific capacity, reducing memory waste.

Get returns a buffer of at least the requested size, growing it only when the pre‑allocated capacity is insufficient.

Put returns the buffer to the matching shard; buffers larger than the configured end are not pooled, avoiding uncontrolled growth.
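Putting the pieces together: a self-contained sketch of the pool and its behavior at the boundaries. Here the undefined Max helper is replaced by an inline comparison, and the capacities (128 to 65536) are illustrative values, not GWS defaults.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// binaryCeil rounds v up to the nearest power of two.
func binaryCeil(v uint32) uint32 {
	v--
	v |= v >> 1
	v |= v >> 2
	v |= v >> 4
	v |= v >> 8
	v |= v >> 16
	v++
	return v
}

// BufferPool shards sync.Pools by power-of-two capacity.
type BufferPool struct {
	begin, end int
	shards     map[int]*sync.Pool
}

func NewBufferPool(left, right uint32) *BufferPool {
	begin, end := int(binaryCeil(left)), int(binaryCeil(right))
	p := &BufferPool{begin: begin, end: end, shards: make(map[int]*sync.Pool)}
	for i := begin; i <= end; i *= 2 {
		capacity := i // capture per-shard capacity for the closure
		p.shards[i] = &sync.Pool{New: func() interface{} {
			return bytes.NewBuffer(make([]byte, 0, capacity))
		}}
	}
	return p
}

func (p *BufferPool) Get(n int) *bytes.Buffer {
	size := int(binaryCeil(uint32(n)))
	if size < p.begin {
		size = p.begin // inline stand-in for the Max helper
	}
	if pool, ok := p.shards[size]; ok {
		b := pool.Get().(*bytes.Buffer)
		b.Reset()
		if b.Cap() < size {
			b.Grow(size)
		}
		return b
	}
	// Requests above the configured range bypass the pool.
	return bytes.NewBuffer(make([]byte, 0, n))
}

func (p *BufferPool) Put(b *bytes.Buffer) {
	if b == nil {
		return
	}
	if pool, ok := p.shards[b.Cap()]; ok {
		pool.Put(b)
	}
}

func main() {
	p := NewBufferPool(128, 65536)

	b := p.Get(1000) // rounded up to the 1024-byte shard
	fmt.Println(b.Cap())
	p.Put(b)

	big := p.Get(100000) // above end: allocated directly, never pooled
	fmt.Println(big.Cap())
	p.Put(big) // capacity matches no shard, so it is dropped
}
```

The two boundary cases in main show the design's safety valves: undersized requests are rounded up into a shard, and oversized ones fall through to a plain allocation that Put quietly discards.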

Benchmarks comparing the naïve pool with the GWS pool show that the latter yields more stable capacity distributions and fewer allocations, especially when handling random payloads up to 65 KB.

Conclusion

The GWS bytes.Buffer pool demonstrates how careful capacity management, power‑of‑two sizing, and shard‑based sync.Pool usage can dramatically improve memory efficiency and throughput in high‑concurrency Go services. The design is simple, extensible, and well‑suited for backend systems that require fast, low‑latency WebSocket handling.

Tags: performance, Go, sync.Pool, memory pool, bytes.Buffer
Written by

Rare Earth Juejin Tech Community

Juejin, a tech community that helps developers grow.
