Designing a High‑Performance bytes.Buffer Pool for GWS in Go
This article explores the internal structure and growth mechanism of Go's bytes.Buffer, demonstrates how the GWS library replaces gorilla/websocket, and presents a high‑performance bytes.Buffer pool implementation using sync.Pool with power‑of‑two sizing to reduce allocations and improve concurrency in backend services.
I am Lee, a veteran with 17 years of experience in the IT industry. After a long hiatus, I share my recent thoughts on the GWS project, a lightweight alternative to gorilla/websocket that offers lower code complexity and better performance.
GWS relies heavily on bytes.Buffer for handling byte streams. Understanding bytes.Buffer (its zero-value semantics, its internal fields buf, off, and lastRead, and how it grows) is essential for optimizing GWS.
bytes.Buffer definition
type Buffer struct {
    buf      []byte // contents are the bytes buf[off : len(buf)]
    off      int    // read at &buf[off], write at &buf[len(buf)]
    lastRead readOp // last read operation, so that Unread* can work correctly
}

The Write method appends data, using tryGrowByReslice to attempt a cheap reslice and falling back to grow when necessary:
func (b *Buffer) Write(p []byte) (n int, err error) {
    b.lastRead = opInvalid
    m, ok := b.tryGrowByReslice(len(p))
    if !ok {
        m = b.grow(len(p))
    }
    return copy(b.buf[m:], p), nil
}

tryGrowByReslice checks whether the existing capacity can accommodate the new data; if not, grow allocates a larger slice, possibly sliding data or allocating a fresh buffer.
func (b *Buffer) tryGrowByReslice(n int) (int, bool) {
    if l := len(b.buf); n <= cap(b.buf)-l {
        b.buf = b.buf[:l+n]
        return l, true
    }
    return 0, false
}

func (b *Buffer) grow(n int) int {
    m := b.Len()
    // If the buffer is empty, reset to recover space from the read-off prefix.
    if m == 0 && b.off != 0 {
        b.Reset()
    }
    if i, ok := b.tryGrowByReslice(n); ok {
        return i
    }
    // Slide existing data down or allocate a bigger slice and copy.
    // (omitted for brevity)
    b.off = 0
    b.buf = b.buf[:m+n]
    return m
}

Frequent growth causes memory allocations, data copies, and GC pressure, especially in high-concurrency scenarios. Using sync.Pool to recycle bytes.Buffer instances mitigates these costs.
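To make the cost concrete, here is a small runnable sketch (my own illustration, not GWS code) that writes 64 KB into a fresh bytes.Buffer in 1 KB chunks and records the capacity at each reallocation; every reallocation also copies everything written so far:

```go
package main

import (
	"bytes"
	"fmt"
)

// growthCaps writes n chunks of the given size into a fresh bytes.Buffer
// and records the capacity each time the underlying slice is reallocated.
func growthCaps(chunkSize, n int) []int {
	var b bytes.Buffer
	chunk := make([]byte, chunkSize)
	var caps []int
	prev := 0
	for i := 0; i < n; i++ {
		b.Write(chunk)
		if c := b.Cap(); c != prev {
			caps = append(caps, c)
			prev = c
		}
	}
	return caps
}

func main() {
	// Writing 64 KB in 1 KB chunks reallocates the buffer several times.
	fmt.Println("capacities after each reallocation:", growthCaps(1024, 64))
}
```

The exact capacity sequence depends on the Go runtime's size classes, but the pattern is the same: a handful of reallocations, each copying the entire buffer, for every cold buffer that handles a large payload.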
Simple sync.Pool buffer pool
var bufferPool = sync.Pool{
    New: func() interface{} {
        return bytes.NewBuffer(make([]byte, 0, 1024))
    },
}

func GetBuffer() *bytes.Buffer  { return bufferPool.Get().(*bytes.Buffer) }
func PutBuffer(b *bytes.Buffer) { b.Reset(); bufferPool.Put(b) }

While adequate for many cases, this naïve pool keeps a single nominal capacity: a buffer that has grown to serve a large payload is returned as-is, so a later Get may hand back a buffer far larger or smaller than the request needs. The GWS project introduces a more sophisticated pool that shards buffers by power-of-two capacities.
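Usage follows a simple get/use/put discipline. A self-contained sketch (repeating the pool definitions above so it runs on its own):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// The naive pool: every buffer starts with 1 KB of capacity,
// and Reset on return keeps whatever capacity it has grown to.
var bufferPool = sync.Pool{
	New: func() interface{} {
		return bytes.NewBuffer(make([]byte, 0, 1024))
	},
}

func GetBuffer() *bytes.Buffer  { return bufferPool.Get().(*bytes.Buffer) }
func PutBuffer(b *bytes.Buffer) { b.Reset(); bufferPool.Put(b) }

func main() {
	b := GetBuffer()
	defer PutBuffer(b) // always return the buffer, typically via defer

	b.WriteString("hello")
	fmt.Println(b.String(), "cap:", b.Cap())
}
```

Note that Reset clears the contents but keeps the capacity, which is exactly why a pooled buffer that once served a 64 KB payload stays 64 KB large forever in this design.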
GWS buffer pool design
type BufferPool struct {
    begin  int                // smallest pooled capacity (power of 2)
    end    int                // largest pooled capacity (power of 2)
    shards map[int]*sync.Pool // key = capacity (power of 2)
}

func NewBufferPool(left, right uint32) *BufferPool {
    begin, end := int(binaryCeil(left)), int(binaryCeil(right))
    p := &BufferPool{begin: begin, end: end, shards: make(map[int]*sync.Pool)}
    for i := begin; i <= end; i *= 2 {
        capacity := i // capture a per-iteration value for the closure
        p.shards[i] = &sync.Pool{New: func() interface{} {
            return bytes.NewBuffer(make([]byte, 0, capacity))
        }}
    }
    return p
}
func (p *BufferPool) Get(n int) *bytes.Buffer {
    size := max(int(binaryCeil(uint32(n))), p.begin) // built-in max, Go 1.21+
    if pool, ok := p.shards[size]; ok {
        b := pool.Get().(*bytes.Buffer)
        if b.Cap() < size {
            b.Grow(size)
        }
        b.Reset()
        return b
    }
    // Requests beyond the largest shard are allocated directly and never pooled.
    return bytes.NewBuffer(make([]byte, 0, n))
}

func (p *BufferPool) Put(b *bytes.Buffer) {
    if b == nil {
        return
    }
    // Only buffers whose capacity exactly matches a shard are recycled.
    if pool, ok := p.shards[b.Cap()]; ok {
        pool.Put(b)
    }
}
func binaryCeil(v uint32) uint32 {
    v--
    v |= v >> 1
    v |= v >> 2
    v |= v >> 4
    v |= v >> 8
    v |= v >> 16
    v++
    return v
}

The pool works as follows:
Capacities are rounded up to the nearest power of two using binaryCeil, ensuring alignment and simplifying shard lookup.
Each shard holds buffers of a specific capacity, reducing memory waste.
Get returns a buffer of at least the requested size, growing it only when the pre‑allocated capacity is insufficient.
Put returns the buffer to the matching shard; buffers larger than the configured end are not pooled, avoiding uncontrolled growth.
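Putting the pieces together, here is a self-contained sketch of the sharded pool in use. It repeats the listing above, with one substitution on my part: the article's Max helper is replaced by the Go 1.21 built-in max:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// binaryCeil rounds v up to the nearest power of two by smearing the
// highest set bit into all lower positions, then adding one.
func binaryCeil(v uint32) uint32 {
	v--
	v |= v >> 1
	v |= v >> 2
	v |= v >> 4
	v |= v >> 8
	v |= v >> 16
	v++
	return v
}

type BufferPool struct {
	begin, end int
	shards     map[int]*sync.Pool // key = capacity (power of 2)
}

func NewBufferPool(left, right uint32) *BufferPool {
	begin, end := int(binaryCeil(left)), int(binaryCeil(right))
	p := &BufferPool{begin: begin, end: end, shards: make(map[int]*sync.Pool)}
	for i := begin; i <= end; i *= 2 {
		capacity := i // capture per-iteration value for the closure
		p.shards[i] = &sync.Pool{New: func() interface{} {
			return bytes.NewBuffer(make([]byte, 0, capacity))
		}}
	}
	return p
}

func (p *BufferPool) Get(n int) *bytes.Buffer {
	size := max(int(binaryCeil(uint32(n))), p.begin)
	if pool, ok := p.shards[size]; ok {
		b := pool.Get().(*bytes.Buffer)
		if b.Cap() < size {
			b.Grow(size)
		}
		b.Reset()
		return b
	}
	return bytes.NewBuffer(make([]byte, 0, n)) // beyond the largest shard
}

func (p *BufferPool) Put(b *bytes.Buffer) {
	if b == nil {
		return
	}
	if pool, ok := p.shards[b.Cap()]; ok {
		pool.Put(b)
	}
}

func main() {
	pool := NewBufferPool(1024, 64*1024) // shards cover 1 KB .. 64 KB

	b := pool.Get(1500) // rounded up to the 2048-byte shard
	fmt.Println("cap for 1500-byte request:", b.Cap())
	pool.Put(b)

	big := pool.Get(100000) // beyond the largest shard: plain allocation
	fmt.Println("cap for 100000-byte request:", big.Cap())
	pool.Put(big) // dropped: its capacity matches no shard
}
```

The 1500-byte request comes back with exactly 2048 bytes of capacity, and the oversized request is served but never recycled, which is how the pool bounds its memory footprint.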
Benchmarks comparing the naïve pool with the GWS pool show that the latter yields more stable capacity distributions and fewer allocations, especially when handling random payloads up to 65 KB.
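Those benchmarks are not reproduced here, but the allocation gap is easy to see in miniature. The following sketch (my own illustration, not the GWS benchmark suite) uses testing.AllocsPerRun to contrast a fresh buffer per operation with a naive pooled one for a 16 KB payload:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"testing"
)

// measure returns average allocations per operation for writing a 16 KB
// payload into a freshly allocated buffer versus a pooled, reused one.
func measure() (direct, pooled float64) {
	payload := make([]byte, 16*1024)

	// Fresh buffer per operation: allocates and regrows on every run.
	direct = testing.AllocsPerRun(500, func() {
		b := bytes.NewBuffer(make([]byte, 0, 1024))
		b.Write(payload)
	})

	pool := sync.Pool{New: func() interface{} {
		return bytes.NewBuffer(make([]byte, 0, 1024))
	}}
	// Pooled buffer: after warm-up, its capacity is usually already there.
	pooled = testing.AllocsPerRun(500, func() {
		b := pool.Get().(*bytes.Buffer)
		b.Write(payload)
		b.Reset()
		pool.Put(b)
	})
	return direct, pooled
}

func main() {
	d, p := measure()
	fmt.Printf("allocs/op: direct=%.1f pooled=%.1f\n", d, p)
}
```

The exact numbers vary by runtime version and GC timing, but the pooled path should allocate no more, and typically far less, per operation.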
Conclusion
The GWS bytes.Buffer pool demonstrates how careful capacity management, power‑of‑two sizing, and shard‑based sync.Pool usage can dramatically improve memory efficiency and throughput in high‑concurrency Go services. The design is simple, extensible, and well‑suited for backend systems that require fast, low‑latency WebSocket handling.
Rare Earth Juejin Tech Community