Why Netpoll Beats Go’s net Library for 60k Connections: A Deep Dive
An extensive benchmark compares Go’s standard net client with the event‑driven cloudwego/netpoll client under 60,000 concurrent connections, revealing how goroutine explosion, memory usage, and scheduler overhead differ, and demonstrates how a single scheduler plus a bounded goroutine pool dramatically reduces resource consumption.
Test Background and Objectives
Background
On a 2 CPU / 2 GB cloud VM, a Go load‑testing client built with the standard net package uses a goroutine‑per‑connection model with separate read and write goroutines. Once the number of concurrent connections approaches 30 000, the VM frequently crashes.
Hypothesis
Too many goroutines: each connection gets a dedicated read goroutine and a dedicated write goroutine, so 60 000 connections → 120 000 goroutines.
High memory usage: goroutine stacks (2 KB initial) plus connection objects and buffers grow linearly with the connection count.
Scheduler overload: the Go scheduler is saturated by the sheer goroutine count.
To validate the hypothesis the epoll‑based cloudwego/netpoll library is compared against the standard net implementation.
Goals
Quantify differences in memory consumption, CPU load and goroutine count between the two approaches.
Analyse the root causes of the differences and explain the fundamental I/O model distinctions.
Provide concrete guidance for scenarios that require tens of thousands of concurrent connections.
Test Environment
Hardware
8 CPU, 16 GB RAM
1 Gbps internal network
Software
Linux
Go version: 1.26
netpoll version: v0.7.2
gopool: bytedance/gopkg/util/gopool
Test Parameters
Client connections: 60 000
Message interval: 90 seconds (simulates long‑connection keep‑alive, not throughput stress)
Test duration: 3 hours
Message size: 128–512 bytes random data
Implementation Comparison
Standard net Client
Core Architecture
type NetClient struct {
    serverAddr   string
    clientCount  int
    interval     time.Duration
    logger       *slog.Logger
    shutdownChan chan struct{}
    sentCount    atomic.Int64
    errorCount   atomic.Int64
}
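The struct above is driven by a connect phase that dials clientCount connections and hands each one to handleConnection. The original does not show that code; below is a minimal sketch, where the Start method name, the dial timeout and the ramp‑up pacing are assumptions:
// Start dials c.clientCount connections and hands each one off to handleConnection.
// Sketch only: method name, dial timeout and ramp-up pacing are assumptions.
func (c *NetClient) Start() {
    for id := 0; id < c.clientCount; id++ {
        conn, err := net.DialTimeout("tcp", c.serverAddr, 5*time.Second)
        if err != nil {
            c.errorCount.Add(1)
            continue
        }
        common.ActiveConnections.Inc()
        c.handleConnection(conn, id)
        time.Sleep(time.Millisecond) // pace the ramp-up to avoid connect bursts
    }
}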
Connection Handling
func (c *NetClient) handleConnection(conn net.Conn, clientID int) {
    var closeOnce sync.Once
    closeConn := func() {
        closeOnce.Do(func() {
            conn.Close()
            common.ActiveConnections.Dec()
            c.logger.Debug("Connection closed", "id", clientID)
        })
    }
    // Two goroutines per connection: one for writes, one for reads.
    go func() { defer closeConn(); c.writeLoop(conn, clientID) }()
    go func() { defer closeConn(); c.readLoop(conn, clientID) }()
}
Read/Write Loops
// writeLoop sends messages at a fixed interval
func (c *NetClient) writeLoop(conn net.Conn, clientID int) {
    ticker := time.NewTicker(c.interval)
    defer ticker.Stop()
    if err := c.sendMessage(conn, clientID); err != nil {
        return
    }
    for {
        select {
        case <-c.shutdownChan:
            return
        case <-ticker.C:
            if err := c.sendMessage(conn, clientID); err != nil {
                return
            }
        }
    }
}
// readLoop continuously reads responses
func (c *NetClient) readLoop(conn net.Conn, clientID int) {
    recvPtr := recvBufPool.Get().(*[]byte)
    defer recvBufPool.Put(recvPtr)
    for {
        // Each frame is a 4-byte big-endian length prefix followed by the payload.
        conn.SetReadDeadline(time.Now().Add(c.interval + 60*time.Second))
        if _, err := io.ReadFull(conn, (*recvPtr)[:4]); err != nil {
            return
        }
        length := binary.BigEndian.Uint32((*recvPtr)[:4])
        if length > 8192 {
            return
        }
        conn.SetReadDeadline(time.Now().Add(30 * time.Second))
        if _, err := io.ReadFull(conn, (*recvPtr)[:length]); err != nil {
            return
        }
        common.MessagesReceived.Inc()
        c.sentCount.Add(1)
    }
}
Memory Pool Optimisation
var sendBufPool = sync.Pool{New: func() any { buf := make([]byte, 516); return &buf }}
var recvBufPool = sync.Pool{New: func() any { buf := make([]byte, 8192); return &buf }}
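The sendMessage helper called from writeLoop is not shown in the original. Below is a minimal sketch consistent with the framing readLoop expects (a 4‑byte big‑endian length prefix followed by a 128–512 byte payload) and with sendBufPool above; the payload generation, write deadline and counter handling are assumptions:
// sendMessage writes one length-prefixed frame: 4-byte big-endian length + payload.
// Sketch only: payload generation, deadline and counters are assumptions.
func (c *NetClient) sendMessage(conn net.Conn, clientID int) error {
    bufPtr := sendBufPool.Get().(*[]byte)
    defer sendBufPool.Put(bufPtr)
    buf := *bufPtr

    payloadLen := 128 + rand.Intn(385) // 128–512 bytes of random data
    binary.BigEndian.PutUint32(buf[:4], uint32(payloadLen))
    rand.Read(buf[4 : 4+payloadLen])

    conn.SetWriteDeadline(time.Now().Add(30 * time.Second))
    if _, err := conn.Write(buf[:4+payloadLen]); err != nil {
        c.errorCount.Add(1)
        return err
    }
    return nil
}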
Resource Consumption (60 k connections)
Read/write goroutines: 60 000 × 2 = 120 000
Goroutine stack memory: ~240 MB (120 000 × 2 KB)
Connection objects: ~30 MB (60 000 × ≈ 500 B)
Receive buffers (via sync.Pool): far below the theoretical 480 MB peak
netpoll Client
Core Architecture
type NetpollClient struct {
    serverAddr   string
    clientCount  int
    interval     time.Duration
    logger       *slog.Logger
    shutdownChan chan struct{}
    shutdownOnce sync.Once
    sentCount    atomic.Int64
    errorCount   atomic.Int64
    connMap      sync.Map    // clientID → *connEntry
    pool         gopool.Pool // bounded goroutine pool
}
type connEntry struct {
    conn     netpoll.Connection
    inFlight atomic.Bool // CAS flag for send permission
}
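The pool field in NetpollClient is a bounded worker pool from bytedance/gopkg/util/gopool. The original does not show its construction; here is a minimal sketch, where the constructor name NewNetpollClient and the config are assumptions and the capacity mirrors the clientCount×2+100 limit reported later:
// NewNetpollClient wires up the client with a bounded gopool worker pool.
// Sketch only: constructor name and config values are assumptions.
func NewNetpollClient(addr string, clientCount int, interval time.Duration, logger *slog.Logger) *NetpollClient {
    return &NetpollClient{
        serverAddr:   addr,
        clientCount:  clientCount,
        interval:     interval,
        logger:       logger,
        shutdownChan: make(chan struct{}),
        // Cap the pool so a burst of sends cannot spawn unbounded goroutines.
        pool: gopool.NewPool("netpoll-client", int32(clientCount*2+100), gopool.NewConfig()),
    }
}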
Connection Creation and Registration
func (c *NetpollClient) createConnection(clientID int) (netpoll.Connection, error) {
    dialer := netpoll.NewDialer()
    conn, err := dialer.DialConnection("tcp", c.serverAddr, 5*time.Second)
    if err != nil {
        return nil, fmt.Errorf("dial failed: %w", err)
    }
    // Reads are handled by the poller via the OnRequest callback.
    conn.SetOnRequest(func(ctx context.Context, conn netpoll.Connection) error {
        return c.handleRequest(ctx, conn, clientID)
    })
    // Drop bookkeeping when the connection closes.
    conn.AddCloseCallback(func(_ netpoll.Connection) error {
        c.connMap.Delete(clientID)
        common.ActiveConnections.Dec()
        return nil
    })
    common.TotalConnections.Inc()
    common.ActiveConnections.Inc()
    return conn, nil
}
Single Scheduler + Goroutine Pool
The client runs a single long‑lived dispatcher goroutine that splits the connection ID space into batches and schedules one batch per tick, so the number of scheduling goroutines stays constant regardless of connection count. Send tasks are submitted to a bounded gopool instead of spawning a goroutine per connection. With a 90‑second message interval and 50 ms ticks this works out to 1 800 batches of roughly 34 connections each.
func (c *NetpollClient) runDispatcher() {
    numBatches := max(1, int(c.interval/(50*time.Millisecond)))
    batchSize := max(1, (c.clientCount+numBatches-1)/numBatches)
    dispatchInterval := c.interval / time.Duration(numBatches)
    ticker := time.NewTicker(dispatchInterval)
    defer ticker.Stop()
    batchIdx := 0
    dispatch := func() {
        start := batchIdx * batchSize
        end := min(start+batchSize, c.clientCount)
        for id := start; id < end; id++ {
            val, ok := c.connMap.Load(id)
            if !ok {
                continue
            }
            entry := val.(*connEntry)
            if !entry.conn.IsActive() {
                continue
            }
            if !entry.inFlight.CompareAndSwap(false, true) {
                continue
            }
            c.pool.Go(func() {
                defer entry.inFlight.Store(false)
                if entry.conn.IsActive() {
                    c.sendMessage(entry.conn, id)
                }
            })
        }
        batchIdx = (batchIdx + 1) % numBatches
    }
    dispatch()
    for {
        select {
        case <-c.shutdownChan:
            return
        case <-ticker.C:
            dispatch()
        }
    }
}
Event‑Driven Read Handling
func (c *NetpollClient) handleRequest(ctx context.Context, conn netpoll.Connection, clientID int) error {
    reader := conn.Reader()
    defer reader.Release()
    buf, err := reader.Next(4)
    if err != nil {
        return err
    }
    length := binary.BigEndian.Uint32(buf)
    if length > 8192 {
        return fmt.Errorf("response too large")
    }
    if length > 0 {
        if _, err = reader.Next(int(length)); err != nil {
            return err
        }
    }
    common.MessagesReceived.Inc()
    c.sentCount.Add(1)
    return nil
}
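The netpoll‑side sendMessage invoked from the dispatcher's pool tasks is also not shown. Below is a minimal sketch using netpoll's Writer, where Malloc reserves space in the connection's output buffer and Flush hands it to the poller; payload generation and error accounting are assumptions:
// sendMessage writes one length-prefixed frame through netpoll's Writer.
// Sketch only: payload generation and error accounting are assumptions.
func (c *NetpollClient) sendMessage(conn netpoll.Connection, clientID int) error {
    payloadLen := 128 + rand.Intn(385) // 128–512 bytes of random data

    writer := conn.Writer()
    buf, err := writer.Malloc(4 + payloadLen) // reserve space in the output buffer
    if err != nil {
        c.errorCount.Add(1)
        return err
    }
    binary.BigEndian.PutUint32(buf[:4], uint32(payloadLen))
    rand.Read(buf[4:])

    if err := writer.Flush(); err != nil { // hand the buffer to the poller for writing
        c.errorCount.Add(1)
        return err
    }
    return nil
}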
Resource Consumption (60 k connections)
Scheduler goroutine: 1
Statistics goroutine: 1
Goroutine‑pool workers: capped at clientCount × 2 + 100 = 120 100, but the number of simultaneously active workers stays far lower because each send completes quickly.
Read goroutine: 0 (event‑driven)
Connection objects: ~18 MB (60 000 × ≈ 300 B)
Summary of Differences
I/O Model Comparison
Standard net: blocking I/O; each connection has two dedicated goroutines (read and write).
netpoll: non‑blocking I/O with epoll; reads are handled via callbacks, writes by a single scheduler plus a bounded goroutine pool.
Resource Management Comparison
Goroutine count: O(n) for net (two per connection) vs effectively constant for netpoll (one dispatcher plus short‑lived, bounded pool workers).
Memory usage: higher for net due to many stacks; lower for netpoll.
GC pressure: large for net, small for netpoll.
Scheduler overhead: high for net, low for netpoll.
Code Complexity Comparison
Lines of code: slightly fewer for net; netpoll adds event‑driven plumbing.
Understanding difficulty: easy for net; moderate for netpoll (requires knowledge of epoll and CAS).
Maintenance cost: low for net; moderate for netpoll.
Test Methodology
Procedure
Deploy a TCP server on 192.168.37.37.
Deploy the standard net client on 192.168.37.38.
Deploy the netpoll client on 192.168.37.39.
Run Prometheus, Grafana and Node Exporter for monitoring.
Start both clients simultaneously, each creating 60 000 connections.
Collect metrics for 3 hours.
Monitored Metrics
Client memory usage: runtime.MemStats.Alloc
Goroutine count: runtime.NumGoroutine()
Host memory available: node_memory_MemAvailable_bytes
Host CPU usage: node_cpu_seconds_total
Message throughput: request–response pairs per second
Error rate: proportion of failed send/receive operations
Data Collection
Application‑level metrics exposed via the Prometheus client library over HTTP (see the sketch after this list).
Host‑level metrics collected by Node Exporter.
Structured logs emitted with slog.
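A minimal sketch of the application‑level metrics endpoint, assuming the standard prometheus/client_golang library; the metric names and listen port are assumptions:
// Expose application metrics over HTTP for Prometheus to scrape.
// Sketch only: metric names and the listen address are assumptions.
var activeConnections = promauto.NewGauge(prometheus.GaugeOpts{
    Name: "client_active_connections",
    Help: "Number of currently open client connections.",
})

var goroutineCount = promauto.NewGaugeFunc(prometheus.GaugeOpts{
    Name: "client_goroutines",
    Help: "Current goroutine count (runtime.NumGoroutine).",
}, func() float64 { return float64(runtime.NumGoroutine()) })

func serveMetrics() {
    http.Handle("/metrics", promhttp.Handler())
    go func() { _ = http.ListenAndServe(":2112", nil) }()
}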