Why Netpoll Beats Go’s net Library for 60k Connections: A Deep Dive

An extensive benchmark compares Go's standard net client with the event-driven cloudwego/netpoll client under 60,000 concurrent connections, revealing how goroutine explosion, memory usage, and scheduler overhead differ, and demonstrating how a single scheduler plus a bounded goroutine pool dramatically reduces resource consumption.


Test Background and Objectives

Background

On a 2-CPU / 2 GB cloud VM, a Go load-testing client built with the standard net package uses a split read/write goroutine model: two goroutines per connection. Once concurrent connections reach roughly 30,000, the VM frequently crashes.

Hypothesis

Too many goroutines: each connection gets a dedicated read goroutine and a dedicated write goroutine, so 60,000 connections → 120,000 goroutines.

High memory usage: goroutine stacks (2 KB initial) plus connection objects and buffers grow linearly with the connection count.

Scheduler overload: the Go scheduler is saturated by the sheer goroutine count.

To validate the hypothesis, the epoll-based cloudwego/netpoll library is compared against the standard net implementation.

Goals

Quantify differences in memory consumption, CPU load and goroutine count between the two approaches.

Analyse the root causes of the differences and explain the fundamental I/O model distinctions.

Provide concrete guidance for scenarios that require tens of thousands of concurrent connections.

Test Environment

Hardware

8 CPU, 16 GB RAM

1 Gbps internal network

Software

Linux

Go version: 1.26

netpoll version: v0.7.2

gopool: bytedance/gopkg/util/gopool

Test Parameters

Client connections: 60 000

Message interval: 90 seconds (simulates long‑connection keep‑alive, not throughput stress)

Test duration: 3 hours

Message size: 128–512 bytes random data

Implementation Comparison

Standard net Client

Core Architecture

type NetClient struct {
    serverAddr   string
    clientCount  int
    interval     time.Duration
    logger       *slog.Logger
    shutdownChan chan struct{}
    sentCount    atomic.Int64
    errorCount   atomic.Int64
}

Connection Handling

func (c *NetClient) handleConnection(conn net.Conn, clientID int) {
    var closeOnce sync.Once
    closeConn := func() {
        closeOnce.Do(func() { conn.Close(); common.ActiveConnections.Dec(); c.logger.Debug("Connection closed", "id", clientID) })
    }
    go func() { defer closeConn(); c.writeLoop(conn, clientID) }()
    go func() { defer closeConn(); c.readLoop(conn, clientID) }()
}
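
The post does not show how connections are established before handleConnection is called. A minimal sketch of what that loop could look like, assuming net.DialTimeout and the common metrics package used elsewhere (the connectAll name and the 5-second dial timeout are illustrative):

// connectAll dials every client and hands each connection to
// handleConnection, which spawns its two goroutines (sketch).
func (c *NetClient) connectAll() {
    for id := 0; id < c.clientCount; id++ {
        conn, err := net.DialTimeout("tcp", c.serverAddr, 5*time.Second)
        if err != nil {
            c.errorCount.Add(1)
            continue
        }
        common.TotalConnections.Inc()
        common.ActiveConnections.Inc()
        c.handleConnection(conn, id)
    }
}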

Read/Write Loops

// writeLoop sends messages at a fixed interval
func (c *NetClient) writeLoop(conn net.Conn, clientID int) {
    ticker := time.NewTicker(c.interval)
    defer ticker.Stop()
    if err := c.sendMessage(conn, clientID); err != nil { return }
    for {
        select {
        case <-c.shutdownChan:
            return
        case <-ticker.C:
            if err := c.sendMessage(conn, clientID); err != nil { return }
        }
    }
}

// readLoop continuously reads responses
func (c *NetClient) readLoop(conn net.Conn, clientID int) {
    recvPtr := recvBufPool.Get().(*[]byte)
    defer recvBufPool.Put(recvPtr)
    for {
        conn.SetReadDeadline(time.Now().Add(c.interval + 60*time.Second))
        _, err := io.ReadFull(conn, (*recvPtr)[:4])
        if err != nil { return }
        length := binary.BigEndian.Uint32((*recvPtr)[:4])
        if length > 8192 { return }
        conn.SetReadDeadline(time.Now().Add(30 * time.Second))
        _, err = io.ReadFull(conn, (*recvPtr)[:length])
        if err != nil { return }
        common.MessagesReceived.Inc()
        c.sentCount.Add(1)
    }
}

Memory Pool Optimisation

var sendBufPool = sync.Pool{ New: func() any { buf := make([]byte, 516); return &buf } }
var recvBufPool = sync.Pool{ New: func() any { buf := make([]byte, 8192); return &buf } }
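
sendMessage itself is not listed in the post. A plausible sketch that matches the 4-byte big-endian length prefix the read loop expects and the 128–512 byte payload from the test parameters, reusing sendBufPool (uses math/rand and encoding/binary; the write deadline is an assumption):

// sendMessage writes one length-prefixed frame: a 4-byte big-endian
// length followed by 128–512 bytes of random payload (sketch).
func (c *NetClient) sendMessage(conn net.Conn, clientID int) error {
    bufPtr := sendBufPool.Get().(*[]byte)
    defer sendBufPool.Put(bufPtr)
    buf := *bufPtr // 516 bytes: 4-byte header + up to 512-byte payload
    payloadLen := 128 + rand.Intn(385) // 128–512 bytes
    binary.BigEndian.PutUint32(buf[:4], uint32(payloadLen))
    rand.Read(buf[4 : 4+payloadLen])
    conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
    if _, err := conn.Write(buf[:4+payloadLen]); err != nil {
        c.errorCount.Add(1)
        return err
    }
    return nil
}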

Resource Consumption (60 k connections)

Read/write goroutines: 60,000 × 2 = 120,000

Goroutine stack memory: ~240 MB (120,000 × 2 KB initial stacks)

Connection objects: ~30 MB (60,000 × ≈500 B)

Receive buffers (via sync.Pool): far below the theoretical 480 MB peak (60,000 × 8 KB), because idle buffers are reused

netpoll Client

Core Architecture

type NetpollClient struct {
    serverAddr   string
    clientCount  int
    interval     time.Duration
    logger       *slog.Logger
    shutdownChan chan struct{}
    shutdownOnce sync.Once
    sentCount    atomic.Int64
    errorCount   atomic.Int64
    connMap      sync.Map // clientID → *connEntry
    pool         gopool.Pool // bounded goroutine pool
}

type connEntry struct {
    conn     netpoll.Connection
    inFlight atomic.Bool // CAS flag that grants send permission
}
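
The post does not show how the bounded pool is constructed. Assuming the bytedance/gopkg gopool API, a constructor might look roughly like this (the pool name is illustrative; the clientCount*2+100 cap is the figure quoted later in the resource numbers):

// NewNetpollClient wires up the client with a worker pool capped at
// clientCount*2+100 goroutines (sketch).
func NewNetpollClient(serverAddr string, clientCount int, interval time.Duration, logger *slog.Logger) *NetpollClient {
    return &NetpollClient{
        serverAddr:   serverAddr,
        clientCount:  clientCount,
        interval:     interval,
        logger:       logger,
        shutdownChan: make(chan struct{}),
        pool:         gopool.NewPool("netpoll-client", int32(clientCount*2+100), gopool.NewConfig()),
    }
}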

Connection Creation and Registration

func (c *NetpollClient) createConnection(clientID int) (netpoll.Connection, error) {
    dialer := netpoll.NewDialer()
    conn, err := dialer.DialConnection("tcp", c.serverAddr, 5*time.Second)
    if err != nil { return nil, fmt.Errorf("dial failed: %w", err) }
    conn.SetOnRequest(func(ctx context.Context, conn netpoll.Connection) error { return c.handleRequest(ctx, conn, clientID) })
    conn.AddCloseCallback(func(_ netpoll.Connection) error { c.connMap.Delete(clientID); common.ActiveConnections.Dec(); return nil })
    common.TotalConnections.Inc()
    common.ActiveConnections.Inc()
    return conn, nil
}
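
Each created connection then needs to be registered in connMap before the dispatcher runs; a sketch of that start-up path, under the same assumptions as above (Start is an assumed name):

// Start dials all clients, registers them in connMap, and launches
// the single dispatcher goroutine (sketch).
func (c *NetpollClient) Start() error {
    for id := 0; id < c.clientCount; id++ {
        conn, err := c.createConnection(id)
        if err != nil {
            c.errorCount.Add(1)
            continue
        }
        c.connMap.Store(id, &connEntry{conn: conn})
    }
    go c.runDispatcher()
    return nil
}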

Single Scheduler + Goroutine Pool

The client runs a single dispatcher goroutine, so the number of long-lived goroutines stays constant regardless of connection count. The dispatcher splits the connection ID space into batches and schedules each batch at a fixed sub-interval, smoothing sends across the 90-second message interval. Send tasks are submitted to a bounded gopool instead of spawning a goroutine per connection.

func (c *NetpollClient) runDispatcher() {
    numBatches := max(1, int(c.interval/(50*time.Millisecond)))
    batchSize := max(1, (c.clientCount+numBatches-1)/numBatches)
    dispatchInterval := c.interval / time.Duration(numBatches)
    ticker := time.NewTicker(dispatchInterval)
    defer ticker.Stop()
    batchIdx := 0
    dispatch := func() {
        start := batchIdx * batchSize
        end := min(start+batchSize, c.clientCount)
        for id := start; id < end; id++ {
            val, ok := c.connMap.Load(id)
            if !ok { continue }
            entry := val.(*connEntry)
            if !entry.conn.IsActive() { continue }
            if !entry.inFlight.CompareAndSwap(false, true) { continue }
            c.pool.Go(func() {
                defer entry.inFlight.Store(false)
                if entry.conn.IsActive() { c.sendMessage(entry.conn, id) }
            })
        }
        batchIdx = (batchIdx + 1) % numBatches
    }
    dispatch()
    for {
        select {
        case <-c.shutdownChan: return
        case <-ticker.C: dispatch()
        }
    }
}
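
The sendMessage called from the pool task is not shown either. With netpoll it would go through the connection's nocopy Writer; a minimal sketch assuming the same 4-byte length-prefixed frame as the net client (Malloc and Flush are part of netpoll's Writer interface; the rest is illustrative):

// sendMessage allocates one frame in the nocopy write buffer,
// fills it, and flushes it to the socket (sketch).
func (c *NetpollClient) sendMessage(conn netpoll.Connection, clientID int) error {
    payloadLen := 128 + rand.Intn(385) // 128–512 bytes, per the test parameters
    writer := conn.Writer()
    buf, err := writer.Malloc(4 + payloadLen)
    if err != nil {
        c.errorCount.Add(1)
        return err
    }
    binary.BigEndian.PutUint32(buf[:4], uint32(payloadLen))
    rand.Read(buf[4:])
    if err := writer.Flush(); err != nil {
        c.errorCount.Add(1)
        return err
    }
    return nil
}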

Event‑Driven Read Handling

func (c *NetpollClient) handleRequest(ctx context.Context, conn netpoll.Connection, clientID int) error {
    reader := conn.Reader()
    defer reader.Release()
    buf, err := reader.Next(4)
    if err != nil { return err }
    length := binary.BigEndian.Uint32(buf)
    if length > 8192 { return fmt.Errorf("response too large") }
    if length > 0 {
        if _, err = reader.Next(int(length)); err != nil { return err }
    }
    common.MessagesReceived.Inc()
    c.sentCount.Add(1)
    return nil
}

Resource Consumption (60 k connections)

Dispatcher (scheduler) goroutine: 1

Statistics goroutine: 1

Goroutine-pool workers: capped at clientCount*2+100 = 120,100 (the number actually active is far lower, since each send task is short-lived)

Per-connection read goroutines: 0 (reads are handled by netpoll's epoll event loop via callbacks)

Connection objects: ~18 MB (60,000 × ≈300 B)

Summary of Differences

I/O Model Comparison

Standard net: blocking I/O; each connection has two dedicated goroutines (read and write).

netpoll: non-blocking I/O with epoll; reads are handled via callbacks, writes are driven by a single scheduler plus a bounded goroutine pool.

Resource Management Comparison

Goroutine count: O(n) for net (two per connection) vs. effectively constant for netpoll (one dispatcher plus a bounded pool).

Memory usage: higher for net due to many stacks; lower for netpoll.

GC pressure: large for net, small for netpoll.

Scheduler overhead: high for net, low for netpoll.

Code Complexity Comparison

Lines of code: slightly fewer for net; netpoll adds event‑driven plumbing.

Understanding difficulty: easy for net; moderate for netpoll (requires knowledge of epoll and CAS).

Maintenance cost: low for net; moderate for netpoll.

Test Methodology

Procedure

Deploy a TCP server on 192.168.37.37.

Deploy the standard net client on 192.168.37.38.

Deploy the netpoll client on 192.168.37.39.

Run Prometheus, Grafana and Node Exporter for monitoring.

Start both clients simultaneously, each creating 60 000 connections.

Collect metrics for 3 hours.

Monitored Metrics

Client memory usage: runtime.MemStats.Alloc

Client goroutine count: runtime.NumGoroutine()

Host memory available: node_memory_MemAvailable_bytes

Host CPU usage: node_cpu_seconds_total

Message throughput: request–response pairs per second

Error rate: proportion of failed send/receive operations

Data Collection

Application‑level metrics exposed via the Prometheus client library over HTTP.

Host‑level metrics collected by Node Exporter.

Structured logs emitted with slog.
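
A minimal sketch of how the application-level metrics could be registered and exposed with the Prometheus Go client (metric names, the 5-second sample interval, and port 2112 are illustrative, not the author's exact setup):

// serveMetrics samples runtime stats into gauges and exposes them
// on /metrics for Prometheus to scrape (sketch).
var (
    goroutineGauge = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "client_goroutines", Help: "runtime.NumGoroutine()"})
    heapAllocGauge = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "client_heap_alloc_bytes", Help: "runtime.MemStats.Alloc"})
)

func serveMetrics() {
    prometheus.MustRegister(goroutineGauge, heapAllocGauge)
    go func() {
        for range time.Tick(5 * time.Second) {
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            goroutineGauge.Set(float64(runtime.NumGoroutine()))
            heapAllocGauge.Set(float64(m.Alloc))
        }
    }()
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":2112", nil)
}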
