Why netpoll Beats Go’s net Library: 99.99% Goroutine Reduction & 40% CPU Savings

A three‑hour benchmark on an 8C‑16G Linux host compares the standard Go net client with the netpoll client under 60,000 concurrent connections, revealing a 27.6% drop in client memory, a 99.99% cut in goroutine count, a 29.5% reduction in host memory, and a 40.7% lower CPU usage while maintaining the same throughput.


Test Environment

The benchmark creates 60,000 concurrent client connections with a 90‑second message interval, runs for three hours on an 8‑CPU, 16‑GB Linux VM, using Go 1.26 and netpoll v0.7.2.

Result Overview

Metric                net client   netpoll client   Improvement
Client Memory Alloc   580 MB       420 MB           ↓27.6%
Client Goroutines     120,000      13               ↓99.99%
Host Memory Used      2.37 GiB     1.67 GiB         ↓29.5%
Host CPU Usage        2.7%         1.6%             ↓40.7%
Throughput            ~667 msg/s   ~667 msg/s       -
Throughput is calculated as 60,000 connections ÷ 90 seconds = 667 msg/s; the low rate is intentional to simulate long‑connection keep‑alive rather than high‑throughput stress.
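The throughput figure is simple arithmetic and can be verified directly (the function name here is illustrative, not part of the benchmark code):

```go
package main

import "fmt"

// throughputMsgPerSec returns the steady-state aggregate message rate when
// each of `conns` connections sends one message every `intervalSec` seconds.
func throughputMsgPerSec(conns, intervalSec int) float64 {
	return float64(conns) / float64(intervalSec)
}

func main() {
	fmt.Printf("%.0f msg/s\n", throughputMsgPerSec(60000, 90)) // 667 msg/s
}
```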

Detailed Data Analysis

Goroutine Count Difference

The net client spawns roughly 120,000 goroutines, while the netpoll client uses only about 13 (0.01% of the net client's count).

Net Client Goroutine Composition

func (c *NetClient) Start() error { // main goroutine
    go c.runStats() // stats goroutine
    for batch := 0; batch < totalBatches; batch++ {
        startIdx, endIdx := batch*batchSize, (batch+1)*batchSize
        for i := startIdx; i < endIdx; i++ {
            go func(clientID int) {
                conn, err := c.createConnection(clientID)
                if err != nil {
                    return
                }
                c.handleConnection(conn, clientID) // spawns read & write goroutines
            }(i)
        }
    }
    return nil
}

func (c *NetClient) handleConnection(conn net.Conn, clientID int) {
    go c.writeLoop(conn, clientID) // 60,000 write goroutines
    go c.readLoop(conn, clientID)  // 60,000 read goroutines
}

Main goroutine: 1

Stats goroutine: 1

Write loop goroutine: 60,000

Read loop goroutine: 60,000

Total: 120,002
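The two-goroutines-per-connection cost is easy to observe with runtime.NumGoroutine. The sketch below is a simplified stand-in for the real read/write loops, using in-memory net.Pipe pairs instead of TCP connections (plus one extra peer goroutine per pipe to drain writes):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"runtime"
	"time"
)

// Simplified stand-ins for the real loops: each one blocks its goroutine
// on I/O for the lifetime of the connection.
func writeLoop(conn net.Conn) {
	for {
		if _, err := conn.Write([]byte("ping\n")); err != nil {
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
}

func readLoop(conn net.Conn) {
	r := bufio.NewReader(conn)
	for {
		if _, err := r.ReadString('\n'); err != nil {
			return
		}
	}
}

// spawnConnections models the net client: two dedicated goroutines per
// connection, plus one peer goroutine here to drain the in-memory pipe.
// It returns how many goroutines were added.
func spawnConnections(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		c1, c2 := net.Pipe() // stands in for a real TCP connection
		go writeLoop(c1)
		go readLoop(c1)
		go readLoop(c2) // peer side drains the writes
	}
	time.Sleep(50 * time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines added for 100 connections:", spawnConnections(100))
}
```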

Netpoll Client Goroutine Composition

func (c *NetpollClient) Start() error { // main goroutine
    go c.runStats()      // stats goroutine
    go c.runDispatcher() // dispatcher goroutine
    for batch := 0; batch < totalBatches; batch++ {
        startIdx, endIdx := batch*batchSize, (batch+1)*batchSize
        for i := startIdx; i < endIdx; i++ {
            go func(clientID int) {
                conn, err := c.createConnection(clientID)
                if err != nil {
                    return
                }
                c.connMap.Store(clientID, &connEntry{conn: conn})
            }(i)
        }
    }
    return nil
}

func (c *NetpollClient) runDispatcher() {
    // single dispatcher goroutine: per-connection goroutine cost stays O(1)
    dispatch := func(start, end int) {
        for id := start; id < end; id++ {
            if entry, ok := c.connMap.Load(id); ok {
                c.pool.Go(func() { c.sendMessage(entry.(*connEntry).conn, id) })
            }
        }
    }
    _ = dispatch // invoked on each send tick (details elided)
}

Main goroutine: 1

Stats goroutine: 1

Dispatcher goroutine: 1

Goroutine pool workers: ~10 (active)

Total: ~13

The pool capacity is clientCount × 2 + 100 = 120,100, but only about 13 goroutines are active during stable operation.
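The sizing rule quoted above can be expressed as a one-liner (poolCapacity is an illustrative name, not the client's actual API); capacity is only an upper bound, since pool workers are created lazily on demand:

```go
package main

import "fmt"

// poolCapacity mirrors the sizing rule: clientCount*2 + 100.
func poolCapacity(clientCount int) int {
	return clientCount*2 + 100
}

func main() {
	fmt.Println(poolCapacity(60000)) // 120100
}
```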

Root Causes of the Difference

Net client: uses blocking I/O; each connection needs dedicated read and write goroutines that stay alive for the life of the connection, producing a massive goroutine count and memory footprint.

Netpoll client: employs an event-driven model; reads are handled by the epoll loop, writes are dispatched through a small goroutine pool, drastically reducing goroutine count.
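The write path can be modeled with a fixed worker pool built from the standard library alone. This is an illustrative sketch of the dispatch pattern, not netpoll's actual gopool implementation:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a minimal fixed-size worker pool: one task channel and `size`
// long-lived workers, instead of one goroutine per connection.
type pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newPool(size int) *pool {
	p := &pool{tasks: make(chan func())}
	for i := 0; i < size; i++ {
		go func() {
			for task := range p.tasks {
				task()
				p.wg.Done()
			}
		}()
	}
	return p
}

func (p *pool) Go(task func()) {
	p.wg.Add(1)
	p.tasks <- task
}

// dispatch fans one "send" per connection out onto the pool and returns
// how many sends completed; sent.Add stands in for the real sendMessage.
func dispatch(conns, workers int) int64 {
	var sent atomic.Int64
	p := newPool(workers)
	for id := 0; id < conns; id++ {
		p.Go(func() { sent.Add(1) })
	}
	p.wg.Wait()
	return sent.Load()
}

func main() {
	fmt.Println(dispatch(60000, 10)) // 60000 sends via ~10 workers
}
```

The key design point is that the number of goroutines is fixed by the pool size, not by the connection count.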

Memory Usage Difference

Client Type   Client Memory Alloc   Host Memory Used
net           580 MB                2.37 GiB
netpoll       420 MB                1.67 GiB
Improvement   ↓27.6%                ↓29.5%

Net client memory is dominated by 120,000 goroutine stacks (initial 2 KB each ≈ 240 MB) plus connection objects and sync.Pool buffers. Netpoll client holds only lightweight netpoll.Connection objects (~18 MB) and a tiny connEntry map (~2 MB), plus a small goroutine pool.

Runtime MemStats.Alloc shows 580 MB vs 420 MB; the gap mainly comes from the massive goroutine stack footprint in the net client.

Host Memory Difference

Node Exporter reports 2.37 GiB (net) vs 1.67 GiB (netpoll), a ~700 MB gap. Most of this extra usage stems from Go runtime overhead (goroutine stacks, GC metadata) and OS kernel structures for TCP connections.

CPU Usage Difference

Client Type   Host CPU Usage   Improvement
net           2.7%             -
netpoll       1.6%             ↓40.7%

Both clients achieve the same ~667 msg/s throughput; the CPU advantage of netpoll comes from far fewer goroutine scheduling decisions and a much smaller GC scanning workload.

Comprehensive Performance Analysis

Resource efficiency: netpoll uses ~0.00022 goroutines per connection vs 2.0 for net (≈9,091× improvement); memory per connection drops from 9.7 KB to 7.0 KB (1.4×); CPU per connection falls from 0.045% to 0.027% (1.7×).

Scalability: the net client's goroutine count, memory, and CPU grow linearly with connections; netpoll's goroutine count stays O(1), while its memory and CPU still grow linearly but with a much smaller slope.

Stability: the net client crashed at 30,000 connections on a 2C-2G host; on the 8C-16G test host it remained stable. The netpoll client showed no anomalies over the three-hour run.

Technical Depth

Blocking I/O vs Event‑Driven

Blocking I/O (net):
Each connection ties up goroutines that block in Read()/Write(); the runtime parks them until data arrives → huge goroutine count, stack memory, and scheduling/GC overhead.

Event‑Driven (netpoll):
A single epoll loop waits for events → ready connections are processed in callbacks → far fewer goroutines.

Blocking I/O is simple but dedicates goroutines (and their stacks) to every connection.

Event‑driven model is more complex but dramatically reduces resource consumption.
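The event-driven core can be demonstrated with raw epoll syscalls on Linux (a pipe stands in for a client socket; this is a minimal sketch of the pattern, not netpoll's actual event loop):

```go
package main

import (
	"fmt"
	"syscall"
)

// epollEcho shows the event-driven core on Linux: one epoll instance
// watches a fd, and a single loop wakes only when data is ready to read.
func epollEcho(msg string) string {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(epfd)

	// A pipe stands in for a socket: writes to w become readable on r.
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		panic(err)
	}
	r, w := p[0], p[1]
	defer syscall.Close(r)
	defer syscall.Close(w)

	// Register read-readiness interest for the read end.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(r)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, r, &ev); err != nil {
		panic(err)
	}

	syscall.Write(w, []byte(msg)) // simulate the peer sending data

	// The "event loop": block until at least one fd is readable.
	events := make([]syscall.EpollEvent, 8)
	n, err := syscall.EpollWait(epfd, events, 1000)
	if err != nil || n == 0 {
		panic("no event")
	}
	buf := make([]byte, 64)
	m, _ := syscall.Read(int(events[0].Fd), buf)
	return string(buf[:m])
}

func main() {
	fmt.Println(epollEcho("ping")) // ping
}
```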

Goroutine Scheduling

The Go scheduler uses an M:N model with work stealing, pre‑emptive scheduling, and system‑call handling. With many goroutines, the scheduler must maintain larger run queues, make more scheduling decisions, and handle more system calls, increasing CPU overhead.

Memory Management

Goroutine stacks start at 2 KB and grow on demand (up to a default maximum of 1 GB on 64-bit platforms). The GC's tri-color marking scans every live goroutine stack, so 120,000 stacks impose a heavy GC load compared with only ~13 stacks in the netpoll client.
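The stack cost is directly measurable with runtime.ReadMemStats: parking even 10,000 goroutines visibly grows StackInuse. This is a hedged demonstration of the effect, not the benchmark's own instrumentation:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackInuseFor starts n goroutines that block until release is closed,
// then reports how much extra stack memory the runtime is holding for them.
func stackInuseFor(n int) uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	before := ms.StackInuse

	release := make(chan struct{})
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-release // park with a live stack (2 KB minimum each)
		}()
	}
	started.Wait()

	runtime.ReadMemStats(&ms)
	close(release)
	return ms.StackInuse - before
}

func main() {
	const n = 10000
	fmt.Printf("~%d KB of stacks for %d parked goroutines\n", stackInuseFor(n)/1024, n)
}
```

At 2 KB per stack, 10,000 parked goroutines already account for roughly 20 MB; scaling to 120,000 reproduces the ~240 MB figure cited above.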

Applicable Scenarios

Standard net (blocking) client

Connection count < 1,000 – low resource impact.

Rapid prototyping – simple code, high developer productivity.

Teams with strong Go standard‑library familiarity.

Cross‑platform needs – netpoll targets Linux's epoll, while the standard net package runs everywhere Go does.

netpoll (event‑driven) client

Connection count > 10,000 – clear resource advantage.

Memory‑constrained environments such as containers or edge devices.

High‑performance services that must maximize resource utilization.

Long‑running services – lower GC pressure improves stability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Go, benchmark, Goroutine, netpoll, net
Written by

Tech Musings

Capturing thoughts and reflections while coding.
