Why netpoll Beats Go’s net Library: 99.99% Goroutine Reduction & 40% CPU Savings
A three-hour benchmark on an 8C/16G Linux host compares the standard Go net client with the netpoll client under 60,000 concurrent connections. The netpoll client shows a 27.6% drop in client memory, a 99.99% cut in goroutine count, a 29.5% reduction in host memory, and 40.7% lower CPU usage while sustaining the same throughput.
Test Environment
The benchmark creates 60,000 concurrent client connections, each sending a message every 90 seconds, and runs for three hours on an 8-CPU, 16-GB Linux VM using Go 1.26 and netpoll v0.7.2.
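For orientation, the setup can be written down as constants. This is a hypothetical sketch of the harness configuration: the identifiers (clientCount, batchSize, totalBatches, sendInterval) are illustrative and reappear in the code excerpts below, while the values are the ones stated above.

```go
package main

import "time"

// Hypothetical benchmark parameters; the names are illustrative,
// the values come from the test setup described above.
const (
	clientCount  = 60_000 // concurrent client connections
	batchSize    = 1_000  // assumed connection ramp-up batch size
	totalBatches = clientCount / batchSize
	sendInterval = 90 * time.Second // per-connection message interval
	testDuration = 3 * time.Hour    // total benchmark run
)
```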
Result Overview
| Metric | net client | netpoll client | Improvement |
| --- | --- | --- | --- |
| Client Memory Alloc | 580 MB | 420 MB | ↓27.6% |
| Client Goroutines | 120,000 | 13 | ↓99.99% |
| Host Memory Used | 2.37 GiB | 1.67 GiB | ↓29.5% |
| Host CPU Usage | 2.7% | 1.6% | ↓40.7% |
| Throughput | ~667 msg/s | ~667 msg/s | – |

Throughput is calculated as 60,000 connections ÷ 90 seconds ≈ 667 msg/s; the low rate is intentional, simulating long-connection keep-alive traffic rather than a high-throughput stress test.
Detailed Data Analysis
Goroutine Count Difference
Net client spawns 120,000 goroutines, while netpoll client uses only 13 (0.01% of net).
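These counts are observable with the standard runtime.NumGoroutine API. A minimal sketch of what the stats goroutine might log (the runStats body is an assumption; only the runtime call is the real API):

```go
import (
	"log"
	"runtime"
	"time"
)

// Hypothetical stats sampler: periodically logs the live goroutine count.
// On the net client this reports ~120,000 in steady state; on the
// netpoll client, ~13.
func (c *NetClient) runStats() {
	for range time.Tick(10 * time.Second) {
		log.Printf("goroutines=%d", runtime.NumGoroutine())
	}
}
```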
Net Client Goroutine Composition
```go
func (c *NetClient) Start() error { // main goroutine
	go c.runStats() // stats goroutine

	for batch := range totalBatches {
		startIdx := batch * batchSize
		endIdx := startIdx + batchSize
		for i := startIdx; i < endIdx; i++ {
			go func(clientID int) {
				conn, err := c.createConnection(clientID)
				if err != nil {
					return
				}
				// ...
				c.handleConnection(conn, clientID) // read & write goroutines per connection
			}(i)
		}
	}
	return nil
}

func (c *NetClient) handleConnection(conn net.Conn, clientID int) {
	go c.writeLoop(conn, clientID) // 60,000 write goroutines
	go c.readLoop(conn, clientID)  // 60,000 read goroutines
}
```

- Main goroutine: 1
- Stats goroutine: 1
- Write loop goroutines: 60,000
- Read loop goroutines: 60,000
- Total: 120,002
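The two per-connection loops are where almost all of those goroutines live. The article does not show their bodies; here is a minimal sketch assuming a simple newline-delimited ping protocol and the sendInterval constant from the setup sketch:

```go
// Hypothetical write loop: one goroutine per connection, parked on the
// ticker between sends and inside Write while the kernel buffer is full.
func (c *NetClient) writeLoop(conn net.Conn, clientID int) {
	ticker := time.NewTicker(sendInterval)
	defer ticker.Stop()
	for range ticker.C {
		if _, err := conn.Write([]byte("ping\n")); err != nil {
			return
		}
	}
}

// Hypothetical read loop: one goroutine per connection, blocked in Read
// until data arrives (the runtime parks it on the netpoller meanwhile).
func (c *NetClient) readLoop(conn net.Conn, clientID int) {
	buf := make([]byte, 4096)
	for {
		if _, err := conn.Read(buf); err != nil {
			return
		}
	}
}
```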
Netpoll Client Goroutine Composition
```go
func (c *NetpollClient) Start() error { // main goroutine
	go c.runStats()      // stats goroutine
	go c.runDispatcher() // dispatcher goroutine

	for batch := range totalBatches {
		startIdx := batch * batchSize
		endIdx := startIdx + batchSize
		for i := startIdx; i < endIdx; i++ {
			go func(clientID int) { // short-lived: exits after connecting
				conn, err := c.createConnection(clientID)
				if err != nil {
					return
				}
				c.connMap.Store(clientID, &connEntry{conn: conn})
			}(i)
		}
	}
	return nil
}

func (c *NetpollClient) runDispatcher() {
	// single dispatcher goroutine: the goroutine count stays O(1)
	// no matter how many connections exist
	dispatch := func(start, end int) {
		for id := start; id < end; id++ {
			entry, ok := c.connMap.Load(id)
			if !ok {
				continue
			}
			clientID, conn := id, entry.(*connEntry).conn
			c.pool.Go(func() { c.sendMessage(conn, clientID) })
		}
	}
	for range time.Tick(sendInterval) {
		dispatch(0, clientCount) // writes fan out through the pool
	}
}
```

- Main goroutine: 1
- Stats goroutine: 1
- Dispatcher goroutine: 1
- Goroutine pool workers: ~10 (active)
- Total: ~13
The pool capacity is clientCount × 2 + 100 = 120,100, but only about ten workers (roughly 13 goroutines in total, counting main, stats, and dispatcher) are active during stable operation.
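The pool implementation is not shown in the article. Below is a minimal channel-based sketch of a bounded goroutine pool exposing the pool.Go(task) method used by the dispatcher; production code might use a library pool such as gopool from bytedance/gopkg instead.

```go
// Minimal sketch of a bounded goroutine pool. Workers are spawned lazily
// up to cap and retire once the queue drains, so only a handful stay
// alive at a low message rate. Illustrative only.
type Pool struct {
	sem   chan struct{} // counts live workers, capacity = pool cap
	tasks chan func()   // overflow queue when all workers are busy
}

func NewPool(cap int) *Pool {
	return &Pool{
		sem:   make(chan struct{}, cap),
		tasks: make(chan func(), cap),
	}
}

func (p *Pool) Go(task func()) {
	select {
	case p.sem <- struct{}{}: // below cap: spawn a worker for this task
		go p.worker(task)
	default: // at cap: queue it for an existing worker
		p.tasks <- task
	}
}

func (p *Pool) worker(task func()) {
	task()
	for {
		select {
		case t := <-p.tasks: // keep draining queued work
			t()
		default: // queue empty: retire, freeing a worker slot
			<-p.sem
			return
		}
	}
}
```

Because workers retire as soon as the queue drains, a pool with a 120,100 capacity still keeps only a handful of goroutines alive at ~667 msg/s.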
Root Causes of the Difference
Net client: uses blocking I/O; each connection keeps dedicated read and write goroutines alive for its whole lifetime, driving up goroutine count and memory usage.
Netpoll client: employs an event-driven model; reads are handled by the epoll event loop and writes are dispatched through a small goroutine pool, drastically reducing the goroutine count.
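To make the event-driven read path concrete, here is a sketch of what createConnection could look like using netpoll's DialConnection and SetOnRequest APIs; the protocol handling, the serverAddr field, and the stats hook inside the callback are assumptions:

```go
import (
	"context"
	"time"

	"github.com/cloudwego/netpoll"
)

// Sketch of createConnection: dials a netpoll connection and registers an
// OnRequest callback, so reads are driven by the shared epoll loop instead
// of a dedicated per-connection goroutine.
func (c *NetpollClient) createConnection(clientID int) (netpoll.Connection, error) {
	conn, err := netpoll.DialConnection("tcp", c.serverAddr, time.Second)
	if err != nil {
		return nil, err
	}
	err = conn.SetOnRequest(func(ctx context.Context, conn netpoll.Connection) error {
		reader := conn.Reader()
		defer reader.Release() // hand the buffer back to netpoll
		// Assumed handling: drain whatever arrived and count the read.
		if _, err := reader.Next(reader.Len()); err != nil {
			return err
		}
		c.recordRead(clientID) // hypothetical stats hook
		return nil
	})
	return conn, err
}
```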
Memory Usage Difference
| Client Type | Client Memory Alloc | Host Memory Used |
| --- | --- | --- |
| net | 580 MB | 2.37 GiB |
| netpoll | 420 MB | 1.67 GiB |
| Improvement | ↓27.6% | ↓29.5% |

Net client memory is dominated by 120,000 goroutine stacks (2 KB initial stack each ≈ 240 MB) plus per-connection objects and sync.Pool buffers. The netpoll client holds only lightweight netpoll.Connection objects (~18 MB) and a small connEntry map (~2 MB), plus the goroutine pool.
Runtime MemStats.Alloc shows 580 MB vs 420 MB. Since goroutine stacks are accounted separately from heap Alloc (under MemStats.StackInuse), the heap gap mainly reflects per-connection and per-goroutine heap objects in the net client, while the 120,000 stacks add to total runtime memory on top of it.
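A sketch of how the client-side memory numbers might be sampled; the MemStats fields are the standard runtime ones, while the sampling loop itself is hypothetical:

```go
import (
	"log"
	"runtime"
	"time"
)

// Hypothetical memory sampler: reports heap allocation and goroutine
// stack usage separately, since MemStats.Alloc covers the heap only.
func sampleMemory() {
	for range time.Tick(30 * time.Second) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("heap=%d MB stacks=%d MB",
			m.Alloc/1024/1024, m.StackInuse/1024/1024)
	}
}
```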
Host Memory Difference
Node Exporter reports 2.37 GiB (net) vs 1.67 GiB (netpoll), a ~700 MB gap. Most of the gap stems from Go runtime overhead in the net client (120,000 goroutine stacks plus scheduler and GC metadata); kernel-side TCP structures are comparable for both clients, since each holds the same 60,000 connections.
CPU Usage Difference
| Client Type | Host CPU Usage | Improvement |
| --- | --- | --- |
| net | 2.7% | – |
| netpoll | 1.6% | ↓40.7% |

Both clients achieve the same ~667 msg/s throughput; netpoll's CPU advantage comes from far fewer goroutine scheduling decisions and a much smaller GC stack-scanning workload.
Comprehensive Performance Analysis
Resource efficiency: netpoll uses ~0.00022 goroutines per connection versus 2.0 for net (≈9,091× improvement); client memory per connection drops from 9.7 KB to 7.0 KB (1.4×); CPU per 1,000 connections falls from 0.045% to 0.027% (1.7×).
Scalability: the net client's goroutine count, memory, and CPU all grow linearly with connections; netpoll's goroutine count stays O(1), while its memory and CPU still grow linearly but with a much smaller slope.
Stability: the net client crashed at 30,000 connections on a 2C-2G host, though it remained stable on the 8C-16G test host; netpoll showed no anomalies over the three-hour run.
Technical Depth
Blocking I/O vs Event‑Driven
Blocking I/O (net):
Each goroutine blocks in Read()/Write(). The Go runtime parks blocked goroutines on its internal netpoller rather than dedicating an OS thread to each one, but every connection still needs two live goroutines with their own stacks, so resource usage grows with the connection count.
Event-Driven (netpoll):
A single epoll loop waits for readiness events, and ready connections are processed in callbacks, so far fewer goroutines are needed.
Blocking I/O is simple to write but pays in goroutine stacks, scheduling, and GC work; the event-driven model is more complex but dramatically reduces resource consumption.
Goroutine Scheduling
The Go scheduler uses an M:N model with work stealing, pre‑emptive scheduling, and system‑call handling. With many goroutines, the scheduler must maintain larger run queues, make more scheduling decisions, and handle more system calls, increasing CPU overhead.
Memory Management
Goroutine stacks start at 2 KB and grow as needed, up to a default maximum of 1 GB on 64-bit systems. The GC's tri-color marking must scan every live goroutine stack for pointers, so 120,000 stacks create a far heavier GC load than netpoll's 13.
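The ~2 KB initial stack figure is easy to check empirically. A hypothetical micro-experiment (not from the article) that parks a large number of goroutines and reads MemStats.StackInuse:

```go
package main

import (
	"fmt"
	"runtime"
)

// Hypothetical micro-experiment: approximate the per-goroutine stack cost
// by parking n goroutines and measuring the stack memory delta.
func main() {
	const n = 100_000
	block := make(chan struct{})

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	for i := 0; i < n; i++ {
		go func() { <-block }() // parked goroutine holding its initial stack
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("~%.1f KB of stack per goroutine\n",
		float64(after.StackInuse-before.StackInuse)/n/1024)
	close(block)
}
```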
Applicable Scenarios
Standard net (blocking) client
- Connection count < 1,000 – low resource impact.
- Rapid prototyping – simple code, high developer productivity.
- Teams with strong Go standard-library familiarity.
- Cross-platform needs – netpoll targets Linux (macOS is supported mainly for development), while the standard library runs everywhere Go does.
netpoll (event‑driven) client
- Connection count > 10,000 – clear resource advantage.
- Memory-constrained environments such as containers or edge devices.
- High-performance services that must maximize resource utilization.
- Long-running services – lower GC pressure improves stability.