Backend Development

Investigation of Go HTTP Client Connection Pool Not Reusing Connections

The investigation revealed that the Go http.Client was not reusing connections because response bodies were closed before being fully read, causing the underlying connections to be marked dead. Fully reading the bodies restored pooling, eliminated the DNS‑resolution timeouts, and sharply reduced the DNS query rate.

37 Interactive Technology Team

The service issues a massive number of DNS queries, peaking at about 2.4k QPS. During peak periods, occasional DNS resolution timeouts (5 s) occur, while memory usage remains stable.

Investigation steps included checking whether the http.Client was instantiated as a singleton, verifying concurrency control, confirming keep‑alive settings, and reviewing connection‑recycling logic. The root cause was identified as closing response.Body before it was fully read, which prevents the underlying connection from being returned to the idle pool.

Example client configuration (global singleton):

var (
    doOption             = &httpreq.HttpDoOption{DisableLog: true}
    triggerHandlerClient = &http.Client{
        Timeout: 30 * time.Minute,
        Transport: &http.Transport{
            Proxy:               http.ProxyFromEnvironment,
            DialContext:         knet.WrapTcpDialer(time.Second*5, time.Minute, nil),
            ForceAttemptHTTP2:   true,
            MaxIdleConns:        256,
            MaxIdleConnsPerHost: 256,
            MaxConnsPerHost:     512,
            IdleConnTimeout:     time.Minute,
            TLSHandshakeTimeout: 10 * time.Second,
        },
    }
)

Reproduction test code demonstrates the issue:

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "net/http/httptrace"
    "sync"
    "testing"
)

func TestAAA(t *testing.T) {
    c := make(chan struct{}, 5)
    var wg sync.WaitGroup
    for i := 0; i < 20; i++ {
        c <- struct{}{}
        wg.Add(1) // Add before starting the goroutine, to avoid racing with wg.Wait
        go func() {
            defer wg.Done()
            HttpGet()
            <-c
        }()
    }
    wg.Wait()
    fmt.Println("getConnCount", getConnCount)
    fmt.Println("getDNSCount", getDNSCount)
}

var getConnCount = 0
var getConnCountLock = sync.Mutex{}
var getDNSCount = 0
var getDNSCountLock = sync.Mutex{}

var httpTrace = &httptrace.ClientTrace{
    ConnectDone: func(network, addr string, err error) {
        getConnCountLock.Lock()
        defer getConnCountLock.Unlock()
        getConnCount++
    },
    DNSDone: func(info httptrace.DNSDoneInfo) {
        getDNSCountLock.Lock()
        defer getDNSCountLock.Unlock()
        getDNSCount++
    },
}

func HttpGet() {
    req, _ := http.NewRequest("GET", "https://www.baidu.com", nil)
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), httpTrace))
    resp, err := http.DefaultTransport.RoundTrip(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close() // body not fully read
}

func HttpGet2() {
    req, _ := http.NewRequest("GET", "https://www.baidu.com", nil)
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), httpTrace))
    resp, err := http.DefaultTransport.RoundTrip(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()
    _, _ = ioutil.ReadAll(resp.Body) // fully read
}

The only difference between HttpGet and HttpGet2 is the explicit ioutil.ReadAll call, which drains the response body before it is closed. HttpGet fails to reuse connections (every request dials and resolves anew), while HttpGet2 reuses them.

Underlying principle: each HTTP connection runs separate read/write goroutines that communicate via channels. The response body is wrapped in a bodyEOFSignal which tracks whether the body was fully consumed. If Close is called before reaching EOF, the earlyCloseFn is invoked, setting the connection’s alive flag to false and preventing it from returning to the idle pool. The relevant logic, excerpted from net/http/transport.go:

func (pc *persistConn) readLoop() {
    alive := true
    for alive {
        waitForBodyRead := make(chan bool, 2)
        body := &bodyEOFSignal{body: resp.Body, earlyCloseFn: func() error {
            waitForBodyRead <- false
            <-eofc
            return nil
        }, fn: func(err error) error {
            isEOF := err == io.EOF
            waitForBodyRead <- isEOF
            if isEOF {
                <-eofc
            } else if err != nil {
                if cerr := pc.canceled(); cerr != nil {
                    return cerr
                }
            }
            return err
        }}
        resp.Body = body
        // ... omitted ...
        select {
        case bodyEOF := <-waitForBodyRead:
            replaced := pc.t.replaceReqCanceler(rc.cancelKey, nil)
            alive = alive && bodyEOF && !pc.sawEOF && pc.wroteRequest() && replaced && tryPutIdleConn(trace)
            if bodyEOF {
                eofc <- struct{}{}
            }
        case <-rc.req.Cancel:
            alive = false
            pc.t.CancelRequest(rc.req)
        case <-rc.req.Context().Done():
            alive = false
            pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
        case <-pc.closech:
            alive = false
        }
    }
}

type bodyEOFSignal struct {
    body         io.ReadCloser
    mu           sync.Mutex
    closed       bool
    rerr         error
    fn           func(error) error
    earlyCloseFn func() error
}

func (es *bodyEOFSignal) Close() error {
    es.mu.Lock()
    defer es.mu.Unlock()
    if es.closed {
        return nil
    }
    es.closed = true
    if es.earlyCloseFn != nil && es.rerr != io.EOF {
        return es.earlyCloseFn()
    }
    err := es.body.Close()
    return es.condfn(err)
}

func (es *bodyEOFSignal) condfn(err error) error {
    if es.fn == nil {
        return err
    }
    err = es.fn(err)
    es.fn = nil
    return err
}

After deploying the fix (ensuring the response body is fully read before it is closed), DNS QPS dropped sharply, confirming the diagnosis.

Recommendation: instrument both server and client with httptrace to collect detailed metrics and report them to a monitoring dashboard. Full‑stack tracing greatly accelerates root‑cause analysis compared to relying solely on DNS timeout symptoms.

Tags: Performance, Connection Pool, HTTP, DNS, Tracing, Go