
Investigation and Optimization of Unexpected AAAA DNS Requests in Go Applications

The article investigates why Go applications unexpectedly send AAAA DNS queries to a secondary nameserver, tracing the issue to the built‑in resolver’s handling of non‑recursive responses from a NetScaler proxy, and recommends using the cgo resolver, enabling recursion, or forcing IPv4 to eliminate the added latency.

Bilibili Tech

Background: The system has two nameservers configured in /etc/resolv.conf:

nameserver server1
nameserver server2
options timeout:1 attempts:1

The intended fail‑over strategy is that server1 handles all DNS queries and server2 is used only when server1 is unavailable.
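To make the two options concrete, here is a minimal sketch that reads the nameserver list, timeout, and attempts out of resolv.conf-style text. It is illustrative only: parseResolvConf and resolvConf are hypothetical names, not the parser that Go or glibc uses, and the defaults of 5 seconds and 2 attempts are the ones documented in resolv.conf(5).

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// resolvConf holds only the settings discussed in this article.
type resolvConf struct {
	Servers  []string
	Timeout  int // seconds to wait for one server
	Attempts int // number of passes over the server list
}

// parseResolvConf is a simplified, hypothetical parser for illustration.
func parseResolvConf(text string) resolvConf {
	conf := resolvConf{Timeout: 5, Attempts: 2} // resolv.conf(5) defaults
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "nameserver":
			conf.Servers = append(conf.Servers, fields[1])
		case "options":
			for _, opt := range fields[1:] {
				key, val, ok := strings.Cut(opt, ":")
				if !ok {
					continue
				}
				n, err := strconv.Atoi(val)
				if err != nil {
					continue
				}
				switch key {
				case "timeout":
					conf.Timeout = n
				case "attempts":
					conf.Attempts = n
				}
			}
		}
	}
	return conf
}

func main() {
	conf := parseResolvConf("nameserver server1\nnameserver server2\noptions timeout:1 attempts:1\n")
	fmt.Printf("%+v\n", conf)
}
```

With timeout:1 and attempts:1, each server is tried at most once and a dead server costs one second, which matches the latency described in the problem statement.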

Problem: In production, AAAA (IPv6) DNS queries are still sent to server2 even while server1 is healthy. When the network path to server2 is faulty, each affected HTTP request takes roughly 1 s longer, because the AAAA query to server2 has to time out (timeout:1) before the lookup can complete.

Investigation: The services are written in Go and use the standard net package. Typical usage is:

package main
import (
    "net"
    "net/http"
)
func main() {
    http.Get("https://internal.domain.name")
    net.Dial("tcp", "internal.domain.name:443")
}

Tracing the Go source shows that both http.Get and net.Dial eventually call func (d *Dialer) DialContext(), which invokes func (r *Resolver) lookupIP(). lookupIP decides whether to use Go's built‑in resolver or the OS C library, and in which order /etc/hosts and DNS are consulted.

On Debian the built‑in resolver is used, so the next step is func (r *Resolver) goLookupIPCNAMEOrder(). By default this function queries both record types (qtypes := []dnsmessage.Type{dnsmessage.TypeA, dnsmessage.TypeAAAA}); a network argument ending in 4 or 6 restricts it to A or AAAA only. Unless conf.singleRequest is set, the A and AAAA queries are issued concurrently.

func (r *Resolver) goLookupIPCNAMEOrder(ctx context.Context, network, name string, order hostLookupOrder, conf *dnsConfig) (addrs []IPAddr, cname dnsmessage.Name, err error) {
    ...
    lane := make(chan result, 1)
    qtypes := []dnsmessage.Type{dnsmessage.TypeA, dnsmessage.TypeAAAA}
    switch ipVersion(network) {
    case '4':
        qtypes = []dnsmessage.Type{dnsmessage.TypeA}
    case '6':
        qtypes = []dnsmessage.Type{dnsmessage.TypeAAAA}
    }
    var queryFn func(fqdn string, qtype dnsmessage.Type)
    var responseFn func(fqdn string, qtype dnsmessage.Type) result
    if conf.singleRequest {
        queryFn = func(fqdn string, qtype dnsmessage.Type) {}
        responseFn = func(fqdn string, qtype dnsmessage.Type) result {
            dnsWaitGroup.Add(1)
            defer dnsWaitGroup.Done()
            p, server, err := r.tryOneName(ctx, conf, fqdn, qtype)
            return result{p, server, err}
        }
    } else {
        queryFn = func(fqdn string, qtype dnsmessage.Type) {
            dnsWaitGroup.Add(1)
            go func(qtype dnsmessage.Type) {
                p, server, err := r.tryOneName(ctx, conf, fqdn, qtype)
                lane <- result{p, server, err}
                dnsWaitGroup.Done()
            }(qtype)
        }
        responseFn = func(fqdn string, qtype dnsmessage.Type) result { return <-lane }
    }
    for _, fqdn := range conf.nameList(name) {
        for _, qtype := range qtypes {
            queryFn(fqdn, qtype)
        }
    }
    ...
    for _, qtype := range qtypes {
        result := responseFn(fqdn, qtype)
    }
    ...
}
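The effect of the network argument on the query list can be sketched in isolation. The following is a simplified mirror of the switch in the excerpt above, not the net package's code: queryTypes is a hypothetical helper, and record types are plain strings instead of dnsmessage.Type values.

```go
package main

import "fmt"

// ipVersion returns '4' or '6' when the network name ends in that digit,
// and 0 otherwise (for example "tcp" or "udp").
func ipVersion(network string) byte {
	if network == "" {
		return 0
	}
	n := network[len(network)-1]
	if n != '4' && n != '6' {
		return 0
	}
	return n
}

// queryTypes lists the DNS record types that would be queried for a network.
func queryTypes(network string) []string {
	switch ipVersion(network) {
	case '4':
		return []string{"A"}
	case '6':
		return []string{"AAAA"}
	}
	return []string{"A", "AAAA"}
}

func main() {
	for _, network := range []string{"tcp", "tcp4", "tcp6", "udp4"} {
		fmt.Printf("%-4s -> %v\n", network, queryTypes(network))
	}
}
```

This is why a plain "tcp" dial always produces an AAAA query alongside the A query, even when the service only ever connects over IPv4.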

Each query then goes through func (r *Resolver) tryOneName(...), which contains the logic for moving on to the next nameserver. The relevant fragment is:

func (r *Resolver) tryOneName(ctx context.Context, cfg *dnsConfig, name string, qtype dnsmessage.Type) (dnsmessage.Parser, string, error) {
    ...
    q := dnsmessage.Question{ Name: n, Type: qtype, Class: dnsmessage.ClassINET }
    for i := 0; i < cfg.attempts; i++ {
        for j := uint32(0); j < sLen; j++ {
            server := cfg.servers[(serverOffset+j)%sLen]
            p, h, err := r.exchange(ctx, server, q, cfg.timeout, cfg.useTCP, cfg.trustAD)
            ...
            if err := checkHeader(&p, h); err != nil {
                dnsErr := &DNSError{ Err: err.Error(), Name: name, Server: server }
                if err == errServerTemporarilyMisbehaving { dnsErr.IsTemporary = true }
                if err == errNoSuchHost {
                    dnsErr.IsNotFound = true
                    return p, server, dnsErr
                }
                lastErr = dnsErr
                continue
            }
        }
    }
    ...
}
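The fall-through behaviour of this loop can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the net package's implementation: exchangeFunc, tryServers, and errRetryable are hypothetical names, the DNS exchange is faked instead of sending packets, and the serverOffset rotation is left out.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// errRetryable stands in for timeouts and header errors that are not final.
	errRetryable = errors.New("retryable failure")
	// errNoSuchHost stands in for a definitive negative answer.
	errNoSuchHost = errors.New("no such host")
)

// exchangeFunc is a hypothetical stand-in for one DNS round trip to a server.
type exchangeFunc func(server string) error

// tryServers walks the server list up to `attempts` times. It stops on the
// first success or definitive failure and otherwise moves to the next server.
func tryServers(servers []string, attempts int, exchange exchangeFunc) (answered string, contacted []string, err error) {
	lastErr := errors.New("no servers configured")
	for i := 0; i < attempts; i++ {
		for _, server := range servers {
			contacted = append(contacted, server)
			exErr := exchange(server)
			if exErr == nil {
				return server, contacted, nil
			}
			if errors.Is(exErr, errNoSuchHost) {
				return "", contacted, exErr
			}
			lastErr = exErr // retryable: fall through to the next server
		}
	}
	return "", contacted, lastErr
}

func main() {
	// server1 gives a retryable error, so the query also reaches server2.
	answered, contacted, err := tryServers([]string{"server1", "server2"}, 1, func(server string) error {
		if server == "server1" {
			return errRetryable
		}
		return nil
	})
	fmt.Println(answered, contacted, err)
}
```

The point of the sketch is that any non-final error from server1 is enough to send the same query to server2, regardless of whether server1 is actually down.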

Live debugging with Delve confirms that the error originates from checkHeader:

dlv debug main.go
(dlv) break /usr/local/go/src/net/dnsclient_unix.go:279
(dlv) break /usr/local/go/src/net/dnsclient_unix.go:297
(dlv) continue
(dlv) print err
error(*errors.errorString) *{ s: "lame referral", }

The checkHeader function returns errLameReferral when all four conditions are met: successful response, non‑authoritative server, recursion not available, and empty answer section.

func checkHeader(p *dnsmessage.Parser, h dnsmessage.Header) error {
    ...
    // libresolv continues to the next server when it receives an invalid referral response.
    if h.RCode == dnsmessage.RCodeSuccess && !h.Authoritative && !h.RecursionAvailable && err == dnsmessage.ErrSectionDone {
        return errLameReferral
    }
    ...
}
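The four conditions can be written as a standalone predicate to see which responses qualify. This is a simplified sketch: the header struct and isLameReferral are hypothetical, and an answer count of zero stands in for the err == dnsmessage.ErrSectionDone check in the real function.

```go
package main

import "fmt"

// header holds only the response fields the check looks at (hypothetical type).
type header struct {
	RCodeSuccess       bool // response code is "no error"
	Authoritative      bool // AA flag
	RecursionAvailable bool // RA flag
	AnswerCount        int  // records in the answer section
}

// isLameReferral reports whether all four conditions hold at once.
func isLameReferral(h header) bool {
	return h.RCodeSuccess && !h.Authoritative && !h.RecursionAvailable && h.AnswerCount == 0
}

func main() {
	cases := []struct {
		name string
		h    header
	}{
		{"non-recursive proxy, empty answer", header{RCodeSuccess: true}},
		{"non-recursive proxy, one answer", header{RCodeSuccess: true, AnswerCount: 1}},
		{"recursive resolver, empty answer", header{RCodeSuccess: true, RecursionAvailable: true}},
		{"authoritative server, empty answer", header{RCodeSuccess: true, Authoritative: true}},
	}
	for _, c := range cases {
		fmt.Printf("%-36s lame referral: %v\n", c.name, isLameReferral(c.h))
	}
}
```

Only the first case matches. A response that carries at least one record does not, which is consistent with only the AAAA query being retried, presumably because the internal name has an A record but no AAAA record.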

In this environment, a NetScaler sits in front of the DNS server and does not have recursion enabled. Its response to the AAAA query therefore meets the condition above, and the resolver retries the AAAA query on server2. When server2 is unreachable, that retry runs into the 1 s timeout, which is the latency observed.

The cause can be verified with dig: a warning such as ;; WARNING: recursion requested but not available indicates that the responding server does not offer recursion.
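The flags behind that warning live in the 12-byte DNS message header defined by RFC 1035. The sketch below decodes them from raw bytes, which can help when inspecting a packet capture instead of dig output. The decodeHeader helper and the sample bytes are hypothetical, and nothing is sent over the network.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// flags holds the header fields relevant to the lame-referral check.
type flags struct {
	Authoritative      bool   // AA bit
	RecursionDesired   bool   // RD bit, copied from the query
	RecursionAvailable bool   // RA bit, set by the server
	RCode              uint16 // response code, 0 means success
	Answers            uint16 // ANCOUNT
}

// decodeHeader reads the flag word and answer count from a DNS message.
func decodeHeader(msg []byte) (flags, error) {
	if len(msg) < 12 {
		return flags{}, errors.New("short DNS message")
	}
	bits := binary.BigEndian.Uint16(msg[2:4])
	return flags{
		Authoritative:      bits&0x0400 != 0,
		RecursionDesired:   bits&0x0100 != 0,
		RecursionAvailable: bits&0x0080 != 0,
		RCode:              bits & 0x000F,
		Answers:            binary.BigEndian.Uint16(msg[6:8]),
	}, nil
}

func main() {
	// Hand-written sample header (not captured traffic): a response with RD set,
	// AA and RA clear, RCODE 0, one question and zero answers.
	resp := []byte{0x12, 0x34, 0x81, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
	f, err := decodeHeader(resp)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", f)
	if f.RecursionDesired && !f.RecursionAvailable {
		fmt.Println("recursion requested but not available")
	}
}
```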

Optimization suggestions:

Compile Go programs with -tags netcgo to use the cgo‑based resolver, which follows glibc's logic and avoids the retry.

Enable recursion on the DNS proxy (NetScaler) if possible, after careful validation.

If IPv6 is not needed, force IPv4 in the application:

net.Dial("tcp4", "internal.domain.name:443")
net.Dial("udp4", "internal.domain.name:443")

For HTTP clients, set a custom DialContext that forces IPv4:

package main

import (
    "context"
    "log"
    "net"
    "net/http"
    "time"
)

func main() {
    dialer := &net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}

    // Start from the default transport and override only the dial step.
    transport := http.DefaultTransport.(*http.Transport).Clone()
    transport.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
        // Ignore the requested network and always dial "tcp4", so only A records are queried.
        return dialer.DialContext(ctx, "tcp4", addr)
    }

    httpClient := &http.Client{Timeout: 30 * time.Second, Transport: transport}
    resp, err := httpClient.Get("https://internal.domain.name")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    log.Println(resp.StatusCode)
}

Summary:

The Go net package provides two resolution paths: a built‑in (pure Go) resolver and the OS C library. Windows and macOS prefer the OS resolver; on Linux distributions such as Debian and CentOS the built‑in resolver is used by default.

The built‑in resolver’s behavior differs from glibc’s, especially regarding retry logic when a non‑authoritative, non‑recursive response with an empty answer is received.

Set reasonable DNS timeout values to give fail‑over mechanisms enough time to react.

Recommended reading:

https://studygolang.com/topics/15021

https://pkg.go.dev/net – see the "Name Resolution" section.
