Why Did My Kubernetes Pod Miss the Server? Uncovering DNS Search Domain Pitfalls
A Kubernetes pod looked healthy, yet the server never received its heartbeats. The pod's DNS search list included a top-level "HOST" entry, so the server's name was resolved incorrectly. This article walks through the symptoms, the investigation and the root cause, and then describes practical fixes such as using fully qualified domain names or adjusting ndots and dnsPolicy.
1. Symptoms
After an agent service was deployed to a Kubernetes cluster, its pod reported the Running status, but the server never received any heartbeats. The pod's logs showed many TCP connection timeouts to one particular IP address.
2. Investigation
The logs showed i/o timeouts when connecting to the server's IP. The service talks only to the server, whose domain name is supplied through an environment variable. The server was reachable from the host and from the node, which narrowed the problem down to the pod itself.
Inside the pod, pinging the domain succeeded, but the name resolved to an unexpected IP, which pointed to a DNS problem. Running nslookup showed a strange name with a trailing "HOST".
Inspecting /etc/resolv.conf revealed a search domain that included "HOST". The resolver appended this top-level domain to the server's name, and the resulting name resolved to the wrong IP.
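To see exactly which search domains and options a pod is using, /etc/resolv.conf can also be read programmatically. The Python sketch below is a minimal parser, assuming the common resolv.conf syntax (a keyword followed by whitespace-separated values, comment lines starting with # or ;). The function name parse_resolv_conf and the sample file contents, including the bare HOST entry, are hypothetical and only for illustration.

```python
from typing import Dict, List


def parse_resolv_conf(text: str) -> Dict[str, List[str]]:
    """Return nameservers, search domains and option tokens from resolv.conf text."""
    result: Dict[str, List[str]] = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            result["nameserver"].append(values[0])
        elif keyword in ("search", "domain"):
            # search and domain are mutually exclusive; the last one wins.
            result["search"] = values
        elif keyword == "options":
            result["options"].extend(values)
    return result


if __name__ == "__main__":
    # Hypothetical pod resolv.conf whose search list ends with a bare "HOST".
    sample = (
        "nameserver 10.68.0.2\n"
        "search devops.svc.cluster.local svc.cluster.local cluster.local HOST\n"
        "options ndots:5\n"
    )
    conf = parse_resolv_conf(sample)
    print(conf["search"])
    # Heuristic: flag search entries that are a single label with no dots.
    print([d for d in conf["search"] if "." not in d.rstrip(".")])  # ['HOST']
```

The dot-free check is only a rough heuristic: a bare single label in the search list, like HOST in the sample, turns every relative name into a query under a top-level domain.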
Querying other names ending in "HOST" returned the same IP every time, confirming that "HOST" is a top-level domain with wildcard resolution.
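The wildcard test described above can be automated by querying a few random labels that almost certainly do not exist under the suspect suffix: if every one resolves, and to the same address, the suffix very likely has a wildcard record. This is a minimal sketch; the function names and the number of probes are our own choices, and the system lookup uses Python's socket.gethostbyname, which returns a single IPv4 address (so a round-robin wildcard could be missed).

```python
import socket
import uuid
from typing import Callable, Optional, Set


def has_wildcard(suffix: str,
                 resolve: Callable[[str], Optional[str]],
                 probes: int = 3) -> bool:
    """True if several random, almost certainly unregistered names under
    `suffix` all resolve, and all to the same address."""
    answers: Set[str] = set()
    for _ in range(probes):
        # The trailing dot keeps the resolver from appending search domains.
        name = f"probe-{uuid.uuid4().hex[:12]}.{suffix.strip('.')}."
        ip = resolve(name)
        if ip is None:
            return False  # a random name did not resolve: no wildcard
        answers.add(ip)
    return len(answers) == 1


def system_resolve(name: str) -> Optional[str]:
    """Resolve an IPv4 address with the system resolver; None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None
```

Run inside the affected pod, has_wildcard("HOST", system_resolve) would be expected to return True in a situation like the one described here. The resolver is passed in as a parameter so the logic can be checked against a fake resolver before touching the network.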
3. Root Cause Analysis
Understanding how Kubernetes resolves service names:
Kubernetes DNS Resolution
Cluster‑internal name resolution
Within a namespace, a pod can reach service b with curl b. Across namespaces, the service name must include the namespace (e.g., curl b.devops). DNS queries follow the search list in /etc/resolv.conf and are sent to the cluster DNS service (usually kube-dns or coredns) at its virtual IP.
<code>cat /etc/resolv.conf
nameserver 10.68.0.2
search devops.svc.cluster.local. svc.cluster.local. cluster.local.</code>
The DNS server IP is a virtual ClusterIP: it cannot be pinged, but it does answer DNS queries.
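<code>kubectl get svc -n kube-system | grep dns
kube-dns ClusterIP 10.68.0.2 ...</code>
When a pod runs curl b, the resolver appends each search domain in turn until a name resolves; here the very first candidate, b.devops.svc.cluster.local, succeeds. This is more efficient than curl b.devops, whose first candidate, b.devops.devops.svc.cluster.local, fails and costs an extra lookup.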
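The difference in lookup count can be made concrete with a small simulation. This sketch assumes the search list shown above (trailing dots omitted) and a hypothetical set of records known to the cluster DNS. It expands a short name through the search list in order, as happens for names below the ndots threshold, and counts queries until one succeeds. It models only the ordering, not real DNS traffic.

```python
from typing import List, Optional, Set, Tuple

SEARCH = ["devops.svc.cluster.local", "svc.cluster.local", "cluster.local"]
# Hypothetical records known to the cluster DNS: service b in namespace devops.
RECORDS: Set[str] = {"b.devops.svc.cluster.local"}


def expand(name: str, search: List[str]) -> List[str]:
    """Candidate names for a short name: each search domain in turn,
    then the name itself as a last resort."""
    return [f"{name}.{domain}" for domain in search] + [name]


def lookups_until_hit(name: str) -> Tuple[int, Optional[str]]:
    """Return (number of queries sent, name that resolved or None)."""
    candidates = expand(name, SEARCH)
    for count, candidate in enumerate(candidates, start=1):
        if candidate in RECORDS:
            return count, candidate
    return len(candidates), None


if __name__ == "__main__":
    print(lookups_until_hit("b"))         # (1, 'b.devops.svc.cluster.local')
    print(lookups_until_hit("b.devops"))  # (2, 'b.devops.svc.cluster.local')
```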
Cluster‑external name resolution
External domains also go through the search list. A packet capture of DNS traffic shows several lookups for baidu.com (baidu.com.devops.svc.cluster.local., baidu.com.svc.cluster.local., baidu.com.cluster.local., and finally baidu.com.), which adds unnecessary latency.
The ndots option in /etc/resolv.conf controls this behavior. With ndots:5, a name containing fewer than five dots is treated as relative: the resolver tries it with each search domain first and only then as-is. A name with five or more dots is treated as absolute and queried as-is first.
<code>cat /etc/resolv.conf
options ndots:5</code>
With this default, a name with fewer than five dots triggers one lookup per search domain before the name itself is tried, while a name with five or more dots is queried directly.
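The ordering rule can be written down compactly. The sketch below is a simplified model of the glibc stub resolver's ordering (other resolvers, such as musl, may behave differently), assuming the pod search list shown earlier. A name ending in a dot is queried only as-is; a name with at least ndots dots is tried as-is first, with the search list as a fallback; otherwise the search list comes first and the bare name last. Real resolvers usually send both A and AAAA queries per candidate, which this sketch ignores.

```python
from typing import List

SEARCH = ["devops.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def query_order(name: str, search: List[str], ndots: int) -> List[str]:
    """Names a glibc-style resolver would query, in order (simplified)."""
    if name.endswith("."):
        # Fully qualified: queried exactly once, search list ignored.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: treated as absolute first, search list as fallback.
        return [name] + expanded
    # Too few dots: search list first, bare name last.
    return expanded + [name]


if __name__ == "__main__":
    for name in ["baidu.com", "a.b.c.d.e.com", "a.b.c.com."]:
        print(name, "->", query_order(name, SEARCH, ndots=5))
```

For baidu.com this reproduces the four lookups seen in the packet capture, while the trailing-dot form a.b.c.com. yields a single query.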
Optimization 1: Use Fully Qualified Domain Names
Appending a trailing dot (e.g., a.b.c.com.) marks the name as fully qualified, which forces an absolute lookup and eliminates the extra search-domain queries.
<code>nslookup a.b.c.com.</code>
Optimization 2: Adjust ndots per Deployment
For many workloads the default ndots:5 is reasonable, but it can be customized per deployment to better suit your services.
<code>spec:
  template:
    spec:
      # DNS settings live in the pod template of the Deployment.
      dnsConfig:
        options:
          - name: timeout
            value: "2"
          - name: ndots
            value: "2"
          - name: single-request-reopen
      dnsPolicy: ClusterFirst
</code>
Kubernetes provides four DNS policies:
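None – ignores the DNS settings of the Kubernetes environment; all DNS configuration must come from dnsConfig.
Default – the pod inherits the name-resolution configuration of the node it runs on (despite its name, this is not the default policy).
ClusterFirst – the default policy; queries go to the cluster DNS service, which forwards names outside the cluster domain to upstream nameservers. The pod's search list consists of the cluster domains followed by the node's own search domains.
ClusterFirstWithHostNet – for pods running with hostNetwork, still use the cluster DNS service.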
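To picture what the dnsConfig options in the deployment example do to the pod's resolver settings, the helper below builds an options line from them: each entry becomes name:value, or just name when no value is given, and an entry overrides an existing option of the same name. This is a hypothetical illustration of that mapping, not kubelet's code; the function name, the dict shape of the entries and the output ordering are assumptions.

```python
from typing import Dict, List, Optional


def merge_options(defaults: List[str],
                  overrides: List[Dict[str, Optional[str]]]) -> str:
    """Build a resolv.conf 'options' line (sketch, not kubelet's code).

    `defaults` are existing tokens such as "ndots:5"; `overrides` mirror
    dnsConfig.options entries ({"name": ..., "value": ...}). An override
    replaces any default with the same option name.
    """
    merged: Dict[str, Optional[str]] = {}
    for token in defaults:
        name, _, value = token.partition(":")
        merged[name] = value or None
    for opt in overrides:
        merged[opt["name"]] = opt.get("value")
    parts = [f"{n}:{v}" if v is not None else n for n, v in merged.items()]
    return "options " + " ".join(parts)


if __name__ == "__main__":
    dns_config = [
        {"name": "timeout", "value": "2"},
        {"name": "ndots", "value": "2"},
        {"name": "single-request-reopen"},
    ]
    print(merge_options(["ndots:5"], dns_config))
    # options ndots:2 timeout:2 single-request-reopen
```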
4. Conclusion
Setting dnsPolicy and dnsConfig in the deployment and lowering ndots from the default 5 to 2 makes the resolver try the server's domain, which contains at least two dots, as an absolute name first. This avoids the erroneous lookup through the "HOST" search domain and restored heartbeat communication.
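The whole incident can be replayed in a toy model. Everything below is hypothetical: the server name server.example.com, both IP addresses (taken from documentation ranges), and a search list ending in a bare "HOST" entry. The fake zone answers nothing for the cluster-suffixed names, answers every name under HOST through a wildcard, and knows the real server name. The ordering follows the same simplified glibc-style rule described earlier.

```python
from typing import Dict, List, Optional

# Hypothetical values for illustration only.
SEARCH: List[str] = ["devops.svc.cluster.local", "svc.cluster.local",
                     "cluster.local", "HOST"]
ZONE: Dict[str, str] = {"server.example.com": "203.0.113.10"}  # real server
WILDCARD_SUFFIX, WILDCARD_IP = ".HOST", "198.51.100.1"         # *.HOST


def lookup(fqdn: str) -> Optional[str]:
    """Fake authoritative answer for a single fully qualified name."""
    if fqdn.endswith(WILDCARD_SUFFIX):
        return WILDCARD_IP  # wildcard: every name under HOST resolves
    return ZONE.get(fqdn)


def resolve(name: str, ndots: int) -> Optional[str]:
    """Simplified glibc-style ordering: the first candidate that answers wins."""
    expanded = [f"{name}.{domain}" for domain in SEARCH]
    if name.count(".") >= ndots:
        candidates = [name] + expanded
    else:
        candidates = expanded + [name]
    for candidate in candidates:
        ip = lookup(candidate)
        if ip is not None:
            return ip
    return None


if __name__ == "__main__":
    print(resolve("server.example.com", ndots=5))  # 198.51.100.1 (wildcard)
    print(resolve("server.example.com", ndots=2))  # 203.0.113.10 (real server)
```

In this model, lowering ndots only changes the order: if the absolute lookup fails, the search list is still tried and the wildcard answers, which is why a fully qualified name with a trailing dot is the more robust fix.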
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.