Why Did My Kubernetes Pod Miss the Server? Uncovering DNS Search Domain Pitfalls
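A Kubernetes pod appeared healthy but never delivered heartbeats to its server. The cause was a "HOST" entry in the pod's DNS search list. "HOST" is a top‑level domain with wildcard resolution, so the server's name was resolved with ".HOST" appended and pointed at the wrong IP. This article walks through the symptoms, the investigation and the root cause, then covers practical fixes such as using fully qualified domain names or adjusting ndots and dnsPolicy.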
1. Fault Phenomenon
After deploying an agent service to a Kubernetes cluster, the pod status was Running but the server never received heartbeat signals; logs showed many TCP timeouts to a specific IP address.
2. Fault Investigation Process
Log analysis revealed i/o timeouts when connecting to a server IP. The service only contacts the server, whose domain name is set via an environment variable. The server was reachable from the host and from the node, so the issue was isolated to the pod.
Inside the pod, pinging the server's domain succeeded but returned an unexpected IP, which pointed to a DNS problem. nslookup showed that the name actually being resolved was the server's domain with "HOST" appended.
Inspecting /etc/resolv.conf in the pod showed "HOST" in the search list. The resolver therefore tried the server's domain with each search domain appended before trying it as-is, and the ".HOST" candidate answered with the wrong IP.
Testing other domains that ended with "HOST" showed they all resolved to the same IP, confirming that "HOST" is a top‑level domain with wildcard resolution.
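To make the failure mode concrete, here is a toy Python model of the lookup. Everything in it is hypothetical: the name server.example.com, both IP addresses and the wildcard record under HOST stand in for the real, unstated values. Candidate ordering follows the glibc-style rule that a name with fewer dots than ndots walks the search list before being tried as-is, and the first candidate that gets any answer wins.

```python
"""Toy model: a wildcard TLD in the search list hijacks a lookup.

Hypothetical data for illustration: the server name, both IP addresses and
the wildcard record are made up. Candidate order follows the glibc-style
rule: names with fewer than ndots dots walk the search list first.
"""
from typing import Dict, List, Optional, Tuple

# Hypothetical zone data: the real server and a wildcard under the HOST TLD.
EXACT: Dict[str, str] = {"server.example.com.": "203.0.113.10"}
WILDCARD_SUFFIX = ".HOST."  # any name ending in .HOST. matches the wildcard
WILDCARD_ADDR = "198.51.100.7"


def lookup(fqdn: str) -> Optional[str]:
    """Answer one absolute query, or return None for NXDOMAIN."""
    if fqdn in EXACT:
        return EXACT[fqdn]
    if fqdn.endswith(WILDCARD_SUFFIX):
        return WILDCARD_ADDR
    return None


def resolve(name: str, search: List[str], ndots: int = 5) -> Tuple[Optional[str], Optional[str]]:
    """Return (candidate that answered, address); the first answer wins."""
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        candidates = [name + "."] + expanded
    else:
        candidates = expanded + [name + "."]
    for candidate in candidates:
        addr = lookup(candidate)
        if addr is not None:
            return candidate, addr
    return None, None


if __name__ == "__main__":
    search = ["devops.svc.cluster.local", "svc.cluster.local", "cluster.local", "HOST"]
    print(resolve("server.example.com", search))
    # prints ('server.example.com.HOST.', '198.51.100.7')
```

Because the wildcard answers for any name under HOST, the model never reaches server.example.com. itself, which mirrors what was observed inside the pod.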
3. Fault Cause Analysis
Understanding how Kubernetes resolves service names:
Kubernetes DNS Resolution
Cluster‑internal name resolution
Within a namespace, a pod can reach service b with curl b. Across namespaces, the service name must include the namespace (e.g., curl b.devops). DNS queries follow the search list in /etc/resolv.conf and are sent to the cluster DNS service (usually CoreDNS, or kube-dns in older clusters) at its virtual IP.
cat /etc/resolv.conf
nameserver 10.68.0.2
search devops.svc.cluster.local. svc.cluster.local. cluster.local.
The DNS server IP is a virtual ClusterIP that cannot be pinged but can be queried.
kubectl get svc -n kube-system | grep dns
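kube-dns ClusterIP 10.68.0.2 ...
When a pod runs curl b, the resolver appends each search domain in turn until a name resolves, and b.devops.svc.cluster.local. answers on the first try. curl b.devops is less efficient: it first tries b.devops.devops.svc.cluster.local., which fails, so it costs one extra lookup.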
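The query counts above can be checked with a small Python model. It is a sketch under assumptions: service b exists only as b.devops.svc.cluster.local., every other candidate returns NXDOMAIN, and each candidate name counts as one query (real clients usually send both an A and an AAAA query per name).

```python
"""Count resolver queries for short in-cluster names (simplified sketch).

Assumptions: service "b" lives in namespace "devops" with cluster domain
cluster.local, as in the resolv.conf above; all other candidates return
NXDOMAIN; one query per candidate name.
"""
from typing import List, Set, Tuple

SEARCH = ["devops.svc.cluster.local", "svc.cluster.local", "cluster.local"]
EXISTING = {"b.devops.svc.cluster.local."}  # the Service's DNS record


def queries_until_answer(
    name: str, search: List[str], existing: Set[str], ndots: int = 5
) -> Tuple[int, List[str]]:
    """Return how many names were queried before an answer, and which ones."""
    if name.endswith("."):
        candidates = [name]  # already absolute: no search-list expansion
    elif name.count(".") >= ndots:
        candidates = [name + "."] + [f"{name}.{d}." for d in search]
    else:
        candidates = [f"{name}.{d}." for d in search] + [name + "."]
    asked = []
    for candidate in candidates:
        asked.append(candidate)
        if candidate in existing:
            break
    return len(asked), asked


if __name__ == "__main__":
    for name in ("b", "b.devops", "b.devops.svc.cluster.local"):
        count, asked = queries_until_answer(name, SEARCH, EXISTING)
        print(f"{name}: {count} queries -> {asked}")
```

The last case shows that even the full service name costs four queries when written without a trailing dot, because it contains only four dots.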
Cluster‑external name resolution
External domains also go through the search list. Capturing DNS packets shows multiple lookups for baidu.com (e.g., baidu.com.devops.svc.cluster.local., baidu.com.svc.cluster.local., baidu.com.cluster.local., then baidu.com.), adding unnecessary latency.
The ndots option in /etc/resolv.conf controls this behavior. With ndots:5, a name containing fewer than five dots is treated as relative: it is tried with each search domain appended, and only then as-is. A name with five or more dots is tried as-is first, and the search list is used only if that lookup fails.
cat /etc/resolv.conf
options ndots:5
Because baidu.com contains only one dot, it falls below this threshold and produces the four lookups listed above, while a name with five or more dots is queried as-is first.
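The ndots rule can be written down as a short Python function. This is a simplified model of glibc-style resolver behavior, not resolver source code: it ignores options such as attempts, rotate and timeout, and lists each candidate name once. The search list is the one from the resolv.conf above.

```python
"""Simplified model of glibc-style search-list expansion under ndots.

Illustrative sketch, not resolver source code: options such as attempts,
rotate and timeout are ignored, and each candidate name is listed once.
"""
from typing import List

# Search list from the article's resolv.conf (trailing dots as shown there).
SEARCH = ["devops.svc.cluster.local.", "svc.cluster.local.", "cluster.local."]


def candidate_names(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Return the absolute names the resolver would try, in order."""
    if name.endswith("."):
        # Trailing dot: the name is already absolute, the search list is skipped.
        return [name]
    as_is = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # At least ndots dots: try the name as-is first, search list as fallback.
        return [as_is] + expanded
    # Fewer than ndots dots: walk the search list first, bare name last.
    return expanded + [as_is]


if __name__ == "__main__":
    for name in ("baidu.com", "a.b.c.d.e.f", "a.b.c.com."):
        print(f"{name!r} -> {candidate_names(name, SEARCH)}")
```

With ndots:2 the picture changes only for names with at least two dots: server.example.com (a hypothetical name) is then tried as-is first, while baidu.com, with one dot, still walks the search list.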
Optimization 1: Use Fully Qualified Domain Names
Appending a trailing dot (e.g., a.b.c.com.) forces an absolute lookup, eliminating extra search‑domain queries.
nslookup a.b.c.com.
Optimization 2: Adjust ndots per Deployment
For many workloads, using the default ndots:5 is reasonable, but you can customize it per deployment to better suit your services.
spec:
  # Pod spec; in a Deployment this sits under spec.template.spec
  dnsConfig:
    options:
      - name: timeout
        value: "2"
      - name: ndots
        value: "2"
      - name: single-request-reopen
  dnsPolicy: ClusterFirst
Kubernetes provides four DNS policies:
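None – ignores the cluster's DNS settings; all DNS configuration must be supplied through dnsConfig.
Default – the pod inherits the name‑resolution configuration of the node it runs on, typically the node's /etc/resolv.conf. Despite its name, this is not the default policy.
ClusterFirst – the default policy: queries go to the cluster DNS service, which forwards names outside the cluster domain to upstream nameservers inherited from the node.
ClusterFirstWithHostNet – for pods running with hostNetwork: true, so that they still resolve names through the cluster DNS.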
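For ClusterFirst, kubelet builds the pod's search list from the cluster domains and then appends the node's own search domains, which is one plausible way a "HOST" entry ends up inside a pod (the article does not say where it came from, so treat that as an assumption). The Python sketch below approximates the resulting /etc/resolv.conf for the deployment above; it is based on documented kubelet behavior rather than its source, and option order in a real file may differ.

```python
"""Approximate the /etc/resolv.conf kubelet generates for a ClusterFirst pod.

Simplified sketch of documented behaviour, not kubelet source: cluster
search domains come first, the node's search domains are appended, and
dnsConfig options replace generated options with the same name.
Assumption: the node's search list contains "HOST".
"""
from typing import Dict, List, Optional, Tuple


def dedupe(items: List[str]) -> List[str]:
    """Remove repeated entries, keeping the first occurrence."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def pod_resolv_conf(
    namespace: str,
    cluster_domain: str,
    cluster_dns: str,
    node_search: List[str],
    dns_config_options: List[Tuple[str, Optional[str]]],
) -> str:
    """Render nameserver/search/options lines for a ClusterFirst pod."""
    cluster_search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    # Node search domains (hypothetically "HOST") follow the cluster ones.
    search = dedupe(cluster_search + node_search)

    # ClusterFirst starts from ndots:5; dnsConfig entries override by name.
    options: Dict[str, Optional[str]] = {"ndots": "5"}
    for name, value in dns_config_options:
        options[name] = value
    rendered = [name if value is None else f"{name}:{value}" for name, value in options.items()]

    return "\n".join([
        f"nameserver {cluster_dns}",
        "search " + " ".join(search),
        "options " + " ".join(rendered),
    ])


if __name__ == "__main__":
    print(pod_resolv_conf(
        namespace="devops",
        cluster_domain="cluster.local",
        cluster_dns="10.68.0.2",
        node_search=["HOST"],  # hypothetical node search list
        dns_config_options=[("timeout", "2"), ("ndots", "2"), ("single-request-reopen", None)],
    ))
```

Note that HOST stays in the search list after the change; lowering ndots only means names with at least two dots are tried as-is before any search domain is appended.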
4. Conclusion
Setting dnsPolicy and dnsConfig in the deployment and lowering ndots from the default 5 to 2 means the server's domain, which contains at least two dots, is tried as an absolute name first. That lookup succeeds, so the erroneous ".HOST" candidate is never queried and heartbeat communication is restored.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
