Why Your Kubernetes Pod Can't Reach the Server: DNS Search Domain Pitfalls and Fixes
An agent service running in a Kubernetes pod appeared healthy, but its server never received heartbeats. The cause was a DNS resolution error: an unintended 'HOST' search domain made the server's name resolve to the wrong IP. This article walks through the investigation, explains Kubernetes DNS mechanics, and shows how adjusting ndots or using fully qualified names resolves the issue.
1. Fault Phenomenon
We deployed an agent service to a Kubernetes cluster; the pod status is Running, but the server never receives heartbeat signals. Logs show many "tcp timeout" messages when trying to connect to a specific IP address.
2. Fault Investigation Process
Log analysis revealed numerous I/O timeout errors when connecting to the IP. The service only contacts the server, whose domain name is set via an environment variable. The server is reachable from the host and the node, so the issue is not the server itself.
Testing from inside the pod shows that the server cannot be reached, even though ping does resolve the domain name. The resolved IP is not the server's external IP, which points to a DNS resolution problem. Running nslookup (after installing dnsutils or bind-utils) shows that the name being answered is the server's domain with an unexpected HOST suffix appended.
Inspecting /etc/resolv.conf reveals a search list that includes HOST, so the resolver appends this suffix whenever it expands a name through the search list.
Any domain ending with HOST resolves to an unexpected IP because HOST is a top‑level domain that performs wildcard resolution.
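The manual inspection of /etc/resolv.conf can also be scripted. The following is a minimal sketch in Python, not the tooling used in the investigation: the function names, the `cluster.local` default, and the sample file contents (a hypothetical reconstruction, since the pod's actual file is not reproduced here) are all assumptions made for illustration.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains, ndots) parsed from resolv.conf text."""
    nameservers, search, ndots = [], [], 1  # the resolver's own default for ndots is 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        fields = line.split()
        key, values = fields[0], fields[1:]
        if key == "nameserver" and values:
            nameservers.append(values[0])
        elif key == "search":
            # Drop trailing dots so "cluster.local." and "cluster.local" compare equal.
            search = [v.rstrip(".") for v in values]
        elif key == "options":
            for opt in values:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return nameservers, search, ndots


def unexpected_search_domains(search, cluster_domain="cluster.local"):
    """Return search domains that do not belong to the cluster domain."""
    suffix = "." + cluster_domain
    return [
        d for d in search
        if d.lower() != cluster_domain and not d.lower().endswith(suffix)
    ]


# Hypothetical sample resembling the faulty pod's file (assumption, not a capture).
SAMPLE = """\
nameserver 10.68.0.2
search devops.svc.cluster.local svc.cluster.local cluster.local HOST
options ndots:5
"""

nameservers, search, ndots = parse_resolv_conf(SAMPLE)
print("nameservers:", nameservers)
print("ndots:", ndots)
print("unexpected search domains:", unexpected_search_domains(search))
```

Inside a pod, the same functions could be fed the contents of the real file, read with `open("/etc/resolv.conf").read()`.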
3. Fault Cause Analysis
Understanding how Kubernetes resolves service names is essential. Inside a pod, DNS queries are sent to the cluster’s kube-dns (or coredns) service IP (e.g., 10.68.0.2) as defined in /etc/resolv.conf. The file typically contains:
nameserver 10.68.0.2
search devops.svc.cluster.local. svc.cluster.local. cluster.local.
options ndots:5
For intra‑namespace service calls, a simple name like b is expanded using the search list, eventually forming b.devops.svc.cluster.local. For external domains, the same search list is applied first unless the name contains at least ndots dots.
When a name has fewer than five dots, the resolver appends each search suffix in turn before trying the name as written, which generates extra DNS queries. This was demonstrated by capturing packets with tcpdump for baidu.com and for a.b.c.d.com, both of which have fewer than five dots. The captures show three search-suffixed lookups for such a name, one per entry in the search list, while a name with five or more dots is queried directly as an absolute name.
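The lookup order described above can be modeled in a few lines. This is a simplified sketch of glibc-style behavior, not the resolver's actual code: `candidate_names` is a hypothetical helper, and other resolver implementations (for example musl in Alpine-based images) may order or skip candidates differently.

```python
def candidate_names(name, search, ndots):
    """List the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name]
    suffixed = [f"{name}.{domain.rstrip('.')}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as written first, then the search list.
        return [name] + suffixed
    # Too few dots: walk the search list first, then try the name as written.
    return suffixed + [name]


SEARCH = ["devops.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in ("b", "a.b.c.d.com", "a.b.c.com."):
    print(query, "->", candidate_names(query, SEARCH, ndots=5))
```

With ndots set to 5, a.b.c.d.com is only tried as written after all three suffixed names have failed. If a search domain such as HOST answers every query, the lookup stops at that suffixed name and never reaches the real one.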
# Example of /etc/resolv.conf
nameserver 10.68.0.2
search devops.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Two optimization strategies are presented:
Optimization 1: Use Fully Qualified Domain Names
Appending a trailing dot (e.g., a.b.c.com.) forces the resolver to treat the name as absolute, bypassing the search list and eliminating extra DNS queries.
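When the server name comes from an environment variable, as it does for the agent in this case, the trailing dot can be added in code before the name is used. The sketch below assumes a hypothetical variable called `SERVER_HOST` and a hypothetical helper `to_absolute`. It is meant for external names only: short in-cluster service names rely on the search list and should be left alone.

```python
import ipaddress
import os


def to_absolute(hostname):
    """Return hostname with a trailing dot so the resolver skips the search list."""
    hostname = hostname.strip()
    try:
        # IP literals are not DNS names and must not get a trailing dot.
        ipaddress.ip_address(hostname)
        return hostname
    except ValueError:
        pass
    return hostname if hostname.endswith(".") else hostname + "."


# SERVER_HOST is an assumed variable name; a.b.c.com is the article's example domain.
server = to_absolute(os.environ.get("SERVER_HOST", "a.b.c.com"))
print(server)
```

Some HTTP clients and TLS certificate checks handle a trailing dot differently from the plain name, so this is worth testing against the actual server before relying on it.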
nslookup a.b.c.com.
Optimization 2: Adjust ndots Value
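Lowering ndots from the Kubernetes default of 5 (e.g., to 2) means that any name with at least that many dots is tried as an absolute name first, so typical external domains are no longer expanded through the search list up front. This can be set per deployment via dnsConfig in the pod spec: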
spec:
  containers:
  - name: srv-inner-proxy
    image: xxx/devops/srv-inner-proxy
    ...
  dnsConfig:
    options:
    - name: ndots
      value: "2"
  dnsPolicy: ClusterFirst
Kubernetes supports four DNS policies for pods:
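None – ignores the DNS settings from the Kubernetes environment; all DNS settings must be supplied through dnsConfig.
Default – the pod inherits the name resolution configuration of the node it runs on, typically the node’s /etc/resolv.conf. Despite its name, this is not the policy used by default.
ClusterFirst – queries go to the cluster’s DNS service, which forwards names outside the cluster domain to the upstream nameserver. This is the policy applied when none is specified.
ClusterFirstWithHostNet – for pods running with hostNetwork: true, which must set this policy explicitly in order to use the cluster DNS.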
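To apply the ndots option and a DNS policy to an existing Deployment without editing its manifest, one option is a merge patch. The sketch below builds such a patch in Python; `ndots_patch` is a hypothetical helper, and it assumes the workload is a Deployment, where the pod spec lives under spec.template.spec.

```python
import json


def ndots_patch(ndots, dns_policy="ClusterFirst"):
    """Build a merge patch that sets dnsPolicy and the ndots option on a Deployment."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "dnsPolicy": dns_policy,
                    "dnsConfig": {
                        # Option values are strings in the pod spec, hence str().
                        "options": [{"name": "ndots", "value": str(ndots)}]
                    },
                }
            }
        }
    }


print(json.dumps(ndots_patch(2)))
```

If the script is saved as, say, ndots_patch.py (an assumed file name), its output could be passed to `kubectl patch deployment <deployment-name> --type merge -p "$(python3 ndots_patch.py)"`. Changing the pod template triggers a rollout, so the new setting only appears in /etc/resolv.conf of the replacement pods.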
4. Conclusion
With dnsPolicy set in the deployment and ndots lowered to 2, the server’s domain name is queried as an absolute name before the search list is applied, so the pod no longer picks up the problematic HOST search domain and resolves the server’s correct IP address. The case highlights the importance of understanding Kubernetes DNS internals when troubleshooting connectivity issues.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
