Why Does Rolling Update Trigger “No Route to Host” in Kubernetes?
A Kubernetes user reported intermittent “No route to host” errors during Deployment rolling updates. This article reviews the common connection errors seen during rolling updates, explains how IPVS weight handling combined with source‑port reuse causes this particular one, and describes mitigations such as preStop hooks, readiness probes, scaling out, and pod anti‑affinity.
Background
The author, a member of the Tencent Cloud Container Service (TKE) team, follows up on an earlier, widely read article about Kubernetes network troubleshooting with a case study focused on the “No route to host” error that appears during Deployment rolling updates.
Problem Report
A user observed that during a rolling update of a Deployment, the application logs occasionally reported “No route to host” errors.
Common Rolling‑Update Errors
Connection reset by peer: the connection is reset, for example because the server receives malformed packets or because the old pod exits before the client has finished sending its request. Applications should handle SIGTERM and close connections gracefully.
Connection refused: the client’s SYN reaches a pod that is no longer listening on the port, because the iptables/IPVS rules have not yet been updated to remove it. A preStop hook that delays termination gives kube‑proxy time to refresh the rules.
Connection timed out: similar to Connection refused, except that the port is still listening but the process can no longer respond. The same preStop and readiness‑probe recommendations apply.
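In the Connection reset by peer case, handling SIGTERM usually means three steps. The server stops accepting new requests, lets in‑flight requests finish, and then exits. The Python sketch below covers only that bookkeeping. The class and method names (`GracefulServer`, `try_accept`, `finish`, `drained`) are hypothetical and do not belong to any framework; a real server would wire them into its own accept loop.

```python
import signal
import threading


class GracefulServer:
    """Hypothetical bookkeeping for graceful shutdown on SIGTERM."""

    def __init__(self):
        self.shutting_down = threading.Event()  # set once SIGTERM arrives
        self._lock = threading.Lock()
        self._in_flight = 0  # requests currently being served

    def install_handler(self):
        # The kubelet sends SIGTERM when the pod is deleted, after any
        # preStop hook has finished. Must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler tiny: only record that shutdown has started.
        self.shutting_down.set()

    def try_accept(self):
        """Return True if a new request may start, False once shutting down."""
        with self._lock:
            if self.shutting_down.is_set():
                return False
            self._in_flight += 1
            return True

    def finish(self):
        """Mark one in-flight request as done."""
        with self._lock:
            self._in_flight -= 1

    def drained(self):
        """True when no request is still in flight, so it is safe to exit."""
        with self._lock:
            return self._in_flight == 0
```

A main loop would stop taking work once `try_accept` returns False. It would then wait until `drained()` is True, or until a deadline safely inside the pod's terminationGracePeriodSeconds, before exiting.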
Suggested Mitigations
Use a preStop hook to pause pod termination, allowing kube‑proxy to update forwarding rules.
lifecycle:
  preStop:
    exec:
      command:
      - /bin/bash
      - -c
      - sleep 30
Configure a readinessProbe so the pod is marked Ready only after the service port is truly listening.
readinessProbe:
  httpGet:
    path: /healthz
    port: 80
    httpHeaders:
    - name: X-Custom-Header
      value: Awesome
  initialDelaySeconds: 15
  timeoutSeconds: 1
Deeper Analysis of “No route to host”
The error means the destination IP cannot be routed. This typically happens because traffic is sent to a pod that has already been destroyed, so its network interface no longer exists. During a rolling update, the new pod becomes Ready and is added to the IPVS real‑server list. Meanwhile kube‑proxy sets the old pod’s weight to 0 instead of removing its rule immediately, so that existing connections can drain.
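The scheduler in the ipvsadm output in this article is `rr`, and it gives new connections only to real servers whose weight is greater than 0. The Python model below is a simplification, not the kernel implementation, and the `RealServer` and `RoundRobin` names are hypothetical. It shows why a weight of 0 stops new scheduling to the old pod even though its rule still exists.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    addr: str
    weight: int


class RoundRobin:
    """Toy model of IPVS round-robin: servers with weight 0 are skipped."""

    def __init__(self, servers):
        self.servers = servers
        self._next = 0  # index to try first on the next call

    def schedule(self):
        """Pick the real server for a brand-new connection, or None."""
        n = len(self.servers)
        for step in range(n):
            idx = (self._next + step) % n
            server = self.servers[idx]
            if server.weight > 0:
                self._next = (idx + 1) % n
                return server
        return None  # every real server has weight 0


# Rolling update: the old pod stays in the list, but with weight 0.
old = RealServer("172.16.8.106:80", weight=0)
new = RealServer("172.16.8.107:80", weight=1)
rr = RoundRobin([old, new])
picks = [rr.schedule().addr for _ in range(3)]  # all go to 172.16.8.107:80
```

The catch is that a scheduler like this is consulted only for connections IPVS does not already have in its connection table.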
Reproducing the Issue
A test environment was built with two services: ServiceA (the client) and ServiceB (the server, and the target of the rolling update). ServiceA makes many short‑lived RPC calls to ServiceB. During the update, ipvsadm shows the old pod (172.16.8.106) still present with weight 0 next to the new pod (172.16.8.107):
root@VM-0-3-ubuntu:~# ipvsadm -ln -t 172.16.255.241:80
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.255.241:80 rr
  -> 172.16.8.106:80              Masq    0      5          14048
  -> 172.16.8.107:80              Masq    1      2          243
Root Cause Investigation
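The InActConn column in the ipvsadm output already points toward the cause. IPVS counts connections that are not ESTABLISHED as inactive, including entries still waiting out TIME_WAIT. As a result, the old pod still holds about 14,000 lingering entries even though its weight is 0. A small Python helper can flag such real servers from `ipvsadm -ln` text. The name `draining_servers` is hypothetical, and the helper assumes the column layout shown in that output.

```python
SAMPLE = """\
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.255.241:80 rr
  -> 172.16.8.106:80              Masq    0      5          14048
  -> 172.16.8.107:80              Masq    1      2          243
"""


def draining_servers(ipvsadm_output):
    """Return (address, active, inactive) for each real server with weight 0."""
    found = []
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        # Real-server rows: -> ADDR:PORT FORWARD WEIGHT ACTIVECONN INACTCONN.
        # The column-header row also starts with "->", but its weight field
        # is the word "Weight", so the isdigit() check skips it.
        if len(fields) == 6 and fields[0] == "->" and fields[3].isdigit():
            addr = fields[1]
            weight, active, inactive = (int(f) for f in fields[3:6])
            if weight == 0:
                found.append((addr, active, inactive))
    return found


print(draining_servers(SAMPLE))  # [('172.16.8.106:80', 5, 14048)]
```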
For each incoming packet, IPVS first looks up the 5‑tuple in its local connection table. ServiceA opens short‑lived connections at a high rate, so its source ports are reused quickly. A new connection can therefore carry the same 5‑tuple as an earlier one whose entry is still in TIME_WAIT. In that case IPVS treats the new SYN as part of the existing connection and forwards it to the previously selected real server, even though that server’s weight is now 0 and the pod has been deleted. The SYN is sent to a pod that no longer exists, the node returns an ICMP “No route to host” (host unreachable) message, and the client reports the error.
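Because the lookup order is the crux, a toy Python model of it may help. `ToyIPVS` and its methods are hypothetical. Real IPVS also tracks per‑state timeouts and sysctls that this model ignores, and the addresses are borrowed from the capture and ipvsadm output in this article. The model shows that a SYN matching an existing 5‑tuple entry goes to that entry's real server without any check of the weights.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIPVS:
    """Toy model: connection-table lookup first, scheduler only on a miss."""

    weights: dict  # real server address -> weight
    conns: dict = field(default_factory=dict)  # 5-tuple -> [server, state]

    def _schedule(self):
        # Simplified scheduler: first real server with weight > 0.
        for server, weight in self.weights.items():
            if weight > 0:
                return server
        return None

    def on_syn(self, five_tuple):
        """Return the real server a new SYN with this 5-tuple is sent to."""
        entry = self.conns.get(five_tuple)
        if entry is not None:
            # Hit, even on a TIME_WAIT entry: reuse the old real server.
            # Its current weight is never checked on this path.
            entry[1] = "SYN_RECV"
            return entry[0]
        server = self._schedule()
        self.conns[five_tuple] = [server, "SYN_RECV"]
        return server

    def close(self, five_tuple):
        """A short-lived call ends; the entry lingers in TIME_WAIT."""
        self.conns[five_tuple][1] = "TIME_WAIT"


ipvs = ToyIPVS(weights={"172.16.8.106:80": 1})
t1 = ("TCP", "10.0.0.3", 36702, "172.16.255.241", 80)
before = ipvs.on_syn(t1)  # scheduled to 172.16.8.106:80
ipvs.close(t1)

# Rolling update: old pod's weight drops to 0, new pod is added.
ipvs.weights["172.16.8.106:80"] = 0
ipvs.weights["172.16.8.107:80"] = 1

reused = ipvs.on_syn(t1)  # same source port: still 172.16.8.106:80
fresh = ipvs.on_syn(("TCP", "10.0.0.3", 36703, "172.16.255.241", 80))  # 172.16.8.107:80
```

In the real incident, the "hit" path points at a pod IP that no longer exists, and that is where the host‑unreachable reply comes from.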
Packet capture confirms SYN packets are still directed to the old pod IP:
tcpdump -i eth0 host 172.16.8.106 -n -tttt
... IP 10.0.0.3.36702 > 172.16.8.106.80: Flags [S] ...
Potential Fixes
Modifying the IPVS kernel source to re‑schedule when the matched real server weight is zero eliminates the symptom, but breaks graceful termination because in‑flight connections are abruptly dropped.
Practical mitigations include:
Increasing the replica count of ServiceA and applying podAntiAffinity so that its pods are spread across nodes. Connections are then distributed over more client pods, and each source port is reused less often.
Switching kube‑proxy to iptables mode, which remains practical for smaller clusters.
Continuing to use preStop and readinessProbe to give kube‑proxy time to update rules.
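A rough calculation shows why more ServiceA replicas help. Suppose one client pod opens connections to the Service VIP at a steady rate and cycles through its ephemeral port range. Each source port then comes back after roughly (range size ÷ per‑pod rate) seconds. If that interval is shorter than the time IPVS keeps a TIME_WAIT entry, reused 5‑tuples are likely. The Python sketch below assumes the Linux default `net.ipv4.ip_local_port_range` of 32768–60999 and a 120‑second IPVS TIME_WAIT timeout; check both on your own nodes. Real port allocation is not strictly sequential, so treat the result as an order‑of‑magnitude estimate. The function names are hypothetical.

```python
def port_reuse_interval(total_conn_per_sec, replicas, port_range=(32768, 60999)):
    """Approximate seconds before one client pod reuses the same source port.

    Assumes connections are spread evenly over `replicas` client pods and
    that each pod cycles through its ephemeral port range uniformly.
    """
    ports = port_range[1] - port_range[0] + 1
    per_pod_rate = total_conn_per_sec / replicas
    return ports / per_pod_rate


def reuse_likely(total_conn_per_sec, replicas, timewait_s=120.0,
                 port_range=(32768, 60999)):
    """True if a port is likely reused while IPVS still holds its old entry."""
    return port_reuse_interval(total_conn_per_sec, replicas, port_range) < timewait_s


# Hypothetical load: 1000 new connections per second from ServiceA.
for replicas in (1, 2, 5):
    interval = port_reuse_interval(1000, replicas)
    print(replicas, round(interval, 1), reuse_likely(1000, replicas))
```

This prints `1 28.2 True`, `2 56.5 True` and `5 141.2 False`. Under these assumptions, going from one to five client pods stretches the reuse interval past the assumed TIME_WAIT window. The actual threshold depends on your real connection rate and kernel settings.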
Open Issues
Related discussions can be found in Kubernetes issues #81775, #85517, and #81308, which explore IPVS weight handling, connection‑state timeouts, and graceful termination behavior.
Conclusion
The “No route to host” error during rolling updates is caused by source‑port reuse. A new connection matches a lingering TIME_WAIT entry in the IPVS connection table and is forwarded to the old pod, whose weight has already been set to 0 and which no longer exists. Proper use of preStop, readiness probes, scaling, and anti‑affinity can mitigate the problem, while a definitive kernel‑level fix remains pending.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
