Kubernetes Service Connectivity Issues? A Step‑by‑Step Guide from Pods to Services to Ingress
This article provides a systematic, layer‑by‑layer troubleshooting guide for Kubernetes service connectivity problems, covering pod health, service and endpoint configuration, kube‑proxy rules, CNI plugins, Ingress controllers, DNS resolution, and NetworkPolicy, with concrete commands, examples, and preventive scripts.
Problem Background
Kubernetes service connectivity failures are more common than node NotReady issues and directly affect business availability, manifesting as pages that load without data, 502/504 API responses, or inter‑pod timeouts. Root causes span multiple layers: pod status, Service misconfiguration, missing Endpoints, network policies, Ingress errors, or DNS failures.
Applicable Scenarios
External access returns 502 Bad Gateway or 504 Gateway Timeout
Pod‑to‑Pod calls via Service name time out or are refused
Ingress URL cannot resolve or routes to the wrong backend
NodePort/LoadBalancer Services are unreachable from outside
Headless Service pods cannot discover each other via DNS
Pods in one namespace cannot reach services in another namespace
Specific IP ranges or ports are blocked by NetworkPolicy
Kubernetes Service Traffic Path Overview
Understanding the full traffic flow is essential before troubleshooting. Different service types have different paths.
Scenario 1: In‑cluster Pod accesses a ClusterIP Service (most common)
PodA -> ClusterIP:ServicePort -> kube-proxy (iptables/ipvs) -> EndpointIP:ContainerPort -> PodB
Inside a pod, the request URL is http://service-name.namespace.svc.cluster.local:port, which CoreDNS resolves to the Service's ClusterIP. kube-proxy then DNAT-forwards the traffic to a matching Endpoint IP.
Scenario 2: External client accesses via NodePort
External client -> NodeIP:NodePort -> kube-proxy -> EndpointIP:ContainerPort -> PodB
The request lands on any node's NodePort; kube-proxy forwards it to a backend pod, regardless of which node that pod runs on.
Scenario 3: Access through Ingress
External client -> Ingress Controller Pod -> Ingress rule match -> Backend Service -> kube-proxy -> PodB
The Ingress Controller (itself exposed via a NodePort or LoadBalancer Service) matches Host/Path rules, then forwards traffic to the backend Service.
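The name-construction step from Scenario 1 can be sketched as a tiny helper; the service, namespace, and port values below are placeholders, not anything from a real cluster:

```shell
#!/bin/bash
# Build the in-cluster URL a Pod would use to reach a Service (Scenario 1).
# CoreDNS resolves <svc>.<ns>.svc.cluster.local to the Service's ClusterIP.
svc_url() {
  local svc=$1 ns=$2 port=${3:-80}   # port defaults to 80 when omitted
  echo "http://${svc}.${ns}.svc.cluster.local:${port}"
}

svc_url my-service default 8080
# → http://my-service.default.svc.cluster.local:8080
```

The short form http://service-name:port also works from pods in the same namespace, because the namespace is filled in by the pod's DNS search domains.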
Step 1: Confirm Scope and Symptoms
1.1 Verify the problem really exists
# Verify the target Service exists
kubectl get svc -n <namespace> <service-name>
# Verify Pods exist and are Running
kubectl get pods -n <namespace> -l app=<app-label>
# If any Pod is not Running, fix the Pod first
# (Pods that are not up cannot be reached by the Service)
1.2 Validate the business‑level symptom
# Test external access (NodePort or LoadBalancer IP)
curl -v http://<external-ip>:<port>/<health-path>
# Typical 502 response
# HTTP/1.1 502 Bad Gateway
# Server: nginx/1.24.0
# Typical timeout
# curl: (7) Failed to connect to <ip> port <port>: Connection timed out
1.3 Determine impact range
# List healthy Endpoints for the Service
kubectl get endpoints -n <namespace> <service-name>
# If the endpoint list is empty or fewer than expected, the Service is not linked to healthy Pods
# If endpoints look normal, the issue may be in Ingress or higher layers
# Show Pod distribution
kubectl get pods -n <namespace> -o wide | grep <service-name>
Step 2: Diagnose the Pod Layer
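When a Deployment has many replicas, it helps to filter the STATUS column mechanically rather than scan it by eye. A minimal sketch, using canned output in place of a live kubectl call:

```shell
#!/bin/bash
# Filter not-Running Pods out of `kubectl get pods` output.
# The sample text below stands in for live cluster output; in practice
# pipe `kubectl get pods -n <namespace>` into the awk filter instead.
PODS='NAME    READY   STATUS             RESTARTS   AGE
web-0   1/1     Running            0          3d
web-1   0/1     CrashLoopBackOff   12         3d
web-2   0/1     ImagePullBackOff   0          1h'

# Skip the header row (NR > 1), print name and status of anything not Running.
echo "$PODS" | awk 'NR > 1 && $3 != "Running" {print $1, $3}'
```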
2.1 Confirm Pods exist and are Running
# List all Pods with wide output
kubectl get pods -n <namespace> -o wide
# Verify all replicas are Running
kubectl get pods -n <namespace> --field-selector=status.phase!=Running
2.2 Inspect Pod events
# Describe a Pod and look at the last 20 lines of Events
kubectl describe pod <pod-name> -n <namespace> | tail -20
# Common event clues:
# - ImagePullBackOff: image pull failure
# - CrashLoopBackOff: container exits immediately
# - CreateContainerConfigError: config error (ConfigMap/Secret)
# - OOMKilled: out‑of‑memory kill
# - Evicted: node resource pressure
2.3 View container logs
# Current logs (stdout)
kubectl logs <pod-name> -n <namespace>
# Previous logs (if container restarted)
kubectl logs <pod-name> -n <namespace> --previous
# Specify container in multi‑container Pods
kubectl logs <pod-name> -n <namespace> -c <container-name>
# Follow logs
kubectl logs -f <pod-name> -n <namespace> --tail=100
2.4 Exec into the Pod for network checks
# Open a shell (requires bash)
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash
# If no bash, use sh
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Verify the process is listening on the expected port
# Java: ps aux | grep java
# Nginx/Python/Node.js: netstat -tlnp or ss -tlnp
# Test localhost port
wget -qO- http://127.0.0.1:<container-port>/healthz
curl -s http://127.0.0.1:<container-port>/healthz
# If localhost works but Service access fails, the problem lies in the Service layer
2.5 Check Pod resource limits
# Show resource requests and limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
# Show actual usage (requires metrics‑server)
kubectl top pod <pod-name> -n <namespace>
# If usage is near limits, increase limits or optimise memory usage
Step 3: Diagnose the Service Layer
3.1 Verify Service configuration
# Show full Service YAML
kubectl get svc <service-name> -n <namespace> -o yaml
# Key fields to check:
# spec.selector – must match target Pod labels
# spec.ports – port, targetPort, protocol must be correct
# spec.type – ClusterIP / NodePort / LoadBalancer
Common mistake 1: Wrong selector
# Wrong example: selector app: web-frontend, but Pods have app: web
spec:
  selector:
    app: web-frontend
Fix: list the actual Pod labels with kubectl get pods -n <namespace> --show-labels, then either correct the Service selector or relabel the Pods, and confirm the match with kubectl get pods -n <namespace> -l <corrected-label>.
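A corrected Service fragment, assuming the Pods actually carry the label app: web as in the example above:

```yaml
spec:
  selector:
    app: web        # must equal the Pods' label key and value exactly
```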
Common mistake 2: Port configuration error
# Wrong example: targetPort is a string "http" while the container listens on numeric port 8080
spec:
  ports:
  - port: 80
    targetPort: "http"
    protocol: TCP
Fix: use a numeric targetPort that equals the containerPort, or declare a container port named http in the Pod spec so the string reference resolves.
3.2 Ensure Endpoints exist and are healthy
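The interpretation of the ENDPOINTS column from the commands in this section can be encoded as a tiny helper; the sample values are illustrative:

```shell
#!/bin/bash
# Classify an ENDPOINTS value shown by `kubectl get endpoints`.
# "<none>" (or an empty value) means the selector matched no ready Pods,
# so traffic to the Service has nowhere to go.
check_endpoints() {
  case "$1" in
    ""|"<none>") echo "EMPTY: check selector and readinessProbe" ;;
    *)           echo "OK: $1" ;;
  esac
}

check_endpoints "<none>"
check_endpoints "10.244.1.15:8080,10.244.2.23:8080"
```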
# List Endpoints
kubectl get endpoints <service-name> -n <namespace>
# Normal output example:
# NAME ENDPOINTS
# my-service 10.244.1.15:8080,10.244.2.23:8080
# If empty, investigate selector mismatch or readinessProbe failures
kubectl get pods -n <namespace> --show-labels
kubectl get pods -n <namespace> -o wide
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].readinessProbe}'
3.3 Test Service access from inside the cluster
# Create a temporary test Pod
kubectl run -n <namespace> testpod --image=busybox:1.36 --restart=Never -it --rm -- sh
# DNS resolution test
nslookup <service-name>
# Direct IP test
wget -qO- http://<ClusterIP>:<port>/healthz
# DNS name test
wget -qO- http://<service-name>.<namespace>.svc.cluster.local:<port>/healthz
# If DNS resolves but the connection times out, kube-proxy may be mis-forwarding
3.4 Test NodePort Service
# Show NodePort
kubectl get svc <service-name> -n <namespace> | grep NodePort
# Example output: NodePort: http 30080/TCP
# Test from outside (ensure firewall allows the port)
curl -v http://<any-node-ip>:30080/<path>
# If some nodes work and others don't, check kube-proxy on the failing node
Step 4: Diagnose the kube‑proxy Layer
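kube-proxy in iptables mode programs chains named KUBE-SVC-<hash> (one per Service) and KUBE-SEP-<hash> (one per endpoint). Rather than eyeballing the full NAT table, the chain serving one Service can be pulled out with grep. A sketch over canned rules; the chain suffixes and the ClusterIP here are made up:

```shell
#!/bin/bash
# Extract the KUBE-SVC chain name for a Service from NAT rules.
# The sample lines stand in for `iptables-save -t nat` output; in practice
# pipe the real output through the same grep pipeline.
RULES='-A KUBE-SERVICES -d 10.96.45.12/32 -p tcp -m comment --comment "default/my-service" -j KUBE-SVC-ABC123XYZ
-A KUBE-SVC-ABC123XYZ -m statistic --mode random -j KUBE-SEP-DEF456'

# Find the rule tagged with the namespace/name comment, then pull the chain name.
echo "$RULES" | grep '"default/my-service"' | grep -o 'KUBE-SVC-[A-Z0-9]*'
# → KUBE-SVC-ABC123XYZ
```

If the grep returns nothing on a real node, kube-proxy never generated rules for that Service, which points at a kube-proxy sync failure rather than a backend problem.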
4.1 Verify kube‑proxy is running
# List kube‑proxy DaemonSet Pods
kubectl get pods -n kube-system -l k8s-app=kube-proxy
# View logs
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=50
# If Pods are not Running, describe them for details
4.2 Check kube‑proxy mode (iptables vs ipvs)
# Show mode from ConfigMap
kubectl get configmap -n kube-system kube-proxy -o yaml | grep mode
# Or read kube-proxy's mounted config file on a node (kubeadm default path)
ssh <node> "grep mode /var/lib/kube-proxy/config.conf"
# iptables – default, stable
# ipvs – higher performance, more complex, kernel‑version dependent
4.3 iptables mode troubleshooting
# On a node, list NAT table rules for the Service
ssh <node> "sudo iptables -t nat -L -n | grep <service-name>"
# Look for KUBE‑SVC‑XXXX chain and KUBE‑SEP‑XXXX chains
# If missing, kube‑proxy failed to generate rules
# Verify KUBE‑SERVICES chain
ssh <node> "sudo iptables -t nat -L KUBE-SERVICES -n | grep <service-name>"
# Check FILTER table FORWARD chain for ACCEPT rules
ssh <node> "sudo iptables -t filter -L FORWARD -n | grep KUBE"
4.4 ipvs mode troubleshooting
# Show ipvs rules
ssh <node> "sudo ipvsadm -L -n"
# Expected output includes Service IP and backend Endpoints
# If ipvsadm is not installed or the ip_vs kernel module is missing, kube-proxy falls back to iptables
4.5 Analyse kube‑proxy logs for errors
# Search for errors
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep -i error
# Common errors:
# "Failed to delete service" – cleanup failure
# "Failed to sync iptables" – permission or rule issue
# "ipvs struct not found" – ipvs runtime problem
Step 5: Diagnose the CNI (Network Plugin) Layer
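A quick sanity check when debugging CNI issues is whether an address even belongs to the cluster's Pod CIDR; a 172.17.x.x address, for instance, often means a container landed on the local docker bridge instead of the CNI network. A pure-bash sketch; the 10.244.0.0/16 CIDR is an assumed example, so read yours from the CNI config:

```shell
#!/bin/bash
# Check whether an IPv4 address falls inside a Pod CIDR.

# Convert dotted-quad IPv4 to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# in_cidr <ip> <cidr>  -> prints yes or no
in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  if [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]; then
    echo yes
  else
    echo no
  fi
}

in_cidr 10.244.1.15 10.244.0.0/16    # → yes
in_cidr 192.168.1.5 10.244.0.0/16    # → no
```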
5.1 Verify pod network connectivity
# Exec into a problematic pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Ping another pod directly (using its Pod IP)
ping -c 3 <other-pod-ip>
# Show the routing table – the default route should go via the CNI-provided gateway
ip route
# If ping fails, check that both pods are in the same subnet and that the CNI interface exists:
ip addr | grep -E "flannel|calico|cni|docker"
5.2 Common CNI checks – Flannel
# Verify Flannel DaemonSet pods
kubectl get pods -n kube-system -l app=flannel
# Check flannel.1 interface on a node
ip addr show flannel.1
# View Flannel network range (Pod CIDR)
kubectl get cm -n kube-system kube-flannel-cfg -o yaml | grep -A 3 "net-conf.json"
# If flannel.1 is down, the node's CNI network is not initialized
5.3 Common CNI checks – Calico
# Verify calico-node pods
kubectl get pods -n kube-system -l k8s-app=calico-node
# View calico-node logs
kubectl logs -n kube-system -l k8s-app=calico-node --tail=100
# Check BGP peers (requires calicoctl)
calicoctl node status
# List IP pools
calicoctl get ippool -o wide
# If a node has no IP pool address, pod IP allocation fails
5.4 Cross‑node pod communication test
# Get pod IPs on different nodes
kubectl get pods -o wide --all-namespaces | grep Running
# From source pod, ping target pod IP
kubectl exec -it <source-pod> -n <namespace> -- ping -c 5 <target-pod-ip>
# If same-node ping works but cross-node fails, investigate node-to-node routing, CNI tunnels (VXLAN, BGP),
# or firewall rules (flannel VXLAN uses UDP 8472; Calico IP-in-IP uses IP protocol 4)
Step 6: Diagnose the Ingress Layer
6.1 Verify Ingress controller is healthy
# List Ingress controller pods (the namespace varies: ingress-nginx, kube-system, etc.)
kubectl get pods -A | grep -E "ingress|nginx"
# If not Running, describe the pod
kubectl describe pod -n kube-system <ingress-controller-pod>
# If the pod restarts frequently, view previous logs
kubectl logs -n kube-system <ingress-controller-pod> --previous | tail -50
6.2 Inspect Ingress resources
# List all Ingresses
kubectl get ingress -A
# Show a specific Ingress in YAML
kubectl get ingress <ingress-name> -n <namespace> -o yaml
# Important fields:
# spec.rules – Host and Path
# spec.tls – TLS certificate configuration
# spec.defaultBackend – default backend when no rule matches
6.3 Ensure Ingress links to the correct Service
# Verify backend service name matches the target Service
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.spec.rules[*].http.paths[*].backend.service}'
# Verify backend port matches Service port
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.spec.rules[*].http.paths[*].backend.service.port}'
6.4 Analyse Ingress controller logs
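Status codes in controller logs can be tallied mechanically before digging into individual requests. A sketch over canned access-log lines; the field position ($7) assumes a default-style nginx log format, so adjust it if your format differs:

```shell
#!/bin/bash
# Tally HTTP status codes from ingress-nginx access-log lines.
# The canned samples stand in for `kubectl logs <controller-pod>` output.
LOGS='10.0.0.1 - - "GET /api/health HTTP/1.1" 502 0
10.0.0.2 - - "GET /api/users HTTP/1.1" 200 731
10.0.0.3 - - "GET /api/health HTTP/1.1" 502 0'

# Print the status column, then count occurrences, most frequent first.
echo "$LOGS" | awk '{print $7}' | sort | uniq -c | sort -rn
```

A sudden spike of one code narrows the search immediately: mostly 502s points at dead or rejecting backends, 504s at backend timeouts, 503s at missing endpoints.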
# Tail logs
kubectl logs -n kube-system <ingress-controller-pod> -f --tail=100
# Search for 502/503/504 errors
kubectl logs -n kube-system <ingress-controller-pod> | grep " 502 \| 503 \| 504 "
# Search for a specific Host
kubectl logs -n kube-system <ingress-controller-pod> | grep "Host: <domain>"
6.5 Step‑by‑step Ingress traffic path
1. External -> DNS -> Ingress Controller external IP/NodePort
2. Ingress Controller -> rule match -> Backend Service
3. Backend Service -> kube‑proxy -> Endpoint -> Pod
6.6 TLS/HTTPS issues
# Show TLS configuration
kubectl get ingress <ingress-name> -n <namespace> -o yaml | grep -A 10 tls
# Common TLS problems:
# 1. Secret does not exist
# 2. Secret type is not kubernetes.io/tls
# 3. Certificate expired
# 4. Certificate does not match domain
# Test HTTPS (skip verification to isolate TLS problems)
curl -v https://<domain>/<path> --insecure
Step 7: Diagnose DNS Issues
7.1 Test DNS resolution from a pod
# nslookup Kubernetes default service
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default
# nslookup the target Service
kubectl exec -it <pod-name> -n <namespace> -- nslookup <service-name>.<namespace>.svc.cluster.local
# If nslookup missing, install dnsutils and use dig
kubectl exec -it <pod-name> -n <namespace> -- sh -c 'apt-get update && apt-get install -y dnsutils'
kubectl exec -it <pod-name> -n <namespace> -- dig +short kubernetes.default.svc.cluster.local
# Verify full DNS name
kubectl exec -it <pod-name> -n <namespace> -- getent hosts <service-name>.<namespace>.svc.cluster.local
7.2 Check CoreDNS status
# List CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# View CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
# Show CoreDNS ConfigMap
kubectl get configmap -n kube-system coredns -o yaml
7.3 Common causes of slow DNS
Conntrack races when a container sends parallel UDP A and AAAA queries, producing 5‑second timeouts
High ndots value (default 5) causing unnecessary cluster DNS lookups for external names
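The expansion behaviour behind the ndots problem can be illustrated directly. This sketch mimics how a resolver with search domains tries candidates for a name whose dot count is below ndots; the search domains mirror a typical Pod resolv.conf for namespace "default":

```shell
#!/bin/bash
# Illustrate resolv.conf ndots behaviour: a name with fewer dots than
# ndots is tried against every search domain before being queried as an
# absolute name. (A simplified model of glibc resolver behaviour.)
expand_query() {   # expand_query <name> <ndots> <search-domain...>
  local name=$1 ndots=$2; shift 2
  local only_dots=${name//[^.]/}         # keep only the dots, count them
  if [ ${#only_dots} -lt "$ndots" ]; then
    local d
    for d in "$@"; do echo "$name.$d"; done
  fi
  echo "$name."                          # finally, the absolute query
}

# An external name with 2 dots < ndots=5: three wasted cluster lookups first.
expand_query api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local
```

This is why lowering ndots (or using fully qualified names with a trailing dot) cuts latency for external lookups.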
Optimize ndots
# Add dnsConfig to Pod spec
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"
    - name: timeout
      value: "2"
    - name: attempts
      value: "2"
Step 8: Diagnose NetworkPolicy
8.1 List NetworkPolicies in the namespace
# Show policies
kubectl get networkpolicy -n <namespace>
# Show a specific policy
kubectl get networkpolicy <policy-name> -n <namespace> -o yaml
8.2 Typical NetworkPolicy misconfiguration
# Example: Pods labeled role=frontend accept ingress only from Pods labeled role=nginx
spec:
  podSelector:
    matchLabels:
      role: frontend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: nginx
# If the request comes from a Pod without the role=nginx label, it is blocked
Temporary test: allow all traffic
# Delete the problematic policy
kubectl delete networkpolicy <policy-name> -n <namespace>
# Or apply a permissive policy
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: <namespace>
spec:
  podSelector: {}
  ingress:
  - {}
EOF
Step 9: Comprehensive Failure Cases
Case 1 – Service has Endpoints but returns 502
Symptoms: Endpoints list is normal, but accessing via Ingress yields 502.
# Verify Service and Endpoints
kubectl get svc my-service -n default
kubectl get endpoints my-service -n default
# From a pod, curl the Service directly (bypassing Ingress)
kubectl run -n default curl-test --image=curlimages/curl --restart=Never -it --rm -- sh
curl http://my-service.default.svc.cluster.local:8080/api/health
# Ingress returns 502
curl http://<ingress-ip>/api/health
# Check Ingress controller logs for timeout
kubectl logs -n kube-system nginx-ingress-controller-xxx | grep "/api/health"
# Log shows "upstream timed out (110: Connection timed out)"
# Root cause: Ingress controller sent an invalid Host header; backend rejected it.
# Fix: Align the Ingress host field with the Host header the application expects.
Case 2 – Headless Service pods cannot discover each other
Symptoms: StatefulSet MySQL uses a Headless Service, but pods cannot resolve each other via DNS.
# Show Headless Service YAML
kubectl get svc mysql-headless -n database -o yaml
# Ensure clusterIP: None is present.
# Verify DNS record from a pod
kubectl exec -it mysql-0 -n database -- nslookup mysql-headless.database.svc.cluster.local
# If empty, check:
# - Selector matches Pod labels
kubectl get pods -n database -l app=mysql --show-labels
# - StatefulSet serviceName matches Headless Service name
kubectl get statefulset mysql -n database -o jsonpath='{.spec.serviceName}'
# Root cause: StatefulSet serviceName differed from the Headless Service name, so DNS records were not created.
Step 10: Preventive Measures
10.1 Service reachability health check script
#!/bin/bash
# Collect all non‑kube‑system namespaces
NAMESPACES=$(kubectl get ns -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | grep -v kube-system)
for NS in $NAMESPACES; do
# List all non‑Headless ClusterIP Services
SERVICES=$(kubectl get svc -n $NS -o json | jq -r '.items[] | select(.spec.clusterIP != "None" and .spec.clusterIP != "") | .metadata.name')
for SVC in $SERVICES; do
ENDPOINTS=$(kubectl get endpoints $SVC -n $NS -o json | jq '[.subsets[]?.addresses[]?] | length')
if [ "$ENDPOINTS" == "0" ] || [ "$ENDPOINTS" == "null" ]; then
echo "[ALERT] Service $SVC in namespace $NS has NO endpoints"
fi
done
done
echo "Health check completed"
10.2 Verify Ingress‑Service association
#!/bin/bash
# List all Ingresses and their backend Services
# Iterate namespace/name pairs so duplicate Ingress names across namespaces are handled correctly
for ITEM in $(kubectl get ingress -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'); do
  NS=${ITEM%/*}; INGRESS=${ITEM#*/}
SVC=$(kubectl get ingress $INGRESS -n $NS -o jsonpath='{.spec.rules[0].http.paths[0].backend.service.name}')
PORT=$(kubectl get ingress $INGRESS -n $NS -o jsonpath='{.spec.rules[0].http.paths[0].backend.service.port.number}')
echo "Ingress: $INGRESS, Service: $SVC, Port: $PORT"
EP_COUNT=$(kubectl get endpoints $SVC -n $NS -o json 2>/dev/null | jq '[.subsets[]?.addresses[]?] | length')
if [ "$EP_COUNT" == "0" ] || [ "$EP_COUNT" == "null" ]; then
echo " [WARNING] Service $SVC has no endpoints!"
fi
done
10.3 Standard readinessProbe templates
# Spring Boot readinessProbe
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3
# Golang readinessProbe (TCP check; no HTTP endpoint required)
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
Conclusion
Service connectivity problems in Kubernetes are best resolved by a layered approach: start with pod health, then verify Service Endpoints, test direct Service access, examine kube‑proxy rules, confirm CNI operation, inspect Ingress configuration and logs, validate DNS and CoreDNS, and finally review NetworkPolicy and cross‑node routing. Following the prioritized checklist—Pod → Service → Endpoint → Ingress → DNS → NetworkPolicy—dramatically reduces mean‑time‑to‑resolution.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
