
Kubernetes Service Connectivity Issues? A Step‑by‑Step Guide from Pods to Services to Ingress

This article provides a systematic, layer‑by‑layer troubleshooting guide for Kubernetes service connectivity problems, covering pod health, service and endpoint configuration, kube‑proxy rules, CNI plugins, Ingress controllers, DNS resolution, and NetworkPolicy, with concrete commands, examples, and preventive scripts.

MaGe Linux Operations

Problem Background

Kubernetes service connectivity failures are more common than node NotReady issues and directly affect business availability, manifesting as pages that load without data, 502/504 API responses, or inter‑pod timeouts. Root causes span multiple layers: pod status, Service misconfiguration, missing Endpoints, network policies, Ingress errors, or DNS failures.

Applicable Scenarios

External access returns 502 Bad Gateway or 504 Gateway Timeout

Pod‑to‑Pod calls via Service name time out or are refused

Ingress URL cannot resolve or routes to the wrong backend

NodePort/LoadBalancer Services are unreachable from outside

Headless Service pods cannot discover each other via DNS

Pods in one namespace cannot reach services in another namespace

Specific IP ranges or ports are blocked by NetworkPolicy

Kubernetes Service Traffic Path Overview

Understanding the full traffic flow is essential before troubleshooting. Different service types have different paths.

Scenario 1: In‑cluster Pod accesses a ClusterIP Service (most common)

PodA -> ClusterIP:ServicePort -> kube-proxy (iptables/ipvs) -> EndpointIP:ContainerPort -> PodB

Inside a pod, the request URL is http://service-name.namespace.svc.cluster.local:port, which CoreDNS resolves to the Service IP. kube‑proxy then DNAT‑forwards the traffic to a matching Endpoint IP.
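As a small aid, that naming convention can be composed in shell (CLUSTER_DOMAIN here is an assumed helper variable; the suffix defaults to cluster.local but is configurable per cluster):

```shell
#!/bin/sh
# Build the in-cluster URL for a Service from its name, namespace, and port.
svc_url() {
  svc="$1"; ns="$2"; port="$3"
  domain="${CLUSTER_DOMAIN:-cluster.local}"
  echo "http://${svc}.${ns}.svc.${domain}:${port}"
}

svc_url my-service default 8080
```

CoreDNS resolves this name to the ClusterIP; the short forms <service-name> and <service-name>.<namespace> also work from inside the cluster thanks to the search domains in the pod's resolv.conf.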

Scenario 2: External client accesses via NodePort

External client -> NodeIP:NodePort -> kube-proxy -> EndpointIP:ContainerPort -> PodB

The request lands on any node’s NodePort; kube‑proxy forwards it to the backend pod, regardless of which node the pod runs on.

Scenario 3: Access through Ingress

External client -> Ingress Controller Pod -> Ingress rule match -> Backend Service -> kube-proxy -> PodB

The Ingress Controller (usually exposed via a NodePort or LoadBalancer Service) matches Host/Path rules, then forwards traffic toward the backend Service; note that most nginx-based controllers resolve the Service's Endpoints and proxy to the Pod IPs directly.
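For reference, a minimal Ingress of this shape looks roughly as follows (the name, host, and ingressClassName are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress           # illustrative name
  namespace: default
spec:
  ingressClassName: nginx     # must match an installed controller class
  rules:
  - host: app.example.com     # Host rule matched by the controller
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service # backend Service in the same namespace
            port:
              number: 80
```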

Step 1: Confirm Scope and Symptoms

1.1 Verify the problem really exists

# Verify the target Service exists
kubectl get svc -n <namespace> <service-name>

# Verify Pods exist and are Running
kubectl get pods -n <namespace> -l app=<app-label>

# If any Pod is not Running, fix the Pod first
# (Pods that are not up cannot be reached by the Service)

1.2 Validate the business‑level symptom

# Test external access (NodePort or LoadBalancer IP)
curl -v http://<external-ip>:<port>/<health-path>

# Typical 502 response
# HTTP/1.1 502 Bad Gateway
# Server: nginx/1.24.0

# Typical timeout
# curl: (7) Failed to connect to <ip> port <port>: Connection timed out

1.3 Determine impact range

# List healthy Endpoints for the Service
kubectl get endpoints -n <namespace> <service-name>

# If the endpoint list is empty or fewer than expected, the Service is not linked to healthy Pods
# If endpoints look normal, the issue may be in Ingress or higher layers

# Show Pod distribution
kubectl get pods -n <namespace> -o wide | grep <service-name>

Step 2: Diagnose the Pod Layer

2.1 Confirm Pods exist and are Running

# List all Pods with wide output
kubectl get pods -n <namespace> -o wide

# Verify all replicas are Running
kubectl get pods -n <namespace> --field-selector=status.phase!=Running

2.2 Inspect Pod events

# Describe a Pod and look at the last 20 lines of Events
kubectl describe pod <pod-name> -n <namespace> | tail -20

# Common event clues:
# - ImagePullBackOff: image pull failure
# - CrashLoopBackOff: container exits immediately
# - CreateContainerConfigError: config error (ConfigMap/Secret)
# - OOMKilled: out‑of‑memory kill
# - Evicted: node resource pressure

2.3 View container logs

# Current logs (stdout)
kubectl logs <pod-name> -n <namespace>

# Previous logs (if container restarted)
kubectl logs <pod-name> -n <namespace> --previous

# Specify container in multi‑container Pods
kubectl logs <pod-name> -n <namespace> -c <container-name>

# Follow logs
kubectl logs -f <pod-name> -n <namespace> --tail=100

2.4 Exec into the Pod for network checks

# Open a shell (requires bash)
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

# If no bash, use sh
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Verify the process is listening on the expected port
# Java: ps aux | grep java
# Nginx/Python/Node.js: netstat -tlnp or ss -tlnp

# Test localhost port
wget -qO- http://127.0.0.1:<container-port>/healthz
curl -s http://127.0.0.1:<container-port>/healthz

# If localhost works but Service access fails, the problem lies in the Service layer

2.5 Check Pod resource limits

# Show resource requests and limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

# Show actual usage (requires metrics‑server)
kubectl top pod <pod-name> -n <namespace>

# If usage is near limits, increase limits or optimise memory usage

Step 3: Diagnose the Service Layer

3.1 Verify Service configuration

# Show full Service YAML
kubectl get svc <service-name> -n <namespace> -o yaml

# Key fields to check:
# spec.selector – must match target Pod labels
# spec.ports – port, targetPort, protocol must be correct
# spec.type – ClusterIP / NodePort / LoadBalancer

Common mistake 1: Wrong selector

# Wrong example: selector app: web-frontend, but Pods have app: web
spec:
  selector:
    app: web-frontend

Fix: ensure the selector matches actual Pod labels using kubectl get pods -n <namespace> --show-labels and verify with kubectl get pods -n <namespace> -l app=web-frontend.
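The corrected Service for that example (assuming the Pods actually carry the label app: web) would look like:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web          # must equal the Pods' label exactly
  ports:
  - port: 80
    targetPort: 8080
```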

Common mistake 2: Port configuration error

# Wrong example: targetPort is a string "http" while the container listens on numeric port 8080
spec:
  ports:
  - port: 80
    targetPort: "http"
    protocol: TCP
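A string targetPort is only valid when the Pod declares a container port with that name. A sketch of the correct named-port pairing (names are illustrative):

```yaml
# In the Pod template: give the port a name
containers:
- name: app
  ports:
  - name: http            # the name referenced by targetPort
    containerPort: 8080
---
# In the Service: targetPort "http" now resolves to 8080
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
```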

3.2 Ensure Endpoints exist and are healthy

# List Endpoints
kubectl get endpoints <service-name> -n <namespace>

# Normal output example:
# NAME        ENDPOINTS
# my-service  10.244.1.15:8080,10.244.2.23:8080

# If empty, investigate selector mismatch or readinessProbe failures
kubectl get pods -n <namespace> --show-labels
kubectl get pods -n <namespace> -o wide
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].readinessProbe}'

3.3 Test Service access from inside the cluster

# Create a temporary test Pod
kubectl run -n <namespace> testpod --image=busybox:1.36 --restart=Never -it --rm -- sh

# DNS resolution test
nslookup <service-name>

# Direct IP test
wget -qO- http://<ClusterIP>:<port>/healthz

# DNS name test
wget -qO- http://<service-name>.<namespace>.svc.cluster.local:<port>/healthz

# If DNS resolves but connection times out, kube‑proxy may be mis‑forwarding
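curl's exit code already hints at which layer failed. A small helper along these lines (the mapping covers only the common codes) makes the hint explicit:

```shell
#!/bin/sh
# Map a curl exit code to the most likely failing layer.
classify_curl_exit() {
  case "$1" in
    0)  echo "OK: connection and HTTP exchange succeeded" ;;
    6)  echo "DNS: name resolution failed (check CoreDNS)" ;;
    7)  echo "Connect: refused/unreachable (check Service, kube-proxy, NetworkPolicy)" ;;
    28) echo "Timeout: packets dropped silently (check NetworkPolicy, CNI routing)" ;;
    *)  echo "Other curl error: $1" ;;
  esac
}

# Usage from a test pod:
#   curl -s --max-time 5 http://<service-name>:<port>/ ; classify_curl_exit $?
```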

3.4 Test NodePort Service

# Show NodePort
kubectl describe svc <service-name> -n <namespace> | grep NodePort

# Example output: NodePort:  http  30080/TCP

# Test from outside (ensure firewall allows the port)
curl -v http://<any-node-ip>:30080/<path>

# If some nodes work and others don’t, check kube‑proxy on the failing node

Step 4: Diagnose the kube‑proxy Layer

4.1 Verify kube‑proxy is running

# List kube‑proxy DaemonSet Pods
kubectl get pods -n kube-system -l k8s-app=kube-proxy

# View logs
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=50

# If Pods are not Running, describe them for details

4.2 Check kube‑proxy mode (iptables vs ipvs)

# Show mode from ConfigMap
kubectl get configmap -n kube-system kube-proxy -o yaml | grep mode

# Or query kube-proxy's metrics endpoint on a node (port 10249)
ssh <node> "curl -s http://localhost:10249/proxyMode"

# iptables – default, stable
# ipvs – higher performance, more complex, kernel‑version dependent

4.3 iptables mode troubleshooting

# On a node, list NAT table rules for the Service
ssh <node> "sudo iptables -t nat -L -n | grep <service-name>"

# Look for KUBE‑SVC‑XXXX chain and KUBE‑SEP‑XXXX chains
# If missing, kube‑proxy failed to generate rules

# Verify KUBE‑SERVICES chain
ssh <node> "sudo iptables -t nat -L KUBE-SERVICES -n | grep <service-name>"

# Check FILTER table FORWARD chain for ACCEPT rules
ssh <node> "sudo iptables -t filter -L FORWARD -n | grep KUBE"

4.4 ipvs mode troubleshooting

# Show ipvs rules
ssh <node> "sudo ipvsadm -L -n"

# Expected output includes Service IP and backend Endpoints
# If ipvsadm not installed or ip_vs module missing, kube‑proxy falls back to iptables

4.5 Analyse kube‑proxy logs for errors

# Search for errors
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep -i error

# Common errors:
# "Failed to delete service" – cleanup failure
# "Failed to sync iptables" – permission or rule issue
# "ipvs struct not found" – ipvs runtime problem

Step 5: Diagnose the CNI (Network Plugin) Layer

5.1 Verify pod network connectivity

# Exec into a problematic pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Ping another pod directly (using its Pod IP)
ping -c 3 <other-pod-ip>

# Show the routing table; routes should go via the CNI-provided gateway (usually eth0 inside the pod)
ip route

# If ping fails, check on the node that the CNI interface exists and is up:
ip addr | grep -E "flannel|calico|cni|docker"

5.2 Common CNI checks – Flannel

# Verify Flannel DaemonSet pods
kubectl get pods -n kube-system -l app=flannel

# Check flannel.1 interface on a node
ip addr show flannel.1

# View Flannel network range (Pod CIDR); on newer installs the ConfigMap lives in the kube-flannel namespace
kubectl get cm -n kube-system kube-flannel-cfg -o yaml | grep -A 3 "net-conf.json"

# If flannel.1 is down, the node’s CNI network is not initialized

5.3 Common CNI checks – Calico

# Verify calico-node pods
kubectl get pods -n kube-system -l k8s-app=calico-node

# View calico-node logs
kubectl logs -n kube-system -l k8s-app=calico-node --tail=100

# Check BGP peers (requires calicoctl)
calicoctl node status

# List IP pools
calicoctl get ippool -o wide

# If a node has no IP pool address, pod IP allocation fails

5.4 Cross‑node pod communication test

# Get pod IPs on different nodes
kubectl get pods -o wide --all-namespaces | grep Running

# From source pod, ping target pod IP
kubectl exec -it <source-pod> -n <namespace> -- ping -c 5 <target-pod-ip>

# If same‑node ping works but cross‑node fails, investigate node‑to‑node routing, CNI tunnels (VXLAN, BGP), or firewall rules (flannel UDP 8472, Calico IP‑in‑IP 4)

Step 6: Diagnose the Ingress Layer

6.1 Verify Ingress controller is healthy

# List Ingress controller pods (often in ingress-nginx or kube-system, depending on the install)
kubectl get pods -A | grep -E "ingress|nginx"

# If not Running, describe the pod
kubectl describe pod -n kube-system <ingress-controller-pod>

# If the pod restarts frequently, view previous logs
kubectl logs -n kube-system <ingress-controller-pod> --previous | tail -50

6.2 Inspect Ingress resources

# List all Ingresses
kubectl get ingress -A

# Show a specific Ingress in YAML
kubectl get ingress <ingress-name> -n <namespace> -o yaml

# Important fields:
# spec.rules – Host and Path
# spec.tls – TLS certificate configuration
# spec.defaultBackend – default backend when no rule matches

6.3 Ensure Ingress links to the correct Service

# Verify backend service name matches the target Service
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.spec.rules[*].http.paths[*].backend.service}'

# Verify backend port matches Service port
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.spec.rules[*].http.paths[*].backend.service.port}'

6.4 Analyse Ingress controller logs

# Tail logs
kubectl logs -n kube-system <ingress-controller-pod> -f --tail=100

# Search for 502/503/504 errors
kubectl logs -n kube-system <ingress-controller-pod> | grep -E " (502|503|504) "

# Search for a specific Host
kubectl logs -n kube-system <ingress-controller-pod> | grep "Host: <domain>"

6.5 Step‑by‑step Ingress traffic path

1. External -> DNS -> Ingress Controller external IP/NodePort
2. Ingress Controller -> rule match -> Backend Service
3. Backend Service -> kube‑proxy -> Endpoint -> Pod

6.6 TLS/HTTPS issues

# Show TLS configuration
kubectl get ingress <ingress-name> -n <namespace> -o yaml | grep -A 10 tls

# Common TLS problems:
# 1. Secret does not exist
# 2. Secret type is not kubernetes.io/tls
# 3. Certificate expired
# 4. Certificate does not match domain

# Test HTTPS (skip verification to isolate TLS problems)
curl -v https://<domain>/<path> --insecure
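Expiry can also be checked offline once the certificate is extracted from the Secret; a sketch using openssl (the Secret name and file paths are placeholders):

```shell
#!/bin/sh
# Extract the certificate from the TLS Secret first, e.g.:
#   kubectl get secret <tls-secret> -n <namespace> \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/tls.crt

# check_cert: warn if the cert expires within the next N days (default 30)
check_cert() {
  crt="$1"; days="${2:-30}"
  if openssl x509 -in "$crt" -noout -checkend $((days * 86400)) >/dev/null; then
    echo "OK: valid for at least ${days} more days"
  else
    echo "WARNING: expires within ${days} days (or already expired)"
  fi
  # Show subject and expiry so the CN/SAN can be compared with the Ingress host
  openssl x509 -in "$crt" -noout -subject -enddate
}
```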

Step 7: Diagnose DNS Issues

7.1 Test DNS resolution from a pod

# nslookup Kubernetes default service
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default

# nslookup the target Service
kubectl exec -it <pod-name> -n <namespace> -- nslookup <service-name>.<namespace>.svc.cluster.local

# If nslookup is missing, install dnsutils inside the pod and use dig
# (wrap the install in sh -c so both commands run inside the pod)
kubectl exec -it <pod-name> -n <namespace> -- sh -c 'apt-get update && apt-get install -y dnsutils'
kubectl exec -it <pod-name> -n <namespace> -- dig +short kubernetes.default.svc.cluster.local

# Verify full DNS name
kubectl exec -it <pod-name> -n <namespace> -- getent hosts <service-name>.<namespace>.svc.cluster.local

7.2 Check CoreDNS status

# List CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns

# View CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

# Show CoreDNS ConfigMap
kubectl get configmap -n kube-system coredns -o yaml

7.3 Common causes of slow DNS

Conntrack races when a container sends parallel A and AAAA queries over UDP (the well-known intermittent 5-second DNS delay)

High ndots value (default 5) causing unnecessary cluster DNS lookups for external names

Optimize ndots

# Add dnsConfig to Pod spec
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"
    - name: timeout
      value: "2"
    - name: attempts
      value: "2"

Step 8: Diagnose NetworkPolicy

8.1 List NetworkPolicies in the namespace

# Show policies
kubectl get networkpolicy -n <namespace>

# Show a specific policy
kubectl get networkpolicy <policy-name> -n <namespace> -o yaml

8.2 Typical NetworkPolicy misconfiguration

# Example: only Pods with label role=frontend are allowed
spec:
  podSelector:
    matchLabels:
      role: frontend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: nginx
# A request from a Pod lacking the role=nginx label is blocked

Temporary test: allow all traffic

# Delete the problematic policy
kubectl delete networkpolicy <policy-name> -n <namespace>

# Or apply a permissive policy
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: <namespace>
spec:
  podSelector: {}
  ingress:
  - {}
EOF
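If traffic flows with the permissive policy in place, the original policy was the culprit. Re-scope it rather than leaving allow-all running; a sketch admitting only the intended caller (the labels and port are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      role: backend          # pods this policy protects
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend     # the only pods allowed to connect
    ports:
    - protocol: TCP
      port: 8080
```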

Step 9: Comprehensive Failure Cases

Case 1 – Service has Endpoints but returns 502

Symptoms: Endpoints list is normal, but accessing via Ingress yields 502.

# Verify Service and Endpoints
kubectl get svc my-service -n default
kubectl get endpoints my-service -n default

# From a pod, curl the Service directly (bypassing Ingress)
kubectl run -n default curl-test --image=curlimages/curl --restart=Never -it --rm -- sh
curl http://my-service.default.svc.cluster.local:8080/api/health

# Ingress returns 502
curl http://<ingress-ip>/api/health

# Check Ingress controller logs for timeout
kubectl logs -n kube-system nginx-ingress-controller-xxx | grep "/api/health"
# Log shows "upstream timed out (110: Connection timed out)"

# Root cause: the Ingress sent a Host header the backend did not recognise, so the
# application never completed the request and nginx timed out waiting for the upstream.
# Fix: align the Ingress host field (or the controller's upstream-vhost setting)
# with the Host header the application expects.

Case 2 – Headless Service pods cannot discover each other

Symptoms: StatefulSet MySQL uses a Headless Service, but pods cannot resolve each other via DNS.

# Show Headless Service YAML
kubectl get svc mysql-headless -n database -o yaml
# Ensure clusterIP: None is present.

# Verify DNS record from a pod
kubectl exec -it mysql-0 -n database -- nslookup mysql-headless.database.svc.cluster.local

# If empty, check:
# - Selector matches Pod labels
kubectl get pods -n database -l app=mysql --show-labels
# - StatefulSet serviceName matches Headless Service name
kubectl get statefulset mysql -n database -o jsonpath='{.spec.serviceName}'

# Root cause: StatefulSet serviceName differed from Headless Service name, so DNS records were not created.
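The two names that must agree can be seen side by side in a minimal sketch (following the mysql example above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless        # <-- this name ...
  namespace: database
spec:
  clusterIP: None             # headless: DNS returns the Pod IPs directly
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql-headless # ... must match serviceName here
  replicas: 2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql            # must also match the Service selector
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
```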

Step 10: Preventive Measures

10.1 Service reachability health check script

#!/bin/bash
# Collect all non‑kube‑system namespaces
NAMESPACES=$(kubectl get ns -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | grep -v kube-system)

for NS in $NAMESPACES; do
  # List all non‑Headless ClusterIP Services
  SERVICES=$(kubectl get svc -n $NS -o json | jq -r '.items[] | select(.spec.clusterIP != "None" and .spec.clusterIP != "") | .metadata.name')
  for SVC in $SERVICES; do
    ENDPOINTS=$(kubectl get endpoints $SVC -n $NS -o json | jq -r '.subsets | length')
    if [ "$ENDPOINTS" == "0" ] || [ "$ENDPOINTS" == "null" ]; then
      echo "[ALERT] Service $SVC in namespace $NS has NO endpoints"
    fi
  done
done

echo "Health check completed"

10.2 Verify Ingress‑Service association

#!/bin/bash
# List all Ingresses with their first backend Service and verify it has endpoints
# (iterate namespace/name pairs so duplicate Ingress names across namespaces are handled)
kubectl get ingress -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r NS INGRESS; do
  SVC=$(kubectl get ingress "$INGRESS" -n "$NS" -o jsonpath='{.spec.rules[0].http.paths[0].backend.service.name}')
  PORT=$(kubectl get ingress "$INGRESS" -n "$NS" -o jsonpath='{.spec.rules[0].http.paths[0].backend.service.port.number}')
  echo "Ingress: $NS/$INGRESS, Service: $SVC, Port: $PORT"
  EP_COUNT=$(kubectl get endpoints "$SVC" -n "$NS" -o json 2>/dev/null | jq '[.subsets[]?.addresses[]?] | length')
  if [ "$EP_COUNT" == "0" ] || [ -z "$EP_COUNT" ]; then
    echo "  [WARNING] Service $SVC has no endpoints!"
  fi
done

10.3 Standard readinessProbe templates

# Spring Boot readinessProbe
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

# Golang readinessProbe
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

Conclusion

Service connectivity problems in Kubernetes are best resolved by a layered approach: start with pod health, then verify Service Endpoints, test direct Service access, examine kube‑proxy rules, confirm CNI operation, inspect Ingress configuration and logs, validate DNS and CoreDNS, and finally review NetworkPolicy and cross‑node routing. Following the prioritized checklist—Pod → Service → Endpoint → Ingress → DNS → NetworkPolicy—dramatically reduces mean‑time‑to‑resolution.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Kubernetes, Network, Troubleshooting, Service, Ingress, Pod, kube-proxy
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
