Why Did My Kubernetes Pods Stop Communicating? Uncovering CoreDNS ReadinessProbe Failures
A sudden Kubernetes outage was traced to a CoreDNS ReadinessProbe failure on port 8181, and the article walks through symptom detection, code‑level root‑cause analysis, and a ConfigMap fix that restores inter‑pod communication.
1. Incident Background
A frantic early‑morning call reported that all pod‑to‑pod calls in a Kubernetes cluster had stopped, prompting an urgent investigation of CoreDNS logs that showed domain resolution errors.
2. Symptom Observation
Checking the CoreDNS pods revealed that ports 8080 and 9153 responded normally, while port 8181 was unreachable.
All CoreDNS pods were stuck in a 0/1 Running state, and kubectl describe showed a ReadinessProbe failure.
<code>Readiness probe failed: Get "http://x.x.x.x:8181/ready": dial tcp x.x.x.x:8181: connect: connection refused</code>
The ReadinessProbe determines whether a pod may be added to a Service's endpoints. When it fails, the pod is kept out of the endpoints list, so the kube-dns Service has no backends, name resolution stops, and inter-pod communication breaks.
3. Root Cause Analysis
Reviewing the CoreDNS source showed that the ready plugin registers an HTTP server, defaulting to port 8181, and that its parse function rejects any explicitly supplied address that is malformed. Crucially, this setup code only runs when ready appears in the Corefile; if the plugin is not configured at all, no listener is ever started on 8181 and the probe fails.
<code>// Register the ready plugin with CoreDNS.
func init() { plugin.Register("ready", setup) }

func setup(c *caddy.Controller) error {
	addr, err := parse(c)
	if err != nil {
		return plugin.Error("ready", err)
	}
	rd := &ready{Addr: addr}
	uniqAddr.Set(addr, rd.onStartup)
	c.OnStartup(func() error { uniqAddr.Set(addr, rd.onStartup); return nil })
	// ... omitted ...
	return nil
}

func parse(c *caddy.Controller) (string, error) {
	addr := ":8181" // default listen address
	// ...
	if len(args) == 1 {
		addr = args[0]
		if _, _, e := net.SplitHostPort(addr); e != nil {
			return "", e
		}
	}
	// ...
	return addr, nil
}
</code>
In this cluster, the ready plugin had never been added to the CoreDNS ConfigMap, so CoreDNS never started a listener on the default address :8181, and the probe's connection to that port was refused.
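The validation in parse above hinges on net.SplitHostPort: the default ":8181" is a valid host:port pair (empty host, port 8181), while a bare "8181" with no colon is rejected. A small standalone sketch of that check:

```go
package main

import (
	"fmt"
	"net"
)

// splitAddr mirrors the validation the ready plugin's parse function
// applies to its optional address argument: the value must be in
// "host:port" form or net.SplitHostPort returns an error.
func splitAddr(addr string) (host, port string, err error) {
	return net.SplitHostPort(addr)
}

func main() {
	// The plugin's default, ":8181", parses cleanly: empty host, port "8181".
	h, p, err := splitAddr(":8181")
	fmt.Printf("host=%q port=%q err=%v\n", h, p, err)

	// A bare port with no colon is rejected, so a Corefile line like
	// `ready 8181` (without the colon) would fail to parse.
	_, _, err = splitAddr("8181")
	fmt.Println(err)
}
```

This is why the fix below writes the directive as `ready :8181` rather than `ready 8181`.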
4. Fix Method
Manually add the ready plugin, listening on port 8181 to match the Deployment's ReadinessProbe, to the CoreDNS ConfigMap and restart the pods:
<code>apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        ready :8181   # added line
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        log
        cache 30
        loop
        reload
        loadbalance
    }
</code>
After applying the updated ConfigMap and restarting the CoreDNS pods, the ReadinessProbe succeeded and services recovered.
5. Result
CoreDNS pods now run normally, the ReadinessProbe passes, and the business workload is fully restored.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and will accompany you throughout your operations career, growing together.