Why Kubelet Fails to Reach the API Server: LB Issues and HTTP/2 Pitfalls

This article analyzes a Kubernetes cluster outage caused by a load‑balancer failure that prevented kubelet from connecting to the API server, explores the underlying HTTP/2 behavior, and presents debugging steps and code‑level fixes to restore reliable communication.

MaGe Linux Operations

Background

Kubernetes uses a master‑slave architecture where the master node runs the API server, the central component handling all requests and persisting state to etcd. High availability is typically achieved by deploying multiple API server instances behind a load balancer (LB). If the LB fails, all nodes may become NotReady, triggering massive pod eviction.

Fault Occurrence

During an incident, many nodes reported NotReady. The kubelet logs showed repeated errors such as:

E0415 17:03:11.351872   16624 kubelet_node_status.go:374] Error updating node status, will retry: error getting node "k8s-slave88": Get https://10.13.10.12:6443/api/v1/nodes/k8s-slave88?resourceVersion=0&timeout=5s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

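The IP 10.13.10.12 is the LB address, so kubelet could not reach the API server through the LB even though telnet tests against the same address succeeded. A telnet test opens a fresh TCP connection, whereas kubelet was reusing a long-lived one, so the two results are not contradictory.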
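
The gap between a passing telnet test and a failing request can be reproduced in a few lines of Go. The sketch below is an illustration only, not kubelet code: startSilentServer, probe, and isTimeout are hypothetical names, and the silent listener merely stands in for a path that accepts TCP connections but never returns an HTTP response. It shows a raw TCP dial succeeding while an http.Client with a Timeout fails with a timeout error of the kind seen in the log above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// startSilentServer listens on a loopback port, accepts TCP connections,
// and never writes a response. It is a hypothetical stand-in for a path
// that is reachable at the TCP level but unresponsive at the HTTP level.
func startSilentServer() (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	done := make(chan struct{})
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				<-done // hold the connection open without replying
				c.Close()
			}(conn)
		}
	}()
	stop := func() {
		close(done)
		ln.Close()
	}
	return ln.Addr().String(), stop, nil
}

// probe first dials the server over plain TCP (roughly what a telnet test
// verifies) and then sends an HTTP GET with a client-side timeout.
// It returns the error from the HTTP request.
func probe(timeoutMillis int) error {
	addr, stop, err := startSilentServer()
	if err != nil {
		return err
	}
	defer stop()

	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return err // the TCP-level check itself failed
	}
	conn.Close()

	client := &http.Client{Timeout: time.Duration(timeoutMillis) * time.Millisecond}
	resp, err := client.Get("http://" + addr + "/")
	if err == nil {
		resp.Body.Close()
	}
	return err
}

// isTimeout reports whether err is a network error that marks itself as a timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

func main() {
	err := probe(200)
	fmt.Println("timed out:", isTimeout(err))
	fmt.Println("error:", err)
}
```

The exact error text varies between Go versions, so the sketch checks the error's Timeout method rather than matching the message.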

Diagnosis

A tcpdump capture showed that kubelet was sending packets to the API server address but receiving no ACKs. Restarting kubelet temporarily resolved the issue, which pointed to a problem with the existing connection rather than with the network path itself.

The root cause was traced to the LB: when a new LB instance took over traffic, it dropped connections it could not map, causing kubelet to hang.

Hard Fix

Investigation of client-go showed the default transport uses HTTP/2. Setting the environment variable DISABLE_HTTP2 forces HTTP/1.1, which avoids the hang because HTTP/1.1 reuses idle connections or creates new ones on failure.

HTTP/2 keeps a single connection per host; if that connection stalls, the client may wait minutes before the OS closes it. HTTP/1.1, by contrast, opens new connections when needed, allowing quicker recovery.
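
To make the HTTP/1.1 fallback concrete, here is a minimal standard-library sketch. It does not reproduce the DISABLE_HTTP2 handling in client-go; it uses the approach documented in net/http, where setting Transport.TLSNextProto to a non-nil, empty map disables HTTP/2. The helper name negotiatedProto and the local test server are assumptions made for illustration.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// negotiatedProto starts a local TLS server that offers HTTP/2, sends one
// request, and returns the protocol the client ended up using.
func negotiatedProto(disableHTTP2 bool) (string, error) {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	ts.EnableHTTP2 = true
	ts.StartTLS()
	defer ts.Close()

	// Trust the test server's self-signed certificate.
	pool := x509.NewCertPool()
	pool.AddCert(ts.Certificate())

	tr := &http.Transport{
		TLSClientConfig:   &tls.Config{RootCAs: pool},
		ForceAttemptHTTP2: true, // required for HTTP/2 when a custom TLS config is set
	}
	if disableHTTP2 {
		// A non-nil, empty map is the documented way to turn HTTP/2 off.
		tr.TLSNextProto = map[string]func(string, *tls.Conn) http.RoundTripper{}
	}
	defer tr.CloseIdleConnections()

	resp, err := (&http.Client{Transport: tr}).Get(ts.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return resp.Proto, nil
}

func main() {
	for _, disable := range []bool{false, true} {
		proto, err := negotiatedProto(disable)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("disableHTTP2=%v -> %s\n", disable, proto)
	}
}
```

With HTTP/2 disabled, each request either reuses an idle HTTP/1.1 connection or dials a new one, which is the recovery behavior described above.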

Implementation Details

Key code snippets:

// SetTransportDefaults applies the base transport defaults, then enables
// HTTP/2 unless the DISABLE_HTTP2 environment variable is set to a non-empty value.
func SetTransportDefaults(t *http.Transport) *http.Transport {
    t = SetOldTransportDefaults(t)
    if s := os.Getenv("DISABLE_HTTP2"); len(s) > 0 {
        klog.Infof("HTTP2 has been explicitly disabled")
    } else {
        if err := http2.ConfigureTransport(t); err != nil {
            klog.Warningf("Transport failed http2 configuration: %v", err)
        }
    }
    return t
}

Because the standard transport does not send HTTP/2 PING frames, stale connections are not detected promptly. A community PR added Ping‑frame support, which was later back‑ported to Kubernetes 1.14.

Turning Point

Testing on Kubernetes v1.10.11 showed that the issue no longer occurred, which suggests it was a regression in v1.10.2 that had since been fixed. The final fix restored the closeAllConns function, which forcibly closes all connections when an error occurs; it was merged upstream.
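
The idea behind closeAllConns can be sketched with a dialer that remembers the connections it opens. This is not the kubelet implementation: connTracker, CloseAll, and dialsAfterReset are hypothetical names, and the code only illustrates how closing every tracked connection forces the HTTP client to dial a fresh one on its next request.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
)

// connTracker is a hypothetical helper that records every connection its
// dialer opens so that all of them can be closed at once.
type connTracker struct {
	mu    sync.Mutex
	conns map[net.Conn]struct{}
	dials int
}

func newConnTracker() *connTracker {
	return &connTracker{conns: make(map[net.Conn]struct{})}
}

// DialContext has the signature expected by http.Transport.DialContext.
func (t *connTracker) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
	var d net.Dialer
	conn, err := d.DialContext(ctx, network, addr)
	if err != nil {
		return nil, err
	}
	t.mu.Lock()
	t.conns[conn] = struct{}{}
	t.dials++
	t.mu.Unlock()
	return conn, nil
}

// CloseAll closes every tracked connection, so the next request must dial again.
func (t *connTracker) CloseAll() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for c := range t.conns {
		c.Close()
		delete(t.conns, c)
	}
}

// Dials returns how many connections have been opened so far.
func (t *connTracker) Dials() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.dials
}

// get sends a GET and drains the body so the connection can be reused.
func get(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// dialsAfterReset sends two requests to a local test server, optionally
// closing all connections in between, and returns the number of dials.
func dialsAfterReset(reset bool) (int, error) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	defer ts.Close()

	tracker := newConnTracker()
	tr := &http.Transport{DialContext: tracker.DialContext}
	defer tr.CloseIdleConnections()
	client := &http.Client{Transport: tr}

	if err := get(client, ts.URL); err != nil {
		return 0, err
	}
	if reset {
		tracker.CloseAll() // simulate the reaction to a failed heartbeat
	}
	if err := get(client, ts.URL); err != nil {
		return 0, err
	}
	return tracker.Dials(), nil
}

func main() {
	for _, reset := range []bool{false, true} {
		n, err := dialsAfterReset(reset)
		fmt.Printf("reset=%v dials=%d err=%v\n", reset, n, err)
	}
}
```

Without the reset, the second request reuses the idle connection. With it, the client has to open a new connection, which is the behavior that lets a client escape a connection the LB no longer recognizes.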

References

https://github.com/kubernetes/kubernetes/issues/41916

https://github.com/kubernetes/kubernetes/issues/48638

https://github.com/kubernetes-incubator/kube-aws/issues/598

https://github.com/kubernetes/client-go/issues/374

https://github.com/kubernetes/kubernetes/pull/63492

https://github.com/kubernetes/kubernetes/pull/71174
