Why Did Our kube-apiserver OOM? A Deep Dive into Kubernetes Control-Plane Failures
This article details a real-world Kubernetes control-plane outage in which kube-apiserver was repeatedly OOM-killed. It walks through cluster metrics, logs, and heap and goroutine profiles; weighs root-cause hypotheses such as etcd latency and a DeleteCollection memory leak; and offers step-by-step troubleshooting and prevention guidance.
Cluster and environment information:
k8s v1.18.4
3 master nodes, each 8 CPU / 16 GB RAM, 50 GiB SSD
19 heterogeneous minion nodes
Control‑plane components (kube‑apiserver, etcd, kube‑controller‑manager, kube‑scheduler) deployed as static pods
VIP load‑balances traffic to the three kube‑apiserver front‑ends
Tencent Cloud SSD performance ~130 MB/s
Fault description
On the afternoon of 2021‑09‑10, kubectl intermittently hung and could not CRUD standard resources (Pod, Node, etc.). The issue was traced to some kube‑apiserver instances becoming unresponsive.
On‑site information
kube‑apiserver pod details (kube‑system namespace):
```
$ kubectl get pods -n kube-system kube-apiserver-x.x.x.x -o yaml
...
containerStatuses:
- containerID: docker://xxxxx
  ...
  lastState:
    terminated:
      containerID: docker://yyyy
      exitCode: 137
      finishedAt: "2021-09-10T09:29:02Z"
      reason: OOMKilled
      startedAt: "2020-12-09T07:02:23Z"
  name: kube-apiserver
  ready: true
  restartCount: 1
  started: true
  state:
    running:
      startedAt: "2021-09-10T09:29:08Z"
...
```
On 10 September, kube‑apiserver was OOM‑killed (exit code 137) and restarted.
Surrounding monitoring
IaaS layer black‑box monitoring (control‑plane hosts):
Effective information:
Memory, CPU, and disk-read metrics were positively correlated; after 10 September they dropped sharply and returned to normal.
Kube‑apiserver Prometheus monitoring:
Effective information:
kube‑apiserver had I/O problems: Prometheus failed to scrape its metrics for a period.
kube‑apiserver memory grew monotonically, and its internal workqueue ADD rate (adds per second) was very high.
Real‑time debug information:
Effective information:
Two of the three masters’ memory usage reached ~80‑90%.
kube‑apiserver processes consumed most of the memory.
One master’s CPU was saturated, with high I/O wait (wa).
Almost every process on the machines was heavily reading disks, making shells nearly unusable.
The one master with comparatively low memory usage (~8 Gi) was the one whose kube‑apiserver had previously been OOM‑killed.
Some questions and hypotheses
Why does kube‑apiserver consume a lot of memory?
Clients performing full‑list operations on core resources.
etcd being unable to serve causes kube‑controller‑manager and kube‑scheduler to lose their leader‑election leases and restart; each restart re‑runs ListAndWatch against kube‑apiserver, driving repeated full lists.
Potential memory leak in kube‑apiserver code.
Why is the etcd cluster malfunctioning?
Network jitter within the etcd cluster.
Disk performance degradation.
Insufficient CPU/RAM on etcd hosts, starving etcd of CPU time and causing request deadlines to expire.
Why do kube‑controller‑manager and kube‑scheduler read disks heavily?
They read local configuration files.
Under extreme memory pressure the OS evicts pages of large processes; when rescheduled they reload from disk, increasing I/O.
Some logs
kube‑apiserver related logs:
```
I0907 07:04:17.611412 1 trace.go:116] Trace[1140445702]: "Get" url:/apis/storage.k8s.io/v1/volumeattachments/... (total time: 976.1773ms):
Trace[1140445702]: [976.164659ms] About to write a response
I0907 07:04:17.611478 1 trace.go:116] Trace[630463685]: "Get" url:/apis/storage.k8s.io/v1/volumeattachments/... (total time: 983.823847ms):
Trace[630463685]: [983.812225ms] About to write a response
...
E0907 07:04:37.327057 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, context canceled]
W0907 07:10:39.496915 1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://etcd0:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing context deadline exceeded". Reconnecting...
```
etcd operation latency increased dramatically, and the connection was eventually lost.
etcd logs (partial):
```
{"level":"warn","ts":"2021-09-10T17:14:50.559+0800","msg":"rejected connection","remote-addr":"10.0.0.42:49824","error":"read tcp 10.0.0.8:2380->10.0.0.42:49824: i/o timeout"}
{"level":"warn","ts":"2021-09-10T17:14:58.993+0800","msg":"rejected connection","remote-addr":"10.0.0.6:54656","error":"EOF"}
...
```
etcd’s communication with its peers (port 2380) timed out, preventing it from serving.
Deep investigation
A heap profile of kube‑apiserver shows massive memory consumption in `registry.(*Store).DeleteCollection`, which first lists all matching items and then deletes them concurrently, explaining the sudden memory spike. If `e.Delete` fails (as in our etcd error scenario), the worker goroutines exit early, but the dispatcher goroutine stays blocked sending to the `toProcess` channel; the listed items can then never be garbage‑collected, and memory grows until the OOM kill.
kube‑apiserver goroutine profile
```
goroutine 18970952966 [chan send, 429 minutes]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).DeleteCollection.func1(...)
...
# ... many more ...
```
Most goroutines are blocked on channel send, indicating the dispatcher deadlock.
kube‑controller‑manager logs
```
E1027 15:15:01.016712 1 leaderelection.go:320] error retrieving resource lock kube-system/kube-controller-manager: etcdserver: request timed out
I1027 15:15:01.950682 1 leaderelection.go:277] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
F1027 15:15:01.950760 1 controllermanager.go:279] leaderelection lost
```
The controller manager could not renew its lease because kube‑apiserver failed to communicate with etcd, so it logged a fatal error and exited.
kube‑apiserver DeleteCollection implementation
```go
// Abridged from k8s.io/apiserver (v1.18), registry/generic/registry/store.go.
func (e *Store) DeleteCollection(ctx context.Context, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions, listOptions *metainternalversion.ListOptions) (runtime.Object, error) {
	listObj, err := e.List(ctx, listOptions)
	if err != nil {
		return nil, err
	}
	items, err := meta.ExtractList(listObj)
	if err != nil {
		return nil, err
	}
	wg := sync.WaitGroup{}
	toProcess := make(chan int, 2*workersNumber)
	errs := make(chan error, workersNumber+1)

	// Dispatcher goroutine: feeds every item index into toProcess.
	go func() {
		defer utilruntime.HandleCrash(...)
		for i := 0; i < len(items); i++ {
			toProcess <- i
		}
		close(toProcess)
	}()

	// Worker goroutines: on error they return WITHOUT draining toProcess.
	wg.Add(workersNumber)
	for i := 0; i < workersNumber; i++ {
		go func() {
			defer wg.Done()
			for index := range toProcess {
				accessor, err := meta.Accessor(items[index])
				if err != nil {
					errs <- err
					return
				}
				if _, _, err := e.Delete(ctx, accessor.GetName(), deleteValidation, options.DeepCopy()); err != nil && !apierrors.IsNotFound(err) {
					errs <- err
					return
				}
			}
		}()
	}
	wg.Wait()
	select {
	case err := <-errs:
		return nil, err
	default:
		return listObj, nil
	}
}
```
If e.Delete errors (as with etcd failures), every worker returns early, the dispatcher blocks on sending to toProcess, and the listed items can never be garbage-collected, leading to OOM.
Summary
Define a clear baseline for a healthy cluster (e.g., 100 nodes, 1400 pods, 50 ConfigMaps, 300 events, kube‑apiserver ~2 Gi memory, ~10 % single‑core CPU).
Detect anomalies via monitoring and logs, then pinpoint the failing component.
Correlate timestamps of abnormal CPU, RAM, and disk usage with component logs and profiles.
Form hypotheses, validate with heap and goroutine profiles, and examine source code.
Prevent control‑plane cascade failures by allocating sufficient resources to kube‑apiserver, isolating etcd clusters, and monitoring DeleteCollection activity.
Original article: k8s‑club/kube‑apiserver.md
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.