
How to Fix Kubernetes Memory Leaks and Expired Certificates: Step‑by‑Step Guide

This article explains common Kubernetes issues such as node memory leaks and certificate expiration, provides diagnostic commands, and offers detailed solutions including disabling kmem accounting, recompiling runc and kubelet, and extending certificate validity to ten years.

Open Source Linux

1 Problem: Fix K8s Memory Leak

Problem description

On long-running clusters, some nodes can no longer create new pods. Describing an affected pod shows errors such as "applying cgroup … caused: mkdir …no space left on device" or "cannot allocate memory".

applying cgroup … caused: mkdir …no space left on device

This points to a possible kernel memory (kmem) leak on the node; the more pods are created, the faster the resource is exhausted.

Check whether the node is affected:

$ cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo

If the command returns an I/O error, kmem accounting is not active and no leak is present. If it prints slabinfo output similar to the example below, the node is affected.

slabinfo - version: 2.1
# name    <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
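The check above can be wrapped in a small helper for use across many nodes. This is a minimal sketch: the function name `kmem_state` is hypothetical, and it treats any read failure (not only an I/O error) as "kmem accounting not active".

```shell
# kmem_state [SLABINFO_PATH]
# Prints "inactive" when the slabinfo file cannot be read (the I/O error
# case described above) and "active" when the kernel returns slab data.
# Assumption: every read failure is treated like the I/O error case.
kmem_state() (
    path=${1:-/sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo}
    if cat "$path" >/dev/null 2>&1; then
        echo "active"
    else
        echo "inactive"
    fi
)
```

Calling `kmem_state` with no argument reads the kubepods path used in this guide; a different cgroup path can be passed as the first argument.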

Solution

Disable kmem accounting in runc and kubelet. Upgrading the kernel is the alternative, but it is more invasive.

Reason: the kernel caps the number of entries per cgroup subsystem (see kernel/cgroup.c#L139). With kmem accounting enabled, an entry is not reclaimed when its cgroup directory is removed, so entries accumulate until the 65535-entry limit is exhausted and no new cgroup can be created.
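One rough way to see how close a node is to that limit is to read the memory controller's row in /proc/cgroups. The sketch below assumes the num_cgroups column (third field) is a usable proxy for consumed entries; the function names are hypothetical.

```shell
# memcg_count [CGROUPS_FILE]
# Prints the num_cgroups column (third field) of the "memory" row.
memcg_count() (
    file=${1:-/proc/cgroups}
    awk '$1 == "memory" { print $3 }' "$file"
)

# memcg_headroom [CGROUPS_FILE]
# Prints how many entries remain before the 65535 limit cited above.
memcg_headroom() (
    count=$(memcg_count "$1")
    count=${count:-0}   # no "memory" row found: treat as zero
    echo $(( 65535 - count ))
)
```

A headroom value that keeps shrinking while the number of running pods stays flat would be consistent with the leak described here.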

To resolve, recompile runc and kubelet without kmem support.

1.1 Compile runc

Configure Go environment.

$ wget https://dl.google.com/go/go1.12.9.linux-amd64.tar.gz
$ tar xf go1.12.9.linux-amd64.tar.gz -C /usr/local/
# add the following export lines to ~/.bashrc
$ vim ~/.bashrc
export GOPATH="/data/Documents"
export GOROOT="/usr/local/go"
export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"
export GO111MODULE=off
# reload the shell configuration and verify the Go environment
$ source ~/.bashrc
$ go env

Download runc source.

$ mkdir -p /data/Documents/src/github.com/opencontainers/
$ cd /data/Documents/src/github.com/opencontainers/
$ git clone https://github.com/opencontainers/runc
$ cd runc/
$ git checkout v1.0.0-rc9

Compile.

# install build dependencies
$ sudo yum install libseccomp-devel
$ make BUILDTAGS='seccomp nokmem'

1.2 Compile kubelet

Download Kubernetes source.

$ mkdir -p /root/k8s/
$ cd /root/k8s/
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes/
$ git checkout v1.15.3

Create a build‑environment Docker image.

FROM centos:centos7.3.1611

# Go is unpacked to /usr/local/go, and the source is mounted under /usr/local/gopath
ENV GOROOT /usr/local/go
ENV GOPATH /usr/local/gopath
ENV PATH /usr/local/go/bin:$PATH

RUN yum install rpm-build which where rsync gcc gcc-c++ automake autoconf libtool make -y \
    && curl -L https://studygolang.com/dl/golang/go1.12.9.linux-amd64.tar.gz | tar zxvf - -C /usr/local
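The image has to be built and tagged before the container can be started. A minimal sketch, assuming the Dockerfile above sits in a directory of your choosing; the helper names are hypothetical, and the tag is the one used by the docker run command in this guide.

```shell
# image_tag CENTOS_VER GO_VER K8S_VER
# Assembles the image tag from its version components.
image_tag() {
    echo "build-k8s:centos-$1-go-$2-k8s-$3"
}

# build_image DIR
# Builds the Dockerfile found in DIR under the tag used in this guide.
build_image() {
    docker build -t "$(image_tag 7.3 1.12.9 1.15.3)" "$1"
}
```

For example, `build_image .` builds from the current directory.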

Compile kubelet inside the container.

$ docker run -it --rm -v /root/k8s/kubernetes:/usr/local/gopath/src/k8s.io/kubernetes build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 bash
$ cd /usr/local/gopath/src/k8s.io/kubernetes
$ GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.15.3 make kubelet GOFLAGS="-tags=nokmem"

Replace the original runc and kubelet binaries.

Back up the existing binaries.

$ mv /usr/bin/kubelet /home/kubelet
$ mv /usr/bin/docker-runc /home/docker-runc

Stop Docker and kubelet services.

$ systemctl stop docker
$ systemctl stop kubelet

Copy the newly built binaries.

$ cp kubelet /usr/bin/kubelet
$ cp kubelet /usr/local/bin/kubelet
$ cp runc /usr/bin/docker-runc
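The backup and copy steps could also be combined so that a binary is never overwritten without a saved copy. This is a sketch, not part of the original procedure; `replace_binary` is a hypothetical helper, and it should run while Docker and kubelet are stopped.

```shell
# replace_binary NEW TARGET BACKUP_DIR
# Moves TARGET into BACKUP_DIR (if TARGET exists), then copies NEW into
# place and marks it executable. Fails without touching TARGET when NEW
# is missing.
replace_binary() (
    new=$1
    target=$2
    backup_dir=$3
    [ -f "$new" ] || { echo "missing build output: $new" >&2; exit 1; }
    mkdir -p "$backup_dir" || exit 1
    if [ -e "$target" ]; then
        mv "$target" "$backup_dir/$(basename "$target")" || exit 1
    fi
    cp "$new" "$target" && chmod 755 "$target"
)
```

With the paths from this guide, a call would look like `replace_binary kubelet /usr/bin/kubelet /home`.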

Verify that kmem accounting is disabled. Start Docker and kubelet again, then restart the pods or reboot the node; the usage should read 0.

$ cat /sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes
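For scripted verification, the same read can be turned into an exit status. A minimal sketch with a hypothetical function name; it assumes the file contains a single integer.

```shell
# kmem_usage_ok [USAGE_FILE]
# Exit status 0: the file reads exactly 0 (kmem accounting disabled).
# Exit status 1: the file reads a non-zero value.
# Exit status 2: the file could not be read.
kmem_usage_ok() (
    file=${1:-/sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes}
    usage=$(cat "$file" 2>/dev/null) || exit 2
    [ "$usage" = "0" ]
)
```

A node check could then read `kmem_usage_ok && echo "kmem accounting is off"`.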

Check that no memory leak remains; the command should now fail with an I/O error.

$ cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo

2 Problem: Kubernetes Certificate Expiration

Background

In a long‑running test cluster, developers encountered API access failures and discovered that the cluster certificates had expired.

Symptoms

kubectl commands return:

Unable to connect to the server: x509: certificate has expired or is not yet valid

Check certificate expiration with:

$ kubeadm alpha certs check-expiration
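A single certificate file can also be inspected directly with openssl, which helps when the API server is unreachable. This is a sketch that assumes openssl is installed on the node; `cert_status` is a hypothetical helper built on the `-checkend` option of `openssl x509`.

```shell
# cert_status CERT [SECONDS]
# Prints "valid" if CERT stays valid for at least SECONDS more seconds
# (default 0, meaning "not expired right now"), "expiring" if it does not,
# and "unreadable" if the file cannot be read. A readable file that
# openssl cannot parse is also reported as "expiring".
cert_status() (
    cert=$1
    window=${2:-0}
    [ -r "$cert" ] || { echo "unreadable"; exit 0; }
    if openssl x509 -checkend "$window" -noout -in "$cert" >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expiring"
    fi
)
```

For example, `cert_status /etc/kubernetes/pki/ca.crt 2592000` reports whether the CA certificate referenced later in this guide expires within 30 days.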

Solution

Renewing certificates with kubeadm works, but the default validity is one year. To obtain ten-year certificates, edit the kube-controller-manager static pod manifest: set --experimental-cluster-signing-duration=87600h and make sure the CA file points to the default ca.crt. On Kubernetes v1.19 and later the flag is named --cluster-signing-duration.

spec:
  containers:
  - command:
    - kube-controller-manager
    # set certificate validity to 10 years
    - --experimental-cluster-signing-duration=87600h
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
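The 87600h value is ten years expressed in hours, counting 365-day years. A small sketch for deriving the flag value for other validity periods; the function name is hypothetical.

```shell
# years_to_duration YEARS
# Prints the signing-duration value for YEARS of validity,
# using 365-day years and the "h" (hours) unit suffix.
years_to_duration() {
    echo "$(( $1 * 365 * 24 ))h"
}
```

`years_to_duration 10` prints `87600h`, the value used in the manifest above.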

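Renew all certificates through the Kubernetes certificates API. The command runs in the background because it waits for the signing requests to be processed:

# renew all certs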
$ kubeadm alpha certs renew all --use-api --config kubeadm.yaml &
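The original procedure does not show an approval step. Unless an external signer approves requests automatically, renewal through the API leaves certificate signing requests in the Pending state until they are approved with kubectl. The sketch below filters the pending names; it assumes the condition is the last column of `kubectl get csr --no-headers`, and the function name is hypothetical.

```shell
# pending_csrs
# Reads `kubectl get csr --no-headers` output on stdin and prints the
# name (first column) of every request whose last column is "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}
```

A possible usage is `kubectl get csr --no-headers | pending_csrs | xargs kubectl certificate approve`, after reviewing the list of names first.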

etcd still references its old CA, so point its CA flags at the new ca.crt, update the kube-apiserver etcd CA flag to match, and replace the front-proxy CA files.

# backup static pod manifests
$ cp -r /etc/kubernetes/manifests/ /etc/kubernetes/manifests.bak
# edit etcd.yaml to use the new CA
... --peer-trusted-ca-file=/etc/kubernetes/pki/ca.crt
... --trusted-ca-file=/etc/kubernetes/pki/ca.crt
# edit kube-apiserver.yaml
... --etcd-cafile=/etc/kubernetes/pki/ca.crt
# replace front‑proxy CA
$ cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt
$ cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key
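After editing, it can help to confirm which CA file each manifest actually references. A minimal sketch that assumes flags appear as `- --flag=value` list items, as in the controller-manager snippet earlier; `flag_value` is a hypothetical helper.

```shell
# flag_value MANIFEST FLAG
# Prints the value of every "- --FLAG=VALUE" entry found in MANIFEST.
# FLAG is given without the leading dashes.
flag_value() (
    manifest=$1
    flag=$2
    sed -n "s/^[[:space:]]*-[[:space:]]*--${flag}=//p" "$manifest"
)
```

For example, `flag_value /etc/kubernetes/manifests/etcd.yaml trusted-ca-file` should print `/etc/kubernetes/pki/ca.crt` once the edit above is in place.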
Content originally published on Zhihu; copyright belongs to the author.