
How to Fix Common Kubernetes Memory Leaks and Certificate Expiration Issues

This article walks through diagnosing and resolving two frequent Kubernetes problems—memory‑leak errors that cause "cannot allocate memory" or "no space left on device" messages, and expired cluster certificates—by checking cgroup stats, recompiling runc and kubelet, and renewing certificates with kubeadm for long‑term validity.

Efficient Ops

As microservice adoption grows, Kubernetes clusters are used more extensively, bringing a series of operational problems. This article introduces two common issues and provides step‑by‑step solutions.

Problem 1: "cannot allocate memory" or "no space left on device" – Kubernetes memory leak

Problem description

After a Kubernetes cluster runs for a long time, some nodes fail to create new Pods and report errors such as:

<code>applying cgroup … caused: mkdir …no space left on device</code>
<code>cannot allocate memory</code>

The usual cause is a leak in the kernel's cgroup kernel-memory (kmem) accounting, which runc and kubelet enable by default in the versions discussed here.

Detecting the leak

<code>cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo</code>

If the command fails with "Input/output error", kmem accounting is disabled and the node is not affected. If it prints slabinfo entries, kmem accounting is enabled and the node is exposed to the leak.
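The check above can be wrapped in a small helper so that it can run across many nodes. This is a minimal sketch, not part of the original procedure: the function name `kmem_status` and its three output words are illustrative, and it treats any failed read as "disabled", which also covers errors other than "Input/output error".

```shell
# Classify a node by trying to read a memory.kmem.slabinfo file.
# Prints one of: enabled | disabled | unknown
# (kmem_status is a hypothetical helper name, not a Kubernetes tool.)
kmem_status() {
  slabinfo="$1"
  if [ ! -e "$slabinfo" ]; then
    # Path missing: different cgroup layout, or no kubepods cgroup on this host
    echo "unknown"
  elif cat "$slabinfo" >/dev/null 2>&1; then
    # File is readable: kmem accounting is on, so the node may be affected
    echo "enabled"
  else
    # Read failed (for example "Input/output error"): kmem accounting is off
    echo "disabled"
  fi
}

kmem_status /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo
```

A node that reports `enabled` is a candidate for the rebuild described in the following sections.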

Solution overview

Disable kmem accounting by recompiling runc and kubelet without kmem support, then replace the binaries.

Recompile runc

<code>wget https://dl.google.com/go/go1.12.9.linux-amd64.tar.gz</code>
<code>tar xf go1.12.9.linux-amd64.tar.gz -C /usr/local/</code>
<code>export GOPATH="/data/Documents"</code>
<code>export GOROOT="/usr/local/go"</code>
<code>export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"</code>
<code>go env</code>
<code>mkdir -p /data/Documents/src/github.com/opencontainers/</code>
<code>cd /data/Documents/src/github.com/opencontainers/</code>
<code>git clone https://github.com/opencontainers/runc</code>
<code>cd runc/</code>
<code>git checkout v1.0.0-rc9</code>
<code>sudo yum install libseccomp-devel</code>
<code>make BUILDTAGS='seccomp nokmem'</code>

Recompile kubelet

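<code>mkdir -p /root/k8s/</code>
<code>git clone https://github.com/kubernetes/kubernetes/ /root/k8s/kubernetes</code>
<code>cd /root/k8s/kubernetes</code>
<code>git checkout v1.15.3</code>
<code># Dockerfile (excerpt) for a build image with a Go environment</code>
<code>FROM centos:centos7.3.1611</code>
<code>ENV GOROOT /usr/local/go</code>
<code>ENV GOPATH /usr/local/gopath</code>
<code>ENV PATH /usr/local/go/bin:$PATH</code>
<code># install build tools, then compile</code>
<code># Build the image with the tag used below (assumes the Dockerfile is in the current directory)</code>
<code>docker build -t build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 .</code>
<code># Start a build container with the Kubernetes source tree mounted</code>
<code>docker run -it --rm -v /root/k8s/kubernetes:/usr/local/gopath/src/k8s.io/kubernetes build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 bash</code>
<code># Inside the container: build kubelet with the nokmem tag</code>
<code>cd /usr/local/gopath/src/k8s.io/kubernetes</code>
<code>GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.15.3 make kubelet GOFLAGS="-tags=nokmem"</code>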

Replace binaries

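<code># Stop the services before swapping binaries</code>
<code>systemctl stop kubelet</code>
<code>systemctl stop docker</code>
<code># Back up the original binaries</code>
<code>mv /usr/bin/kubelet /home/kubelet</code>
<code>mv /usr/bin/docker-runc /home/docker-runc</code>
<code># Install the rebuilt binaries</code>
<code>cp kubelet /usr/bin/kubelet</code>
<code>cp kubelet /usr/local/bin/kubelet</code>
<code>cp runc /usr/bin/docker-runc</code>
<code># Start the services again</code>
<code>systemctl start docker</code>
<code>systemctl start kubelet</code>
<code># Verify: kmem usage is expected to be 0, and slabinfo should fail with "Input/output error"</code>
<code>cat /sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes</code>
<code>cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo</code>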
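The backup and copy steps above can be combined into one reusable function, which makes a rollback easier if the rebuilt binary misbehaves. This is only a sketch under assumptions: `swap_binary` and the `.orig` suffix are hypothetical names, and the services are assumed to be stopped before it is called.

```shell
# Back up an installed binary, then install a replacement in its place.
# Usage: swap_binary NEW_BINARY TARGET BACKUP_DIR
# (swap_binary is a hypothetical helper, not part of kubeadm or Docker.)
swap_binary() {
  new="$1"; target="$2"; backup_dir="$3"
  # Refuse to touch the target if the replacement does not exist
  [ -f "$new" ] || { echo "missing replacement: $new" >&2; return 1; }
  mkdir -p "$backup_dir" || return 1
  # Keep a copy of the current binary, preserving mode and timestamps
  if [ -f "$target" ]; then
    cp -p "$target" "$backup_dir/$(basename "$target").orig" || return 1
  fi
  # Copy the new binary into place and mark it executable
  install -m 0755 "$new" "$target"
}

# Example calls mirroring the commands above (run with services stopped):
#   swap_binary kubelet /usr/bin/kubelet /home
#   swap_binary runc /usr/bin/docker-runc /home
```

To roll back, copy the `.orig` file from the backup directory over the target and restart the services.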

Problem 2: Kubernetes certificate expiration

Background

The API becomes inaccessible with the error:

<code>Unable to connect to the server: x509: certificate has expired or is not yet valid</code>

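Check the certificate expiration dates with the command below (from kubeadm v1.20 onward the alpha prefix is dropped, so the command becomes kubeadm certs check-expiration):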

<code>kubeadm alpha certs check-expiration</code>
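When kubeadm is unavailable, or for routine monitoring, the same information can be read directly from the certificate files with openssl. The sketch below assumes the default kubeadm certificate directory `/etc/kubernetes/pki` and a 30-day warning threshold. Both are illustrative choices, as is the helper name `expires_within_days`.

```shell
# Return 0 if the certificate expires within the given number of days.
# (expires_within_days is a hypothetical helper name.)
# Caveat: an unreadable or invalid file is also reported as "expiring".
expires_within_days() {
  cert="$1"
  days="$2"
  # openssl -checkend N exits non-zero when the cert expires within N seconds
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1
}

# Scan the assumed default kubeadm certificate locations
for crt in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  [ -e "$crt" ] || continue
  if expires_within_days "$crt" 30; then
    echo "$crt expires within 30 days"
  fi
done
```

Running this from cron gives advance warning before the API server starts rejecting connections.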

Solution

Renew all certificates and restart components:

<code>kubeadm alpha certs renew all --config=kubeadm.yaml</code>
<code>systemctl restart kubelet</code>
<code>kubeadm init phase kubeconfig all --config kubeadm.yaml</code>

For a long‑lived (10‑year) certificate, edit the kube-controller-manager manifest to add:

<code>spec:</code>
<code>  containers:</code>
<code>  - command:</code>
<code>    - kube-controller-manager</code>
<code>    - --experimental-cluster-signing-duration=87600h</code>
<code>    - --client-ca-file=/etc/kubernetes/pki/ca.crt</code>
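The value 87600h in the flag above is ten years expressed in hours. A quick way to derive the value for other validity periods, assuming 365-day years (leap days ignored) and using an illustrative function name:

```shell
# Convert a validity period in years to the hour-based duration string
# used by the signing-duration flag (365-day years, leap days ignored).
years_to_hours() {
  echo "$(( $1 * 365 * 24 ))h"
}

years_to_hours 10   # prints 87600h
```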

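Approve the pending CSRs. The certificates issued this way are signed by the cluster CA, so replace the etcd and front-proxy CA files with the cluster CA: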

<code>cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/etcd/ca.crt</code>
<code>cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt</code>
<code>cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key</code>
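The CSR approval step mentioned before the copy commands can be scripted rather than done one request at a time. The sketch below assumes that the CONDITION column is the last column of the `kubectl get csr` output. The filter name `pending_csrs` is hypothetical. Review the list before approving, since approving every pending request without inspection is not safe on a shared cluster.

```shell
# Print the names of CSRs whose CONDITION (last column) is exactly "Pending".
# Reads `kubectl get csr --no-headers` output on stdin.
# (pending_csrs is a hypothetical helper name.)
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Typical use on a control-plane node:
#   kubectl get csr --no-headers | pending_csrs          # review first
#   kubectl get csr --no-headers | pending_csrs | xargs -r kubectl certificate approve
# (xargs -r is a GNU extension that skips the command when the list is empty.)
```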

After these changes, the cluster runs with a ten‑year certificate without needing frequent renewals.

Written by

Efficient Ops
