How to Fix Common Kubernetes Memory Leaks and Certificate Expiration Issues
This article walks through diagnosing and resolving two frequent Kubernetes problems. The first is a kernel memory (kmem) accounting leak that surfaces as "cannot allocate memory" or "no space left on device" errors; it is diagnosed from cgroup stats and fixed by recompiling runc and kubelet without kmem support. The second is expired cluster certificates, which are renewed with kubeadm and can be reissued with long‑term validity.
As microservice adoption grows, Kubernetes clusters are used more extensively, bringing a series of operational problems. This article introduces two common issues and provides step‑by‑step solutions.
Problem 1: "cannot allocate memory" or "no space left on device" – Kubernetes memory leak
Problem description
After a Kubernetes cluster runs for a long time, some nodes fail to create new Pods and report errors such as:
    applying cgroup … caused: mkdir …no space left on device
    cannot allocate memory
The cause is often a memory leak in the kmem accounting subsystem.
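To gauge how often a node hits these errors, the two messages can be counted in a saved log. The sketch below is only an illustration: the function name `count_kmem_symptoms` and the log path in the usage note are assumptions, not part of the original procedure.

```shell
# Count log lines that contain either symptom string from the problem description.
# Usage: count_kmem_symptoms LOGFILE   (LOGFILE is a hypothetical saved log)
count_kmem_symptoms() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep -c prints the number of matching lines; it exits 1 when that
    # number is 0, which is not an error for this purpose.
    grep -c -E 'no space left on device|cannot allocate memory' "$log_file" || true
}
```

For example, after saving the kubelet journal with `journalctl -u kubelet --no-pager > /tmp/kubelet.log`, running `count_kmem_symptoms /tmp/kubelet.log` prints the number of matching lines. A non-zero count only shows the symptom; it does not by itself confirm the kmem leak.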
Detecting the leak
    cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo
If the command fails with "Input/output error", kmem accounting is disabled and the node is not affected. If it prints slab entries instead, kmem accounting is enabled and the node is exposed to the leak.
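The same check can be wrapped in a small function so it is easy to repeat against several cgroup paths or nodes. This is a sketch that follows the rule stated above (a failed read means no leak); the function name `kmem_status` and its three output words are hypothetical.

```shell
# Classify a memory.kmem.slabinfo file by whether it can be read.
# Prints "enabled"  if the read succeeds (slab entries are available),
#        "disabled" if the read fails (for example with "Input/output error"),
#        "missing"  if the path does not exist at all.
kmem_status() {
    slabinfo="$1"
    if [ ! -e "$slabinfo" ]; then
        echo "missing"
        return 0
    fi
    if cat "$slabinfo" >/dev/null 2>&1; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

A call such as `kmem_status /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo` then prints a single word that is easy to collect from many nodes.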
Solution overview
Disable kmem accounting by recompiling runc and kubelet without kmem support, then replace the binaries.
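Both rebuilds depend on a handful of command-line tools being present on the build host. As a convenience, and not part of the original steps, a pre-flight check might look like the sketch below. `missing_tools` is a hypothetical helper name, and the tool list in the usage note is inferred from the commands used in the rest of this article.

```shell
# Print the name of every tool given as an argument that is not on PATH.
# Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools wget tar git make docker` before starting lists anything that still has to be installed; empty output means the build steps can proceed.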
Recompile runc
    # Install Go 1.12.9 and set up the build environment
    wget https://dl.google.com/go/go1.12.9.linux-amd64.tar.gz
    tar xf go1.12.9.linux-amd64.tar.gz -C /usr/local/
    export GOPATH="/data/Documents"
    export GOROOT="/usr/local/go"
    export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"
    go env
    # Fetch runc v1.0.0-rc9 into GOPATH
    mkdir -p /data/Documents/src/github.com/opencontainers/
    cd /data/Documents/src/github.com/opencontainers/
    git clone https://github.com/opencontainers/runc
    cd runc/
    git checkout v1.0.0-rc9
    # Build with seccomp support and without kmem accounting
    sudo yum install libseccomp-devel
    make BUILDTAGS='seccomp nokmem'
Recompile kubelet
    mkdir -p /root/k8s/
    git clone https://github.com/kubernetes/kubernetes/ /root/k8s/kubernetes
    cd /root/k8s/kubernetes
    git checkout v1.15.3
Build a Docker image with a Go environment. The Dockerfile begins as follows, then installs the build tools:
    FROM centos:centos7.3.1611
    ENV GOROOT /usr/local/go
    ENV GOPATH /usr/local/gopath
    ENV PATH /usr/local/go/bin:$PATH
Start a container from that image with the source tree mounted, and compile kubelet inside it:
    docker run -it --rm -v /root/k8s/kubernetes:/usr/local/gopath/src/k8s.io/kubernetes build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 bash
    cd /usr/local/gopath/src/k8s.io/kubernetes
    GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.15.3 make kubelet GOFLAGS="-tags=nokmem"
Replace binaries
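Back up the current binaries, stop the services, and copy in the new builds. The cp commands use relative paths, so run each one from the directory that holds the newly built binary:
    mv /usr/bin/kubelet /home/kubelet
    mv /usr/bin/docker-runc /home/docker-runc
    systemctl stop docker
    systemctl stop kubelet
    cp kubelet /usr/bin/kubelet
    cp kubelet /usr/local/bin/kubelet
    cp runc /usr/bin/docker-runc
Start the services again and check the kmem files. As in the detection step, an "Input/output error" from slabinfo means kmem accounting is off:
    systemctl start docker
    systemctl start kubelet
    cat /sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes
    cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo
Problem 2: Kubernetes certificate expiration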
Background
The API becomes inaccessible with the error:
    Unable to connect to the server: x509: certificate has expired or is not yet valid
Check expiration using:
    kubeadm alpha certs check-expiration
Solution
Renew all certificates and restart components:
    kubeadm alpha certs renew all --config=kubeadm.yaml
    systemctl restart kubelet
    kubeadm init phase kubeconfig all --config kubeadm.yaml
For a long‑lived (10‑year) certificate, edit the kube-controller-manager manifest to add:
    spec:
      containers:
      - command:
        - kube-controller-manager
        - --experimental-cluster-signing-duration=87600h
        - --client-ca-file=/etc/kubernetes/pki/ca.crt
Approve pending CSRs, then replace the etcd and front-proxy CA files with the cluster CA:
    cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/etcd/ca.crt
    cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt
    cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key
After these changes, the cluster runs with a ten‑year certificate without needing frequent renewals.
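The CSR approval step is mentioned above without a command. One possible way to do it, sketched here under assumptions, is to filter the output of `kubectl get csr` for requests whose last column reads `Pending` and pass those names to `kubectl certificate approve`. The helper name `pending_csrs` is hypothetical, and the filter assumes the default table output, where the condition is the final column.

```shell
# Read `kubectl get csr` table output on stdin and print the NAME of every
# request whose last column (CONDITION) is exactly "Pending".
# The header row is skipped.
pending_csrs() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

A typical use would be to review the list first with `kubectl get csr | pending_csrs`, then approve each name with `kubectl certificate approve <name>`. Reviewing before approving matters because approval grants the requester a signed certificate.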
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
