How to Fix Kubernetes Memory Leaks and Expired Certificates: Step‑by‑Step Guide
This article explains common Kubernetes issues such as node memory leaks and certificate expiration, provides diagnostic commands, and offers detailed solutions including disabling kmem accounting, recompiling runc and kubelet, and extending certificate validity to ten years.
1 Problem: Fix K8s Memory Leak
Problem description
On clusters that have been running for a long time, some nodes can no longer create new pods. Describing an affected pod (for example with kubectl describe pod) shows errors such as "applying cgroup … caused: mkdir …no space left on device" or "cannot allocate memory".
applying cgroup … caused: mkdir …no space left on device
This indicates a possible memory leak in the cluster; the more pods are created, the faster memory is exhausted.
Check for memory leak.
$ cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo
If the command returns an I/O error, kmem accounting is not active and no leak is present. If it prints slabinfo output similar to the example below, kmem accounting is active and the node may be affected.
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
Solution
Disable kmem accounting in runc and kubelet (kernel upgrade is more invasive).
Reason: the number of entries per cgroup subsystem is limited (kernel/cgroup.c #L139). With kmem accounting enabled, entries are created that are not reclaimed after the cgroup directory is removed, so the limit of 65535 entries is eventually exhausted.
To resolve, recompile runc and kubelet without kmem support.
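To get a rough idea of how close a node is to that limit, the per-subsystem cgroup count can be read from /proc/cgroups. The following is a minimal sketch, not part of the original procedure: the helper name count_memory_cgroups is hypothetical, and it assumes that the num_cgroups column for the memory controller also reflects cgroups that were removed but not yet freed, which may vary by kernel version.

```shell
# count_memory_cgroups FILE (hypothetical helper):
# print the num_cgroups column of the "memory" row from a
# /proc/cgroups-style file; print nothing if the file is unreadable.
count_memory_cgroups() {
    if [ -r "$1" ]; then
        # columns: subsys_name hierarchy num_cgroups enabled
        awk '$1 == "memory" { print $3 }' "$1"
    fi
}

LIMIT=65535   # entry limit quoted in the text above
count=$(count_memory_cgroups /proc/cgroups)
echo "memory cgroups: ${count:-unknown} (limit $LIMIT)"
```

A count that keeps growing while the number of running pods stays flat would be consistent with the leak described here.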
2.1 Compile runc
Configure Go environment.
$ wget https://dl.google.com/go/go1.12.9.linux-amd64.tar.gz
$ tar xf go1.12.9.linux-amd64.tar.gz -C /usr/local/
# add the following export lines to ~/.bashrc
$ vim ~/.bashrc
export GOPATH="/data/Documents"
export GOROOT="/usr/local/go"
export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"
export GO111MODULE=off
# reload the shell configuration and confirm the settings
$ source ~/.bashrc
$ go env
Download runc source.
$ mkdir -p /data/Documents/src/github.com/opencontainers/
$ cd /data/Documents/src/github.com/opencontainers/
$ git clone https://github.com/opencontainers/runc
$ cd runc/
$ git checkout v1.0.0-rc9
Compile.
# install compile components
$ sudo yum install libseccomp-devel
$ make BUILDTAGS='seccomp nokmem'
2.2 Compile kubelet
Download Kubernetes source.
$ mkdir -p /root/k8s/
$ cd /root/k8s/
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes/
$ git checkout v1.15.3
Create a build‑environment Docker image.
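FROM centos:centos7.3.1611
ENV GOROOT /usr/local/go
ENV GOPATH /usr/local/gopath
ENV PATH /usr/local/go/bin:$PATH
RUN yum install rpm-build which where rsync gcc gcc-c++ automake autoconf libtool make -y \
    && curl -L https://studygolang.com/dl/golang/go1.12.9.linux-amd64.tar.gz | tar zxvf - -C /usr/local
Build the image from the directory that contains this Dockerfile, using the tag that the next step expects.
$ docker build -t build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 .
Compile kubelet inside the container.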
$ docker run -it --rm -v /root/k8s/kubernetes:/usr/local/gopath/src/k8s.io/kubernetes build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 bash
$ cd /usr/local/gopath/src/k8s.io/kubernetes
$ GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.15.3 make kubelet GOFLAGS="-tags=nokmem"
Replace the original runc and kubelet binaries.
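The manual backup and copy steps listed below can also be wrapped in a small helper so that the same procedure runs identically on every node. This is only a sketch under stated assumptions: the function name swap_binary and its arguments are hypothetical, it copies the old binary instead of moving it, and it expects Docker and kubelet to be stopped beforehand.

```shell
# swap_binary NEW TARGET BACKUP_DIR (hypothetical helper):
# keep a copy of TARGET in BACKUP_DIR, then install NEW in its place.
swap_binary() {
    sb_new="$1"
    sb_target="$2"
    sb_backup_dir="$3"

    mkdir -p "$sb_backup_dir"
    # back up the current binary if one exists, preserving mode and times
    if [ -e "$sb_target" ]; then
        cp -p "$sb_target" "$sb_backup_dir/$(basename "$sb_target")"
    fi
    # copy the new binary into place with executable permissions
    install -m 0755 "$sb_new" "$sb_target"
}

# Example usage on a node (commented out; paths taken from the steps below):
#   systemctl stop docker
#   systemctl stop kubelet
#   swap_binary kubelet /usr/bin/kubelet /home
#   swap_binary runc /usr/bin/docker-runc /home
```

Copying rather than moving the old binary means the target path is never empty if the install step fails part-way.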
Backup existing binaries.
$ mv /usr/bin/kubelet /home/kubelet
$ mv /usr/bin/docker-runc /home/docker-runc
Stop Docker and kubelet services.
$ systemctl stop docker
$ systemctl stop kubelet
Copy the newly built binaries.
$ cp kubelet /usr/bin/kubelet
$ cp kubelet /usr/local/bin/kubelet
$ cp runc /usr/bin/docker-runc
Verify that kmem is disabled: start Docker and kubelet again and restart the pods (or reboot the node); the usage should then read 0.
$ cat /sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes
Check that no memory leak remains.
$ cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo
2 Problem: Kubernetes Certificate Expiration
Background
In a long‑running test cluster, developers encountered API access failures and discovered that the cluster certificates had expired.
Symptoms
kubectl commands return:
Unable to connect to the server: x509: certificate has expired or is not yet valid
Check expiration with:
$ kubeadm alpha certs check-expiration
Solution
Using kubeadm to renew certificates works, but the default validity is one year. To obtain a ten‑year certificate, modify the controller‑manager static pod manifest to set --experimental-cluster-signing-duration=87600h and ensure the CA files point to the default ca.crt.
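The value 87600h is ten 365-day years expressed in hours, with leap days ignored. A quick shell check of that arithmetic, shown only as an illustration (the variable names are arbitrary):

```shell
# ten years of 365 days, converted to hours
years=10
hours=$((years * 365 * 24))
echo "--experimental-cluster-signing-duration=${hours}h"
```

The same calculation gives the flag value for any other validity period; the manifest fragment that follows uses the ten-year value.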
spec:
  containers:
  - command:
    - kube-controller-manager
    # set certificate validity to 10 years
    - --experimental-cluster-signing-duration=87600h
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
Renew all pending certificates via the API:
# renew all pending certs
$ kubeadm alpha certs renew all --use-api --config kubeadm.yaml &
Because the etcd component still uses the old CA, replace its CA files, and also update the apiserver and front‑proxy CA references.
# backup static pod manifests
$ cp -r /etc/kubernetes/manifests/ /etc/kubernetes/manifests.bak
# edit etcd.yaml to use the new CA
... --peer-trusted-ca-file=/etc/kubernetes/pki/ca.crt
... --trusted-ca-file=/etc/kubernetes/pki/ca.crt
# edit kube-apiserver.yaml
... --etcd-cafile=/etc/kubernetes/pki/ca.crt
# replace front‑proxy CA
$ cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt
$ cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key
Content originally published on Zhihu; copyright belongs to the author.
