How to Fix Kubernetes Memory Leaks and Expired Certificates – Step‑by‑Step Guide
This article explains why long‑running Kubernetes clusters can suffer memory‑leak errors and certificate expiration, and provides detailed, command‑line solutions including disabling kmem accounting, recompiling runc and kubelet, and extending certificate validity to ten years.
As microservice adoption grows, Kubernetes clusters are used more widely, and operational problems surface with them. This article covers two common problems and practical fixes for each.
Problem 1: Fixing K8s Memory Leak
Problem Description
After a cluster runs for a long time, some nodes can no longer create new pods and report errors such as “applying cgroup … caused: mkdir … no space left on device” or “cannot allocate memory”. These errors indicate a memory leak tied to cgroup kernel-memory (kmem) accounting on the node, and the leak worsens as more pods are created.
How to Verify
Check for a leak by reading /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo. If the read fails with an I/O error, there is no leak. If the file prints slab information instead, the node is affected.
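The check above can be scripted. The following is a minimal sketch rather than part of the original procedure: the function name check_kmem_slabinfo is hypothetical, and it assumes that any failed read of an existing file (such as the I/O error described above) means there is no leak.

```shell
# check_kmem_slabinfo [FILE]
# Prints "missing"  when FILE does not exist (no kubepods cgroup on this node),
#        "active"   when FILE can be read (slab info is exposed, node may leak),
#        "disabled" when FILE exists but reading it fails (e.g. I/O error).
check_kmem_slabinfo() {
  local file="${1:-/sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo}"
  if [ ! -e "$file" ]; then
    echo "missing"
    return 2
  fi
  if cat "$file" >/dev/null 2>&1; then
    echo "active"
    return 1
  fi
  echo "disabled"
  return 0
}
```

Calling check_kmem_slabinfo with no argument inspects the default kubepods path; a result of "active" on a node that also shows the errors above is a reason to apply the fix below.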
Solution
Disable kmem accounting in runc and kubelet. The steps include compiling a custom runc without kmem support and rebuilding kubelet.
Set up Go environment
<code>$ wget https://dl.google.com/go/go1.12.9.linux-amd64.tar.gz
$ tar xf go1.12.9.linux-amd64.tar.gz -C /usr/local/
# append the following export lines to ~/.bashrc
$ vim ~/.bashrc
export GOPATH="/data/Documents"
export GOROOT="/usr/local/go"
export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"
export GO111MODULE=off
# reload the shell configuration and confirm the settings
$ source ~/.bashrc
$ go env
</code>Download runc source
<code>$ mkdir -p /data/Documents/src/github.com/opencontainers/
$ cd /data/Documents/src/github.com/opencontainers/
$ git clone https://github.com/opencontainers/runc
$ cd runc/
$ git checkout v1.0.0-rc9
</code>Compile runc
<code># install build dependencies
$ sudo yum install -y libseccomp-devel
# build with seccomp support and with kmem accounting disabled (nokmem build tag)
$ make BUILDTAGS='seccomp nokmem'
</code>Download Kubernetes source
<code>$ mkdir -p /root/k8s/
$ cd /root/k8s/
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes/
$ git checkout v1.15.3
</code>Create Docker image for build environment
<code>FROM centos:centos7.3.1611
ENV GOROOT /usr/local/go
ENV GOPATH /usr/local/gopath
ENV PATH /usr/local/go/bin:$PATH
RUN yum install rpm-build which where rsync gcc gcc-c++ automake autoconf libtool make -y \
&& curl -L https://studygolang.com/dl/golang/go1.12.9.linux-amd64.tar.gz | tar zxvf - -C /usr/local
</code>Compile kubelet inside the image
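<code># assumes the image above has been built and tagged build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3
$ docker run -it --rm -v /root/k8s/kubernetes:/usr/local/gopath/src/k8s.io/kubernetes build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 bash
$ cd /usr/local/gopath/src/k8s.io/kubernetes
# build kubelet with the nokmem tag; the binary is expected under _output/bin/
$ GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.15.3 make kubelet GOFLAGS="-tags=nokmem"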
</code>Replace original runc and kubelet
Backup existing binaries
<code>$ mv /usr/bin/kubelet /home/kubelet
$ mv /usr/bin/docker-runc /home/docker-runc
</code>Stop Docker and kubelet
<code>$ systemctl stop docker
$ systemctl stop kubelet
</code>Copy new binaries
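<code>$ cp kubelet /usr/bin/kubelet
$ cp kubelet /usr/local/bin/kubelet
$ cp runc /usr/bin/docker-runc
# start the services again so the new binaries take effect
$ systemctl start docker
$ systemctl start kubelet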
</code>Verify that kmem accounting is disabled: /sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes should read 0. Then confirm there is no leak by re-checking /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo, which should now fail with an I/O error.
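To run the usage check across every pod cgroup rather than only the burstable one, a sketch along these lines may help. The function name kmem_nonzero_files is hypothetical, and the sketch assumes a file that reports no kernel memory contains exactly the line 0, as described above.

```shell
# kmem_nonzero_files [ROOT]
# Lists every memory.kmem.usage_in_bytes file under ROOT whose content
# is not exactly "0". No output means no cgroup is charging kernel memory.
kmem_nonzero_files() {
  local root="${1:-/sys/fs/cgroup/memory/kubepods}"
  # grep -L prints the names of files that contain no line matching "0" exactly
  find "$root" -name memory.kmem.usage_in_bytes \
    -exec grep -L -x '0' {} + 2>/dev/null
}
```

An empty result from kmem_nonzero_files after the binaries are replaced is consistent with kmem accounting being off for the pods on that node.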
Problem 2: Kubernetes Certificate Expiration
Background
In a long-running test cluster, the API became unreachable with errors like “x509: certificate has expired or is not yet valid”. The cause was an expired kubeadm-generated certificate: by default, kubeadm issues its client and server certificates with a one-year validity.
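Before changing anything, it helps to confirm which certificate has expired. The sketch below reads the expiry date with openssl x509 -enddate; the function name cert_enddate is hypothetical, and /etc/kubernetes/pki/apiserver.crt is only the usual kubeadm location, so adjust the path if your layout differs.

```shell
# cert_enddate [CERT]
# Prints the notAfter date of a PEM certificate, or "unreadable" when the
# file is missing or openssl cannot parse it.
cert_enddate() {
  local cert="${1:-/etc/kubernetes/pki/apiserver.crt}"
  local end
  if [ ! -r "$cert" ]; then
    echo "unreadable"
    return 1
  fi
  # openssl prints "notAfter=<date>"; keep only the date part
  end=$(openssl x509 -in "$cert" -noout -enddate 2>/dev/null | cut -d= -f2)
  if [ -z "$end" ]; then
    echo "unreadable"
    return 1
  fi
  echo "$end"
}
```

A date in the past for the API server certificate matches the x509 error quoted above.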
Solution
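Renew the certificates with kubeadm, or extend their validity. To obtain ten-year certificates, add --experimental-cluster-signing-duration=87600h (87600 hours is ten 365-day years) to the kube-controller-manager static pod manifest, which kubeadm places at /etc/kubernetes/manifests/kube-controller-manager.yaml, then restart the components.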
<code>spec:
  containers:
  - command:
    - kube-controller-manager
    - --experimental-cluster-signing-duration=87600h
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
</code>Renew all certificates via the API:
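<code># --use-api submits certificate signing requests to the cluster; the command is backgrounded while they await approval (kubectl certificate approve)
$ kubeadm alpha certs renew all --use-api --config kubeadm.yaml &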
</code>Update etcd and apiserver to use the new CA:
<code># backup manifests
$ cp -r /etc/kubernetes/manifests/ /etc/kubernetes/manifests.bak
# edit etcd.yaml to set --peer-trusted-ca-file and --trusted-ca-file to /etc/kubernetes/pki/ca.crt
# edit kube-apiserver.yaml to set --etcd-cafile=/etc/kubernetes/pki/ca.crt
# replace front-proxy CA
$ cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt
$ cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key
</code>After these changes, the cluster runs with a ten‑year certificate.
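To confirm the result, one option is to list any certificate that would still expire within a chosen window. This is a sketch under assumptions: the function name certs_expiring_within is hypothetical, /etc/kubernetes/pki is the usual kubeadm directory, and the check relies on openssl x509 -checkend, which exits non-zero when a certificate expires within the given number of seconds.

```shell
# certs_expiring_within [DIR] [DAYS]
# Prints each *.crt in DIR (and DIR/etcd) that expires within DAYS days.
# Files that openssl cannot parse as certificates are skipped.
certs_expiring_within() {
  local dir="${1:-/etc/kubernetes/pki}"
  local days="${2:-365}"
  local seconds=$((days * 86400))
  local crt
  for crt in "$dir"/*.crt "$dir"/etcd/*.crt; do
    [ -f "$crt" ] || continue
    openssl x509 -in "$crt" -noout >/dev/null 2>&1 || continue
    if ! openssl x509 -in "$crt" -noout -checkend "$seconds" >/dev/null 2>&1; then
      echo "$crt"
    fi
  done
}
```

For example, running certs_expiring_within /etc/kubernetes/pki 3285 (about nine years) should print nothing once every certificate carries the ten-year validity.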