How to Fix Kubernetes Memory Leaks and Expired Certificates: Step‑by‑Step Guide
This article walks through diagnosing and resolving two common Kubernetes issues—node memory leaks caused by kmem accounting and expired cluster certificates—by showing how to detect the problems, rebuild runc and kubelet, and extend certificate validity using kubeadm and manifest edits.
Problem 1: Kubernetes Memory Leak
When a Kubernetes cluster runs for a long time, some nodes may stop creating new Pods and report errors such as
applying cgroup … caused: mkdir … no space left on device or cannot allocate memory. This indicates a memory leak in the cgroup subsystem.
To verify the leak, read /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo. If the read fails with an I/O error, kmem accounting is not active and the leak is absent; if the file prints slabinfo entries, kmem accounting is active and the node is affected.
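The check above can be scripted. The following is a minimal sketch, not a definitive diagnostic: the function name kmem_status and its four result labels are hypothetical, and it assumes a failed read corresponds to the I/O error described above.

```shell
#!/usr/bin/env bash
# kmem_status FILE: classify a memory.kmem.slabinfo file (hypothetical helper).
#   missing  - the file does not exist (for example, a different cgroup layout)
#   inactive - the read failed, as with the I/O error described above
#   active   - the read succeeded and returned slabinfo entries
#   empty    - the read succeeded but returned nothing
kmem_status() {
  local file="$1"
  local content
  if [ ! -e "$file" ]; then
    echo "missing"
    return 0
  fi
  # Declare and assign separately so the exit status of cat is preserved.
  if ! content="$(cat "$file" 2>/dev/null)"; then
    echo "inactive"
    return 0
  fi
  if [ -n "$content" ]; then
    echo "active"
  else
    echo "empty"
  fi
}

kmem_status /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo
```

Running it on each node gives a quick way to sort nodes into affected and unaffected before planning the rebuild.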
Solution Overview
The leak originates from kmem accounting, which creates cgroup entries that are not reclaimed after deletion, eventually exhausting the limit of 65535 entries. The fix is to rebuild runc and kubelet without kmem accounting.
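To see how close a node is to the 65535 limit, one option is to read the memory controller's num_cgroups column from /proc/cgroups. This is a sketch under an assumption: that the kernel counts the unreclaimed entries in that column, which may vary by kernel version. The function name memory_cgroup_count is hypothetical.

```shell
#!/usr/bin/env bash
# memory_cgroup_count [FILE]: print the num_cgroups value for the memory
# controller. FILE defaults to /proc/cgroups, whose columns are:
#   subsys_name  hierarchy  num_cgroups  enabled
memory_cgroup_count() {
  local file="${1:-/proc/cgroups}"
  awk '$1 == "memory" { print $3 }' "$file"
}

LIMIT=65535  # limit quoted in the text above
if [ -r /proc/cgroups ]; then
  count="$(memory_cgroup_count)"
  if [ -n "$count" ]; then
    echo "memory cgroups: ${count} of ${LIMIT}"
  fi
fi
```

A count far higher than the number of running containers would be consistent with entries not being reclaimed.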
Step‑by‑Step Implementation
Prepare Go environment
wget https://dl.google.com/go/go1.12.9.linux-amd64.tar.gz
tar xf go1.12.9.linux-amd64.tar.gz -C /usr/local/
# Add to bashrc
export GOPATH="/data/Documents"
export GOROOT="/usr/local/go"
export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"
export GO111MODULE=off
source ~/.bashrc
go env
Clone and compile runc
mkdir -p /data/Documents/src/github.com/opencontainers/
cd /data/Documents/src/github.com/opencontainers/
git clone https://github.com/opencontainers/runc
cd runc
git checkout v1.0.0-rc9
sudo yum install libseccomp-devel
make BUILDTAGS='seccomp nokmem'
# The resulting binary is the new runc executable
Clone and compile kubelet
mkdir -p /root/k8s/
cd /root/k8s/
git clone https://github.com/kubernetes/kubernetes
cd kubernetes
git checkout v1.15.3
# Build a Docker image with Go environment
cat > Dockerfile <<'EOF'
FROM centos:7.3.1611
ENV GOROOT /usr/local/go
ENV GOPATH /usr/local/gopath
ENV PATH /usr/local/go/bin:$PATH
RUN yum install -y rpm-build which where rsync gcc gcc-c++ automake autoconf libtool make \
&& curl -L https://studygolang.com/dl/golang/go1.12.9.linux-amd64.tar.gz | tar zxvf - -C /usr/local
EOF
docker build -t build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 .
# Compile inside container
docker run -it --rm \
  -v /root/k8s/kubernetes:/usr/local/gopath/src/k8s.io/kubernetes \
  build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 \
  bash -c "GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.15.3 make kubelet GOFLAGS='-tags=nokmem'"
Replace binaries
# Stop services before touching the binaries
systemctl stop docker
systemctl stop kubelet
# Back up existing binaries
mv /usr/bin/kubelet /home/kubelet
mv /usr/bin/docker-runc /home/docker-runc
# Copy new binaries
cp kubelet /usr/bin/kubelet
cp kubelet /usr/local/bin/kubelet
cp runc /usr/bin/docker-runc
# Start services again
systemctl start docker
systemctl start kubelet
Verify the fix
cat /sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes # should be 0
cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo # should fail with an I/O error, meaning no leak
Problem 2: Kubernetes Certificate Expiration
In a long‑running cluster, the API server may become unreachable with the error
Unable to connect to the server: x509: certificate has expired or is not yet valid. The cause is expired control‑plane certificates.
Detection
kubeadm alpha certs check-expiration
The command confirms that certificates have expired.
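Where kubeadm is not at hand, the expiry dates can also be read straight from the certificate files with openssl. The sketch below assumes GNU date (for -d parsing) and that the certificates live under /etc/kubernetes/pki; the helper name days_left is hypothetical.

```shell
#!/usr/bin/env bash
# days_left END_DATE [NOW_EPOCH]: whole days from NOW_EPOCH (default: now)
# until END_DATE. Requires GNU date. Zero or negative means expired or
# expiring within a day, since shell division truncates toward zero.
days_left() {
  local end_epoch now_epoch
  end_epoch="$(date -d "$1" +%s)" || return 1
  now_epoch="${2:-$(date +%s)}"
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Assumed certificate location; the loop is a no-op if nothing matches.
for crt in /etc/kubernetes/pki/*.crt; do
  [ -e "$crt" ] || continue
  # openssl prints "notAfter=<date>"; keep only the date part.
  end="$(openssl x509 -enddate -noout -in "$crt" | cut -d= -f2)"
  echo "$(basename "$crt"): $(days_left "$end") days left"
done
```

The optional second argument exists only so the calculation can be checked against a fixed reference time.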
Renewal Using kubeadm
# Renew all certificates
kubeadm alpha certs renew all --config=kubeadm.yaml
systemctl restart kubelet
# Regenerate kubeconfig files
kubeadm init phase kubeconfig all --config kubeadm.yaml
After renewal, replace the static pod manifests for the control‑plane components and restart them.
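One common way to restart a static pod is to move its manifest out of the manifest directory, wait for the kubelet to notice, and move it back. The sketch below illustrates that approach; it is not taken from the original procedure. The function name bounce_static_pod is hypothetical, and the 30-second default wait is an assumption meant to exceed the kubelet's manifest rescan interval.

```shell
#!/usr/bin/env bash
# bounce_static_pod MANIFEST_DIR NAME [WAIT_SECONDS]
# Temporarily removes NAME.yaml from MANIFEST_DIR so the kubelet stops the
# static pod, then restores the file so the pod is recreated.
bounce_static_pod() {
  local dir="$1" name="$2" wait="${3:-30}"
  local src="${dir}/${name}.yaml"
  local stash
  if [ ! -f "$src" ]; then
    echo "no manifest: $src" >&2
    return 1
  fi
  stash="$(mktemp -d)" || return 1
  mv "$src" "${stash}/" || return 1
  sleep "$wait"
  mv "${stash}/${name}.yaml" "$src" || return 1
  rmdir "$stash"
}

# Example (run on a control-plane node, one component at a time):
#   bounce_static_pod /etc/kubernetes/manifests kube-apiserver
```

Bouncing one component at a time keeps the rest of the control plane available while each pod picks up its renewed certificate.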
Extending Certificate Validity to 10 Years
To avoid frequent renewals, add the experimental flag to the controller‑manager manifest:
spec:
  containers:
  - command:
    - kube-controller-manager
    - --experimental-cluster-signing-duration=87600h # 10 years
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
The controller‑manager will restart automatically and use the new duration.
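The value 87600h is ten years expressed in hours, because Go-style durations have no day or year unit. A small arithmetic sketch, using 365-day years (leap days ignored) and a hypothetical helper name hours_for_years:

```shell
#!/usr/bin/env bash
# hours_for_years YEARS: hours in YEARS years of 365 days each.
hours_for_years() {
  echo $(( $1 * 365 * 24 ))
}

hours_for_years 10   # prints 87600
```

The same helper gives the value for other lifetimes, for example 43800h for five years.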
Updating Dependent Components
Because etcd and the API server reference the CA certificate, update their manifests accordingly.
# Update etcd manifest
cp -r /etc/kubernetes/manifests/ /etc/kubernetes/manifests.bak
vi /etc/kubernetes/manifests/etcd.yaml
# Change to use the default CA
- --peer-trusted-ca-file=/etc/kubernetes/pki/ca.crt
- --trusted-ca-file=/etc/kubernetes/pki/ca.crt
# Mount the CA directory into etcd
- mountPath: /etc/kubernetes/pki
  name: etcd-certs
# Update kube-apiserver manifest
vi /etc/kubernetes/manifests/kube-apiserver.yaml
- --etcd-cafile=/etc/kubernetes/pki/ca.crt
Replace the front‑proxy CA files so that aggregated APIs (e.g., metrics‑server) continue to work:
cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt
cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key
After these changes, the cluster runs with a ten‑year certificate validity.
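A quick sanity check after copying the CA files is to confirm that the front-proxy files are byte-identical to the cluster CA files. This sketch only verifies the copy step; it does not validate any certificate chain. The function name front_proxy_matches_ca is hypothetical.

```shell
#!/usr/bin/env bash
# front_proxy_matches_ca PKI_DIR: succeed only if both front-proxy CA files
# are byte-identical to the cluster CA files in PKI_DIR.
front_proxy_matches_ca() {
  local pki="$1"
  cmp -s "${pki}/ca.crt" "${pki}/front-proxy-ca.crt" &&
    cmp -s "${pki}/ca.key" "${pki}/front-proxy-ca.key"
}

if [ -d /etc/kubernetes/pki ]; then
  if front_proxy_matches_ca /etc/kubernetes/pki; then
    echo "front-proxy CA matches cluster CA"
  else
    echo "front-proxy CA differs from cluster CA"
  fi
fi
```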
dbaplus Community
