
Mastering Kubernetes: Essential Node & Pod Practices for Stable, Secure Deployments

This article outlines essential Kubernetes operational practices—including node maintenance, kernel upgrades, Docker and kubelet tuning, pod resource limits, scheduling strategies, health probes, logging standards, and monitoring setups—to ensure applications run reliably, securely, and efficiently in production environments.

As Kubernetes matures, more companies deploy applications on it, but containerization is only the first step; ensuring stable, secure operation is essential.

Node

A Node can be a physical machine or a cloud host; it provides the compute capacity on which Kubernetes runs workloads. Node operations focus on preventing anomalies before they affect the Pods scheduled there.

Key Node tasks include:

Kernel upgrade

Software updates

Docker daemon optimization

Kubelet parameter tuning

Log configuration management

Security hardening

Kernel Upgrade

CentOS 7 ships with kernel 3.10, which has many known bugs that affect Kubernetes (e.g., cgroup memory leaks); upgrading to a newer long-term-support kernel (e.g., 5.4) or switching to Ubuntu is recommended.

<code># Download and install a long-term-support kernel from ELRepo
wget https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-lt-5.4.86-1.el7.elrepo.x86_64.rpm
rpm -ivh kernel-lt-5.4.86-1.el7.elrepo.x86_64.rpm
# List boot entries, set the new kernel as the default, and regenerate the GRUB config
grep menuentry /boot/grub2/grub.cfg
grub2-set-default 'CentOS Linux (5.4.86-1.el7.elrepo.x86_64) 7 (Core)'
grub2-editenv list
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot</code>

Software Updates

Patch packages with high-risk vulnerabilities promptly, while weighing each update against compatibility with the workloads running on the node.
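On CentOS/RHEL nodes, for example, yum can apply security-only updates; a minimal sketch (package names are illustrative, and plugin availability depends on the distribution):

```shell
# Review pending security advisories before applying anything
yum updateinfo list security

# Apply only security-related updates, preferring the smallest version bumps
yum update-minimal --security -y

# Or patch one specific high-risk package, e.g. openssl
yum update -y openssl
```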

Docker Daemon Optimization

<code>cat > /etc/docker/daemon.json <<EOF
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "10"
    },
    "bip": "169.254.123.1/24",
    "oom-score-adjust": -1000,
    "registry-mirrors": ["https://pqbap4ya.mirror.aliyuncs.com"],
    "storage-driver": "overlay2",
    "storage-opts": ["overlay2.override_kernel_check=true"],
    "live-restore": true
}
EOF</code>

Kubelet Parameter Tuning

<code>cat > /etc/systemd/system/kubelet.service <<EOF
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/

[Service]
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/pids/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpu/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuacct/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/memory/system.slice/kubelet.service
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/systemd/system.slice/kubelet.service
ExecStart=/usr/bin/kubelet \
  --enforce-node-allocatable=pods,kube-reserved \
  --kube-reserved-cgroup=/system.slice/kubelet.service \
  --kube-reserved=cpu=200m,memory=250Mi \
  --eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
  --eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
  --eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
  --eviction-max-pod-grace-period=30 \
  --eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF</code>

Log Configuration Management

Use rsyslog or an object storage service (OSS) to forward system logs off the node for retention and forensic analysis.
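As one possible setup, rsyslog can forward every facility to a central collector over TCP; a sketch, where log-collector.example.com is a placeholder for your own collector:

```shell
cat > /etc/rsyslog.d/90-forward.conf <<EOF
# Forward all logs to the central collector (@@ = TCP, @ = UDP)
*.* @@log-collector.example.com:514
EOF
systemctl restart rsyslog
```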

Security Hardening

SSH password expiration policy

Password complexity policy

SSH login attempt limits

System timeout configuration

History record configuration
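Several of these items boil down to small host-configuration changes; a hedged sketch (the values are illustrative and should follow your own security baseline):

```shell
# Password expiration: force rotation every 90 days for newly created accounts
sed -i 's/^PASS_MAX_DAYS.*/PASS_MAX_DAYS   90/' /etc/login.defs

# SSH: cap authentication attempts per connection
echo 'MaxAuthTries 3' >> /etc/ssh/sshd_config
systemctl restart sshd

# Session timeout and timestamped history for all interactive shells
cat >> /etc/profile <<EOF
export TMOUT=600
export HISTTIMEFORMAT='%F %T '
EOF
```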

Pod

Pods are the smallest scheduling unit; their stability directly affects applications.

Resource Limits

Choose QoS class based on workload importance.

Guaranteed (high‑priority):

<code>resources:
  limits:
    memory: "200Mi"
    cpu: "700m"
  requests:
    memory: "200Mi"
    cpu: "700m"
</code>

Burstable (general):

<code>resources:
  limits:
    memory: "200Mi"
    cpu: "500m"
  requests:
    memory: "100Mi"
    cpu: "100m"
</code>

Avoid BestEffort (no requests or limits at all): such Pods are the first to be evicted when the node comes under resource pressure.
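To keep Pods from silently landing in BestEffort, a namespace-level LimitRange can inject default requests and limits for containers that declare none; a sketch with illustrative values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
spec:
  limits:
    - type: Container
      default:          # becomes the container's limits when none are set
        cpu: 500m
        memory: 256Mi
      defaultRequest:   # becomes the container's requests when none are set
        cpu: 100m
        memory: 128Mi
```

Containers created without a resources block in this namespace then default to Burstable instead of BestEffort.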

Scheduling Strategies

Node affinity, taints & tolerations, and pod anti‑affinity help control placement.

<code>affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Illustrative preference: favor nodes labeled disktype=ssd
      - weight: 100
        preference:
          matchExpressions:
            - key: disktype
              operator: In
              values:
                - ssd
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: env
              operator: In
              values:
                - uat
</code>
<code>tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
</code>
<code>affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - store
      topologyKey: "kubernetes.io/hostname"
</code>

Graceful Upgrade

Use preStop hooks to delay termination or deregister from service registry.

<code>lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - sleep 15
</code>
<code>lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        # The URL is quoted so the shell does not treat '&' as a background operator
        - >
          curl -X DELETE
          "your_nacos_ip:8848/nacos/v1/ns/instance?serviceName=nacos.test.1&ip=${POD_IP}&port=8880&clusterName=DEFAULT"
          && sleep 15
</code>

Probes

Configure liveness, readiness, and optionally startup probes.

<code>readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: http
    scheme: HTTP
  initialDelaySeconds: 40
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 3
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: http
    scheme: HTTP
  initialDelaySeconds: 60
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 2
</code>
<code>startupProbe:
  httpGet:
    path: /health
    port: 80
  failureThreshold: 10
  initialDelaySeconds: 10
  periodSeconds: 10
</code>

Protection Strategy

Use PodDisruptionBudget to limit voluntary disruptions.

<code>apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-demo
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx
</code>
Note: minAvailable and maxUnavailable are mutually exclusive.
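When an absolute Pod count is awkward to keep in sync with replica changes, the percentage form of maxUnavailable can be used instead (a sketch):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-demo-percent
spec:
  maxUnavailable: "25%"
  selector:
    matchLabels:
      app: nginx
```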

Logging

Logging covers both business logs and exception logs; logs should be simple yet informative, support monitoring and debugging, and impose minimal performance overhead.

Log Standards

Use appropriate log levels

Unified output format

Consistent code encoding

Standardized log paths

Standardized naming conventions

Collection

Two main approaches:

Deploy a logging agent on the Node to collect stdout logs.

Run a sidecar container in the Pod to collect application logs.
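The sidecar approach can be sketched as below: the application and a log agent (filebeat here, as an example) share an emptyDir volume; the image names and log path are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
    - name: app
      image: my-app:latest              # illustrative application image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app       # the app writes its log files here
    - name: log-agent
      image: docker.elastic.co/beats/filebeat:7.17.0
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true                # the agent only reads and ships the files
  volumes:
    - name: app-logs
      emptyDir: {}                      # shared between the two containers
```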

Analysis

Effective log analysis helps pinpoint issues; services like Alibaba Cloud Log Service provide powerful analysis capabilities.

Alerting

Define precise alert keywords to avoid noise and ensure alerts indicate actionable problems.

Monitoring

Observability across cluster and applications is vital for reliability.

Cluster Monitoring

Prometheus is commonly used to monitor Kubernetes clusters; key metrics include node health, API server latency, etc.
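As an illustration, assuming kube-state-metrics is deployed alongside Prometheus, an alerting rule for a node stuck in NotReady might look like this (threshold and labels are illustrative):

```yaml
groups:
  - name: node-health
    rules:
      - alert: NodeNotReady
        # kube-state-metrics exposes kube_node_status_condition per node
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} has been NotReady for 5 minutes"
```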

Application Monitoring

Expose application metrics in Prometheus format; javaagent can be used to collect JVM metrics.
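One common (though not built-in) convention is the prometheus.io annotations, which a suitably configured Prometheus scrape job can use for discovery; a fragment of a Pod template:

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"   # endpoint serving Prometheus-format metrics
    prometheus.io/port: "8080"       # container port to scrape
```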

Event Monitoring

Monitor Warning and Normal events using tools like kube-eventer to detect abnormal state transitions.

Link Monitoring

Use tracing tools such as SkyWalking to visualize inter‑service calls and diagnose latency issues.

Alert Notification

Select unique, problem‑reflecting metrics for alerts and classify urgency to ensure timely response.

Tags: cloud native, Kubernetes, Logging, Monitoring, Node Management, Pod Best Practices
Written by Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
