Essential Kubernetes Best Practices Every Engineer Should Follow
This article presents a comprehensive set of Kubernetes best practices covering namespace usage, health probes, autoscaling, resource requests, workload controllers, multi‑node clusters, RBAC, managed services, version upgrades, monitoring, GitOps, image optimization, labeling, network policies, and firewalls to help engineers design, operate, and maintain robust cloud‑native environments.
This article translates Jack Roper's "Kubernetes Best Practice" and shares practical guidance for using Kubernetes (K8s) effectively.
Best Practice Index
Use namespaces
Use readiness and liveness probes (including startup probes)
Use autoscaling
Use resource requests and limits
Deploy Pods with Deployment, DaemonSet, ReplicaSet, or StatefulSet across nodes
Use multiple nodes
Use role‑based access control (RBAC)
Host clusters externally (cloud services)
Upgrade Kubernetes versions
Monitor cluster resources and audit logs
Use version control systems
Adopt Git‑based workflows (GitOps)
Reduce container image size
Organize objects with labels
Use network policies
Use firewalls
Use Namespaces
Namespaces are crucial for organizing objects, creating logical partitions, and enhancing security. By default, a cluster includes the default, kube-public, kube-system, and kube-node-lease namespaces.
RBAC can restrict access to specific namespaces, limiting the blast radius of potential errors. For example, a development team might only access the dev namespace and be barred from production. This isolation helps avoid conflicts and duplicate work.
Namespaces can also be configured with a LimitRange to set default and maximum container resource sizes, a ResourceQuota to cap total resource consumption, and network policies to control pod-to-pod traffic.
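As a rough illustration of the LimitRange and ResourceQuota objects described above, the following Python sketch builds both manifests as plain dictionaries and prints them as JSON. The namespace name, object names, and all numeric values are assumptions chosen for the example, not recommendations from the original article.

```python
import json

# Hypothetical namespace name used only for illustration.
NAMESPACE = "dev"

# LimitRange: default requests/limits applied to containers that omit them.
limit_range = {
    "apiVersion": "v1",
    "kind": "LimitRange",
    "metadata": {"name": "container-defaults", "namespace": NAMESPACE},
    "spec": {
        "limits": [
            {
                "type": "Container",
                "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
                "default": {"cpu": "500m", "memory": "256Mi"},
            }
        ]
    },
}

# ResourceQuota: caps the total requests and limits across the namespace.
resource_quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "dev-quota", "namespace": NAMESPACE},
    "spec": {
        "hard": {
            "requests.cpu": "4",
            "requests.memory": "8Gi",
            "limits.cpu": "8",
            "limits.memory": "16Gi",
            "pods": "20",
        }
    },
}


def to_manifest_list(*objects):
    """Wrap objects in a v1 List so they can be applied as one JSON document."""
    return {"apiVersion": "v1", "kind": "List", "items": list(objects)}


if __name__ == "__main__":
    print(json.dumps(to_manifest_list(limit_range, resource_quota), indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be piped to kubectl apply -f - once the namespace exists.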
Use Readiness and Liveness Probes
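Readiness and liveness probes are health-check mechanisms. A readiness probe ensures that traffic is sent only to pods that are ready to serve, which prevents premature requests during startup. A liveness probe detects an unresponsive application and prompts the kubelet to restart the affected container.
Since Kubernetes 1.18, where the feature reached beta and was enabled by default, a startup probe can be used for containers with long initialization times. Until the startup probe succeeds, the liveness and readiness probes are not run; if it does not succeed within its failure threshold, the kubelet kills the container and applies the pod's restart policy.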
Define probes for all containers within a pod.
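A minimal sketch of how the three probe types might be declared on one container is shown below, again as a Python dictionary. The container name, image, port, and HTTP paths are placeholders invented for the example; the helper simply multiplies failureThreshold by periodSeconds to estimate how long the startup probe gives the application to start.

```python
import json

# Hypothetical container; the name, image, port, and paths are placeholders.
container = {
    "name": "api",
    "image": "example.com/team/api:1.0.0",
    "ports": [{"containerPort": 8080}],
    # Holds back the other two probes until the application has started.
    "startupProbe": {
        "httpGet": {"path": "/healthz", "port": 8080},
        "periodSeconds": 10,
        "failureThreshold": 30,
    },
    # Controls whether the pod receives traffic from Services.
    "readinessProbe": {
        "httpGet": {"path": "/ready", "port": 8080},
        "periodSeconds": 5,
        "failureThreshold": 3,
    },
    # Restarts the container when the application stops responding.
    "livenessProbe": {
        "httpGet": {"path": "/healthz", "port": 8080},
        "periodSeconds": 10,
        "failureThreshold": 3,
    },
}


def startup_budget_seconds(probe):
    """Approximate time a container may take to start before it is killed."""
    return probe["failureThreshold"] * probe["periodSeconds"]


if __name__ == "__main__":
    print(json.dumps(container, indent=2))
    print("startup budget (s):", startup_budget_seconds(container["startupProbe"]))
```

With these assumed values the application has roughly five minutes to start, after which the liveness probe takes over with a much shorter reaction time.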
Use Autoscaling
Autoscaling can dynamically adjust the number of pods (Horizontal Pod Autoscaler, HPA), pod resource requests (Vertical Pod Autoscaler, VPA), or cluster nodes (Cluster Autoscaler, CA) based on workload demand.
Horizontal scaling may require using PersistentVolumes for stateful data, as local storage does not survive pod recreation.
Cluster autoscaling is valuable for highly variable workloads and can reduce costs by removing idle nodes.
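To make the HPA behaviour more concrete, the sketch below applies the scaling rule documented for the Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. This is a simplified model under stated assumptions; the real controller also accounts for pod readiness, missing metrics, and stabilization windows. The function name and default bounds are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: ceil(current * currentMetric / targetMetric).

    No scaling happens while the ratio stays within the tolerance band,
    and the result is clamped to the [min_replicas, max_replicas] range.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods at 90% average CPU with a 60% target.
    print(desired_replicas(4, 90, 60))
```

For example, four pods averaging 90% CPU against a 60% target give a ratio of 1.5, so the sketch asks for six pods.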
Use Resource Requests and Limits
Set resource requests and limits to ensure containers receive the necessary CPU and memory and to prevent a pod from exhausting cluster resources.
Without limits, pods may consume excess resources, causing other applications to suffer or nodes to crash.
If a container exceeds its memory limit, it is terminated (OOM-killed); if it tries to exceed its CPU limit, it is throttled rather than killed.
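The following sketch shows what a resources block might look like and adds a small sanity check that requests do not exceed limits. The values are assumptions for illustration, and the two parsing helpers handle only a small subset of Kubernetes quantity syntax (millicores or plain cores for CPU; Ki, Mi, Gi, or plain bytes for memory).

```python
# Hypothetical resource settings for one container.
container_resources = {
    "requests": {"cpu": "250m", "memory": "128Mi"},
    "limits": {"cpu": "500m", "memory": "256Mi"},
}

# Binary suffixes supported by this simplified parser.
_MEMORY_FACTORS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_millicores(quantity):
    """Convert '250m' or '0.5' style CPU quantities to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_bytes(quantity):
    """Convert '128Mi' style memory quantities (or plain bytes) to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


def requests_within_limits(resources):
    """Return True if CPU and memory requests do not exceed their limits."""
    req, lim = resources["requests"], resources["limits"]
    return (cpu_millicores(req["cpu"]) <= cpu_millicores(lim["cpu"])
            and memory_bytes(req["memory"]) <= memory_bytes(lim["memory"]))


if __name__ == "__main__":
    print(requests_within_limits(container_resources))
```

A check like this could run in a pre-commit hook or CI job before manifests reach the cluster.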
Deploy Pods with Controllers
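Avoid running bare pods: a pod created directly is not rescheduled if its node fails. Use a Deployment, DaemonSet, or StatefulSet instead to improve fault tolerance (ReplicaSets are usually created and managed through a Deployment rather than directly). Anti-affinity rules can spread pods across nodes to avoid a single point of failure.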
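As a sketch of the controller-plus-anti-affinity idea, the dictionary below describes a Deployment whose replicas prefer to land on different nodes. The application name, label, replica count, and image tag are placeholders; the topology key kubernetes.io/hostname is the standard node label used to mean "one node".

```python
import json

# Hypothetical label shared by the selector, the pod template, and the
# anti-affinity term so that replicas repel each other.
APP_LABELS = {"app": "web"}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": APP_LABELS},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        # "preferred" is a soft rule: scheduling still succeeds
                        # when there are fewer nodes than replicas.
                        "preferredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {"matchLabels": APP_LABELS},
                                    "topologyKey": "kubernetes.io/hostname",
                                },
                            }
                        ]
                    }
                },
                # Placeholder image; substitute your own.
                "containers": [{"name": "web", "image": "nginx:1.25"}],
            },
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(deployment, indent=2))
```

The soft (preferred) form is used here on the assumption that a small cluster should still schedule all replicas; the required form would leave pods pending when nodes run out.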
Use Multiple Nodes
Running a single‑node cluster reduces fault tolerance. Distribute workloads across multiple nodes to increase resilience.
Use Role‑Based Access Control (RBAC)
RBAC secures the cluster by assigning permissions to users, groups, or service accounts at the namespace (Role) or cluster (ClusterRole) level. Bind roles with RoleBinding or ClusterRoleBinding.
Apply the principle of least privilege: grant only the permissions required for a role.
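The least-privilege idea can be sketched with a read-only Role and a RoleBinding for a development group, as below. The namespace, role name, and group name are hypothetical. The allows helper is a deliberately simplified model of rule matching (it ignores wildcards and resource names) and is only meant to show what the Role does and does not grant.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"

# Read-only access to pods and deployments in a hypothetical "dev" namespace.
role = {
    "apiVersion": f"{RBAC_GROUP}/v1",
    "kind": "Role",
    "metadata": {"name": "dev-viewer", "namespace": "dev"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list", "watch"]},
    ],
}

# Bind the Role to a hypothetical group of developers.
role_binding = {
    "apiVersion": f"{RBAC_GROUP}/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "dev-viewer-binding", "namespace": "dev"},
    "subjects": [{"kind": "Group", "name": "dev-team", "apiGroup": RBAC_GROUP}],
    "roleRef": {"kind": "Role", "name": "dev-viewer", "apiGroup": RBAC_GROUP},
}


def allows(role_obj, api_group, resource, verb):
    """Simplified check: does any rule list this group, resource, and verb?"""
    for rule in role_obj["rules"]:
        if (api_group in rule["apiGroups"]
                and resource in rule["resources"]
                and verb in rule["verbs"]):
            return True
    return False


if __name__ == "__main__":
    print(json.dumps([role, role_binding], indent=2))
    print("can delete pods:", allows(role, "", "pods", "delete"))
```

On a live cluster, kubectl auth can-i with the --as and --as-group flags answers the same question against the real authorizer.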
Host Clusters Externally (Cloud Services)
Managed cloud services such as Azure Kubernetes Service (AKS) or Amazon EKS run the control plane and underlying infrastructure for you, which simplifies node scaling and reduces operational overhead.
Upgrade Kubernetes Versions
New releases bring features, security patches, and bug fixes. Upgrade regularly, but first verify that workloads are compatible with the target version and check whether any API versions used by your manifests have been deprecated or removed.
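One way to check for removed APIs before an upgrade is to scan manifests against a table of known removals, as in the sketch below. The table is a small, non-exhaustive sample of well-known removals and should be checked against the official deprecation guide for the target version; the function names and sample manifests are hypothetical.

```python
# (apiVersion, kind) -> (release in which it stopped being served, replacement).
# Illustrative subset only; consult the official deprecation guide.
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): ("1.16", "apps/v1"),
    ("extensions/v1beta1", "Ingress"): ("1.22", "networking.k8s.io/v1"),
    ("networking.k8s.io/v1beta1", "Ingress"): ("1.22", "networking.k8s.io/v1"),
    ("batch/v1beta1", "CronJob"): ("1.25", "batch/v1"),
    ("policy/v1beta1", "PodDisruptionBudget"): ("1.25", "policy/v1"),
}


def parse_version(version):
    """Turn '1.25' into (1, 25) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_removed_apis(manifests, target_version):
    """Return (name, old apiVersion, replacement) for manifests that would
    no longer be served after upgrading to target_version."""
    target = parse_version(target_version)
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        if key in REMOVED_APIS:
            removed_in, replacement = REMOVED_APIS[key]
            if target >= parse_version(removed_in):
                findings.append((manifest["metadata"]["name"], key[0], replacement))
    return findings


if __name__ == "__main__":
    sample = [
        {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    ]
    for name, old, new in find_removed_apis(sample, "1.25"):
        print(f"{name}: {old} is removed, migrate to {new}")
```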
Monitor Cluster Resources and Audit Logs
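Monitor control-plane components (API server, etcd, controller-manager), node components (kubelet, kube-proxy), and cluster DNS (kube-dns) using Prometheus-compatible metrics.
Enable audit logging in the API server to record requests made to the cluster. The audit policy, which controls which events are recorded and at what level of detail, is defined in a file (for example audit-policy.yaml) that is passed to the API server with the --audit-policy-file flag and can be customized.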
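As an illustration of what an audit policy can contain, the sketch below builds one as a Python dictionary. Rules are evaluated in order and the first match determines the audit level, so the specific rules come first and a catch-all comes last. The choice of resources and levels here is an assumption made for the example, not a recommended policy.

```python
import json

audit_policy = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    # Skip the initial RequestReceived stage to reduce log volume.
    "omitStages": ["RequestReceived"],
    "rules": [
        # Do not log kube-proxy watch requests on endpoints and services.
        {
            "level": "None",
            "users": ["system:kube-proxy"],
            "verbs": ["watch"],
            "resources": [{"group": "", "resources": ["endpoints", "services"]}],
        },
        # Log only metadata for secrets and configmaps so that their
        # contents never appear in the audit log.
        {
            "level": "Metadata",
            "resources": [{"group": "", "resources": ["secrets", "configmaps"]}],
        },
        # Log full request and response bodies for pod changes.
        {
            "level": "RequestResponse",
            "resources": [{"group": "", "resources": ["pods"]}],
        },
        # Catch-all: everything else at Metadata level.
        {"level": "Metadata"},
    ],
}

if __name__ == "__main__":
    # JSON is a subset of YAML, so this output can be saved as the policy file.
    print(json.dumps(audit_policy, indent=2))
```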
Use automated alerting and retain logs for 30‑45 days. Integrate with tools like Azure Monitor, AWS CloudWatch, Dynatrace, or Datadog.
Use Version Control Systems
Store Kubernetes manifests in a VCS to enable change audit trails, enforce review processes, and improve cluster stability.
Adopt Git‑Based Workflows (GitOps)
GitOps treats a Git repository as the single source of truth for cluster configuration and uses automated CI/CD pipelines to deploy changes from it, which makes every change reviewable and auditable.
Reduce Container Image Size
Smaller images speed up builds and deployments and reduce resource consumption. Use minimal base images like Alpine and remove unnecessary packages.
Smaller images also reduce the attack surface.
Organize Objects with Labels
Labels are key-value pairs that help organize and query resources. The recommended labels, which share the app.kubernetes.io/ prefix, are name, instance, version, component, part-of, and managed-by.
Labels can also convey security requirements such as confidentiality and compliance.
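The sketch below shows the recommended labels with their app.kubernetes.io/ prefix, plus one security-related label, and a tiny equality-based selector matcher of the kind kubectl get -l performs. The label values and the example.com/confidentiality key are hypothetical; the matcher covers only simple key=value selectors.

```python
# Recommended labels with placeholder values.
labels = {
    "app.kubernetes.io/name": "mysql",
    "app.kubernetes.io/instance": "mysql-abcxyz",
    "app.kubernetes.io/version": "5.7.21",
    "app.kubernetes.io/component": "database",
    "app.kubernetes.io/part-of": "wordpress",
    "app.kubernetes.io/managed-by": "helm",
    # Hypothetical organization-specific label conveying a security requirement.
    "example.com/confidentiality": "restricted",
}


def matches(selector, object_labels):
    """Equality-based selection: every selector pair must be present."""
    return all(object_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    selector = {"app.kubernetes.io/part-of": "wordpress",
                "app.kubernetes.io/component": "database"}
    print(matches(selector, labels))
```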
Use Network Policies
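Network policies restrict traffic between pods at the IP and port level, similar to cloud security groups. Start with a default-deny policy for all traffic and then allow only the required flows. Network policies are enforced by the cluster's network plugin, so they take effect only if the installed plugin supports them.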
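The default-deny-then-allow approach can be sketched as two NetworkPolicy objects: one that selects every pod in the namespace and permits nothing, and one that opens a single flow. The namespace, labels, and port below are placeholders invented for the example.

```python
import json

NAMESPACE = "dev"  # hypothetical namespace

# An empty podSelector selects all pods; listing both policy types with no
# ingress or egress rules denies all traffic in both directions.
default_deny = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "default-deny-all", "namespace": NAMESPACE},
    "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
}

# Allow pods labeled app=web to reach pods labeled app=api on TCP 8080.
allow_web_to_api = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "allow-web-to-api", "namespace": NAMESPACE},
    "spec": {
        "podSelector": {"matchLabels": {"app": "api"}},
        "policyTypes": ["Ingress"],
        "ingress": [
            {
                "from": [{"podSelector": {"matchLabels": {"app": "web"}}}],
                "ports": [{"protocol": "TCP", "port": 8080}],
            }
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps([default_deny, allow_web_to_api], indent=2))
```

Note that with egress denied by default, the web pods in this sketch would also need an egress rule toward the api pods (and usually toward cluster DNS) before the flow works end to end.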
Use Firewalls
Place a firewall in front of the API server to allow only trusted IP addresses and limit exposed ports, reducing external attack vectors.
Conclusion
Following the best practices outlined in this article will help you design, operate, and maintain Kubernetes clusters successfully on your modern application journey.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
