Kubernetes Dominates Containers: Architecture, Scaling & Best Practices
This comprehensive guide explains why Kubernetes has become the standard container platform, covering its application‑centric architecture, migration challenges, networking and CNI design, control‑plane components, storage solutions including CSI and FlexVolume, image distribution strategies, upgrade and backup procedures, operator patterns, and practical containerization best practices for large‑scale deployments.
1. Architectural Advantages of Kubernetes
Kubernetes is designed from the application's point of view rather than from an operations viewpoint. It models the microservice lifecycle with resources such as Deployment (for stateless workloads) and StatefulSet (for stateful workloads), which provide built-in scaling, rolling updates, and self-healing.
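The API Server acts as the cluster's gateway: it is the only component that reads and writes etcd, where all cluster state is stored, and it exposes a unified interface for authentication and access control. Most other components are stateless and coordinate through the API Server, which makes them easy to scale horizontally or run as redundant replicas.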
2. Migration Compatibility Design
When moving existing applications to containers, network connectivity is the biggest obstacle. The article outlines a step‑by‑step migration strategy, including:
Using Service for service discovery.
Leveraging ConfigMap for configuration injection (environment variables or volumes).
Deploying logging, monitoring, and APM agents as DaemonSets.
It also discusses the limitations of Service Mesh for advanced traffic management.
3. Container Runtime Isolation Design
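Kubernetes relies on cgroups, applied by the container runtime, to isolate CPU, memory, and block I/O (blkio). Network bandwidth is usually shaped separately with tc (optionally combined with the net_cls cgroup to classify traffic). Example commands for shaping bandwidth with tc and HTB queues:
tc qdisc add dev eth0 root handle 1: htb default 12
tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
# "default 12" sends unclassified traffic to class 1:12, so that leaf class must exist; in tc, kbps means kilobytes per second
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 100kbps ceil 100kbps
LXCFS can be used to give each container its own view of files such as /proc/meminfo and /proc/cpuinfo, so tools inside the container report the container's CPU and memory limits instead of the host's.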
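HTB (Hierarchical Token Bucket) enforces each class's rate with a token bucket: the class earns tokens at its configured rate and may only send while it has enough tokens. The Go sketch below models a single class's rate limit, assuming byte-denominated tokens and treating tc's 100kbps as 100,000 bytes per second for simplicity; HTB's borrowing up to ceil and its class hierarchy are deliberately omitted.

```go
package main

import "fmt"

// tokenBucket is a simplified model of how an HTB class enforces its "rate":
// tokens accrue at rateBytesPerSec up to burst bytes, and a packet may be
// sent only if enough tokens are available.
type tokenBucket struct {
	rateBytesPerSec float64
	burst           float64
	tokens          float64
	lastSec         float64
}

func newTokenBucket(rate, burst float64) *tokenBucket {
	return &tokenBucket{rateBytesPerSec: rate, burst: burst, tokens: burst}
}

// allow refills tokens for the time elapsed since the last call, then reports
// whether a packet of size bytes may be sent at time nowSec.
func (b *tokenBucket) allow(nowSec, size float64) bool {
	b.tokens += (nowSec - b.lastSec) * b.rateBytesPerSec
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.lastSec = nowSec
	if b.tokens >= size {
		b.tokens -= size
		return true
	}
	return false
}

func main() {
	b := newTokenBucket(100_000, 1500) // ~100 KB/s, one MTU-sized burst
	fmt.Println(b.allow(0, 1500))     // true: spends the full burst
	fmt.Println(b.allow(0.001, 1500)) // false: only ~100 bytes refilled after 1 ms
	fmt.Println(b.allow(0.02, 1500))  // true: refill is capped at the 1500-byte burst
}
```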
4. Control‑Plane Design and Optimization
The control plane consists of three main components:
API Server: Handles REST requests and serves list and watch requests from an in-memory watch cache (the Cacher storage layer), so many clients can be served without each one opening its own watch on etcd.
Controller Manager: Runs the built-in controllers (e.g., Deployment, ReplicaSet, StatefulSet). Each controller watches resources through an Informer and processes the resulting events, as object keys, from a work queue.
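Scheduler: Selects a node for each unscheduled pod in two phases: filtering (which nodes can run the pod) and scoring (which feasible node is best). On large clusters, percentageOfNodesToScore improves performance by ending the filtering phase once enough feasible nodes have been found.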
The original article also walks through scheduler code that creates the scheduler cache, binds volumes, and schedules pods.
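The filter-then-score flow and the effect of percentageOfNodesToScore can be sketched in a few lines of Go. This is a simplified model, not kube-scheduler code: the only filter is a CPU/memory fit check, the scoring rule (most free CPU after placement) is a hypothetical stand-in for the real scoring plugins, and numFeasibleNodesToFind approximates upstream's adaptive default (search every node below 100 nodes; otherwise 50% minus 1% per 125 nodes, floored at 5% and at 100 nodes). Those constants reflect recent upstream releases and may differ in yours.

```go
package main

import (
	"fmt"
	"sort"
)

type node struct {
	Name       string
	FreeCPUm   int // free CPU in millicores
	FreeMemMiB int
}

type pod struct {
	CPUm   int
	MemMiB int
}

// numFeasibleNodesToFind approximates how many feasible nodes the scheduler
// collects before it stops filtering. percentage 0 means "adaptive".
func numFeasibleNodesToFind(numAll, percentage int) int {
	if numAll < 100 || percentage >= 100 {
		return numAll
	}
	if percentage <= 0 {
		percentage = 50 - numAll/125
		if percentage < 5 {
			percentage = 5
		}
	}
	n := numAll * percentage / 100
	if n < 100 {
		return 100
	}
	return n
}

// schedule filters nodes until enough feasible ones are found, then scores them.
func schedule(p pod, nodes []node, percentage int) (string, bool) {
	limit := numFeasibleNodesToFind(len(nodes), percentage)
	var feasible []node
	for _, n := range nodes {
		if n.FreeCPUm >= p.CPUm && n.FreeMemMiB >= p.MemMiB {
			feasible = append(feasible, n)
			if len(feasible) == limit {
				break // enough candidates; skip filtering the remaining nodes
			}
		}
	}
	if len(feasible) == 0 {
		return "", false
	}
	// Hypothetical scoring: highest remaining CPU after placing the pod wins.
	sort.SliceStable(feasible, func(i, j int) bool {
		return feasible[i].FreeCPUm-p.CPUm > feasible[j].FreeCPUm-p.CPUm
	})
	return feasible[0].Name, true
}

func main() {
	nodes := []node{{"node-a", 500, 1024}, {"node-b", 2000, 4096}, {"node-c", 100, 8192}}
	if name, ok := schedule(pod{CPUm: 250, MemMiB: 512}, nodes, 0); ok {
		fmt.Println("bind to", name)
	}
	fmt.Println(numFeasibleNodesToFind(5000, 0))
}
```

The real scheduler also rotates the node at which each search starts so that nodes late in the list are not permanently skipped; the sketch always starts at index 0.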
5. Data‑Plane Networking (CNI) and Ingress
Kubernetes delegates network setup to CNI plugins. The article walks through the Calico CNI cmdAdd flow:
// Parse the network config passed on stdin and determine the node name
conf := types.NetConf{}
if err := json.Unmarshal(args.StdinData, &conf); err != nil {
	return err
}
nodeName := utils.DetermineNodename(conf)
// Build the workload endpoint (WEP) identifiers for this pod
wepIDs, _ := utils.GetIdentifiers(args, nodeName)
// Run the configured IPAM plugin to allocate an IP address
ipamResult, err := ipam.ExecAdd(conf.IPAM.Type, args.StdinData)
if err != nil {
	return err
}
// Create the veth pair; the host end is then moved into the host namespace (not shown)
veth := &netlink.Veth{LinkAttrs: netlink.LinkAttrs{Name: contVethName, MTU: d.mtu}, PeerName: hostVethName}
if err := netlink.LinkAdd(veth); err != nil {
	return err
}
// Look up the host end by name, set its MAC address, and bring it up
hostVeth, _ := netlink.LinkByName(hostVethName)
netlink.LinkSetHardwareAddr(hostVeth, mac)
netlink.LinkSetUp(hostVeth)
Ingress is typically implemented with ingress-nginx: the controller watches Ingress, Service, and Endpoints resources, generates an Nginx configuration, and reloads Nginx when that configuration changes.
6. Storage Architecture (CSI, FlexVolume, PV/PVC)
CSI (Container Storage Interface) standardizes storage integration. The architecture includes:
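CSI Controller: The controller plugin (typically a Deployment or StatefulSet) runs alongside sidecars such as external-provisioner and external-attacher, which watch Kubernetes objects and call the driver to manage the volume lifecycle through the storage vendor's API.
CSI Node: Deployed as a DaemonSet, registers with the kubelet (via the node-driver-registrar sidecar) and handles NodePublishVolume and NodeUnpublishVolume calls to mount and unmount volumes for pods.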
Example CSI driver code (Ceph RBD) shows the creation of IdentityServer, ControllerServer, and NodeServer:
// NewIdentityServer initializes an identity server for the RBD CSI driver
func NewIdentityServer(d *csicommon.CSIDriver) *IdentityServer {
	return &IdentityServer{DefaultIdentityServer: csicommon.NewDefaultIdentityServer(d)}
}
// NewControllerServer initializes a controller server for the RBD CSI driver
func NewControllerServer(d *csicommon.CSIDriver, cachePersister util.CachePersister) *ControllerServer {
	return &ControllerServer{DefaultControllerServer: csicommon.NewDefaultControllerServer(d), MetadataStore: cachePersister}
}
// NewNodeServer initializes a node server for the RBD CSI driver
func NewNodeServer(d *csicommon.CSIDriver, t string, topology map[string]string) (*NodeServer, error) {
	mounter := mount.New("")
	return &NodeServer{DefaultNodeServer: csicommon.NewDefaultNodeServer(d, t, topology), mounter: mounter}, nil
}
StorageClasses name a provisioner (an in-tree one such as kubernetes.io/aws-ebs, or an external CSI driver) and pass provisioner-specific parameters such as volume type or replication; the reclaim policy (Delete or Retain) is set with the separate reclaimPolicy field.
FlexVolume is the legacy, exec-based plugin model: the kubelet runs a driver executable (often a shell script) that implements operations such as init, attach, mount, and unmount and reports each result as JSON.
7. Image Registry Design and Large‑Scale Distribution
The article breaks down image components (manifest, config, layers) and the pull workflow. To accelerate distribution, it recommends:
Reducing layer count and reusing common base images.
Pre‑pulling base images on nodes.
Using a caching proxy (e.g., Nginx) or object storage backend.
Deploying multiple Harbor instances with replication.
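Layer reuse works because registries address every blob by its digest. The sketch below assumes the OCI-style "sha256:<hex>" digest form and a hypothetical in-memory set of layers already present on the node; it shows why images built on the same base image download the shared layers only once.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest returns the content address "sha256:<hex>" of a blob.
func digest(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// layersToPull returns, in order, the layer digests the node does not have yet.
func layersToPull(imageLayers []string, local map[string]bool) []string {
	var missing []string
	for _, d := range imageLayers {
		if !local[d] {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	base := digest([]byte("base os layer"))
	runtime := digest([]byte("language runtime layer"))
	appA := digest([]byte("app A"))
	appB := digest([]byte("app B"))
	local := map[string]bool{}
	for _, img := range [][]string{{base, runtime, appA}, {base, runtime, appB}} {
		missing := layersToPull(img, local)
		fmt.Printf("pull %d of %d layers\n", len(missing), len(img))
		for _, d := range missing {
			local[d] = true // pretend the pull succeeded
		}
	}
}
```

The second image only needs its own application layer, which is the effect that shared base images and pre-pulling aim for.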
Peer‑to‑peer distribution solutions such as Dragonfly and Uber Kraken are described, highlighting piece‑level hashing, load balancing, and fault tolerance.
8. Upgrade, Backup, and Multi‑Cluster High Availability
Kubernetes upgrades are performed with kubeadm, moving one minor version at a time. Because all cluster state resides in etcd, backing up etcd snapshots is essential:
# snapshot save talks to a single member, so $ETCD_SERVERS must hold exactly one endpoint here
ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd_backup/backup_$(date "+%Y%m%d%H%M%S").db \
  --endpoints=$ETCD_SERVERS \
  --cacert=/var/lib/etcd/cert/ca.pem \
  --cert=/var/lib/etcd/cert/etcd-client.pem \
  --key=/var/lib/etcd/cert/etcd-client-key.pem
Restoration uses etcdctl snapshot restore with the original member name, cluster token, and peer URLs (--name, --initial-cluster-token, --initial-advertise-peer-urls, --initial-cluster). For zero-downtime upgrades, a "total-split-total" (centralized, then split, then centralized) multi-cluster model is suggested: a central release platform, multiple independent Kubernetes clusters per data center, and a unified service-mesh layer for traffic routing.
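Because kubeadm moves one minor version at a time, a jump across several releases becomes a sequence of upgrades, each ideally preceded by an etcd snapshot like the one above. A small Go helper, hypothetical and not part of kubeadm, can compute that sequence from "major.minor" strings.

```go
package main

import "fmt"

// upgradePath lists the minor versions to step through, one at a time,
// from "from" to "to" (both "major.minor"; a patch suffix is ignored).
func upgradePath(from, to string) ([]string, error) {
	var fMaj, fMin, tMaj, tMin int
	if _, err := fmt.Sscanf(from, "%d.%d", &fMaj, &fMin); err != nil {
		return nil, fmt.Errorf("bad version %q: %w", from, err)
	}
	if _, err := fmt.Sscanf(to, "%d.%d", &tMaj, &tMin); err != nil {
		return nil, fmt.Errorf("bad version %q: %w", to, err)
	}
	if fMaj != tMaj || tMin < fMin {
		return nil, fmt.Errorf("unsupported upgrade %s -> %s", from, to)
	}
	var steps []string
	for m := fMin + 1; m <= tMin; m++ {
		steps = append(steps, fmt.Sprintf("%d.%d", tMaj, m))
	}
	return steps, nil
}

func main() {
	steps, err := upgradePath("1.25", "1.28")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, v := range steps {
		fmt.Println("back up etcd, then upgrade to", v)
	}
}
```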
9. Cross‑AZ Pod Distribution Strategies
Kubernetes offers several mechanisms to spread pods across availability zones:
Pod topology spread constraints (the topologySpreadConstraints field) – define maxSkew, topologyKey, and whenUnsatisfiable to keep pod counts balanced across topology domains such as zones.
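Service Topology – uses topologyKeys (e.g., kubernetes.io/hostname, topology.kubernetes.io/zone) to prefer endpoints on the same node or in the same zone. This alpha feature was deprecated in Kubernetes 1.21 and later removed in favor of topology-aware routing.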
nodeSelector, nodeAffinity, and node anti-affinity – label nodes and schedule pods onto (or away from) them using required or preferred rules.
Inter‑pod affinity/anti‑affinity – co‑locate or separate pods based on labels.
Taints and Tolerations – reserve nodes for special workloads.
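The DoNotSchedule check behind maxSkew is small enough to sketch: placing a pod in a zone is allowed only if that zone's matching-pod count after placement, minus the lowest count among eligible zones, stays within maxSkew. This simplified Go version ignores node affinity filtering, minDomains, and the ScheduleAnyway scoring path.

```go
package main

import "fmt"

// canPlace reports whether one more matching pod in zone keeps the skew
// (zone count after placement minus the global minimum) within maxSkew.
func canPlace(counts map[string]int, zone string, maxSkew int) bool {
	if _, ok := counts[zone]; !ok {
		return false // zone has no eligible nodes
	}
	minCount := counts[zone]
	for _, c := range counts {
		if c < minCount {
			minCount = c
		}
	}
	return counts[zone]+1-minCount <= maxSkew
}

func main() {
	counts := map[string]int{"zone-a": 2, "zone-b": 1, "zone-c": 1}
	for _, z := range []string{"zone-a", "zone-b", "zone-c"} {
		fmt.Println(z, canPlace(counts, z, 1)) // zone-a false, zone-b true, zone-c true
	}
}
```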
10. Operator Pattern and Etcd Operator Example
Operators encode operational knowledge in code. The etcd-operator watches a custom resource (EtcdCluster) and reconciles the cluster toward the desired state:
// Create and start the controller
c := controller.New(cfg)
if err := c.Start(); err != nil {
	log.Fatalf("controller Start() failed: %v", err)
}
// In the controller, list/watch EtcdCluster custom resources in namespace ns
source := cache.NewListWatchFromClient(
	c.Config.EtcdCRCli.EtcdV1beta2().RESTClient(),
	api.EtcdClusterResourcePlural, ns, fields.Everything())
_, informer := cache.NewIndexerInformer(source, &api.EtcdCluster{}, 0, cache.ResourceEventHandlerFuncs{
	AddFunc:    c.onAddEtcdClus,
	UpdateFunc: c.onUpdateEtcdClus,
	DeleteFunc: c.onDeleteEtcdClus,
}, cache.Indexers{})
// The informer must then be run, e.g. go informer.Run(ctx.Done())
// On Add, create a per-cluster object; cluster.New bootstraps the seed member and keeps the CR status updated
func (c *Controller) onAddEtcdClus(obj interface{}) {
	cl := obj.(*api.EtcdCluster)
	nc := cluster.New(c.makeClusterConfig(), cl)
	c.clusters[getNamespacedName(cl)] = nc
}
// Simplified reconcile loop: first make the running membership match the spec,
// then perform a rolling upgrade by replacing one outdated member per pass.
func (c *Cluster) reconcile(pods []*v1.Pod) error {
	if len(pods) != c.cluster.Spec.Size {
		return c.resize()
	}
	if needUpgrade(pods, c.cluster.Spec) {
		return c.upgradeOneMember(pickOldMember(pods, c.cluster.Spec.Version).Name)
	}
	return nil
}
The operator pattern can be applied to any stateful service to automate provisioning, scaling, and recovery.
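The same desired-versus-observed loop generalizes to other stateful services. The sketch below is hypothetical (member, spec, and nextAction are illustrative names, not etcd-operator types): it returns a single corrective step per pass, fixing membership before upgrading one outdated member, so a controller can call it on each event or resync until it reports "none".

```go
package main

import "fmt"

// member is an observed replica of a hypothetical stateful service.
type member struct {
	Name    string
	Version string
}

// spec is the desired state declared in a custom resource.
type spec struct {
	Size    int
	Version string
}

// nextAction returns ONE step toward the desired state: membership first,
// then upgrade a single outdated member so the service keeps quorum.
func nextAction(desired spec, observed []member) string {
	switch {
	case len(observed) < desired.Size:
		return "add member"
	case len(observed) > desired.Size:
		return "remove member " + observed[len(observed)-1].Name
	}
	for _, m := range observed {
		if m.Version != desired.Version {
			return "upgrade " + m.Name
		}
	}
	return "none"
}

func main() {
	want := spec{Size: 3, Version: "3.5"}
	fmt.Println(nextAction(want, []member{{"m0", "3.4"}, {"m1", "3.4"}}))               // add member
	fmt.Println(nextAction(want, []member{{"m0", "3.4"}, {"m1", "3.4"}, {"m2", "3.5"}})) // upgrade m0
	fmt.Println(nextAction(want, []member{{"m0", "3.5"}, {"m1", "3.5"}, {"m2", "3.5"}})) // none
}
```

Returning one step at a time keeps each pass small and safe to retry: if an action fails, the next pass simply recomputes from the latest observed state.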
11. Containerization Best Practices
Run a single process per container; use sidecar or init containers for auxiliary tasks.
Implement health checks (readiness/liveness) and graceful shutdown.
Structure Dockerfiles to maximize layer reuse and keep images minimal.
Remove unnecessary tools from images; scan third‑party images with tools like Clair or Trivy.
Prefer non‑privileged containers and define security contexts.
Use init containers for one‑time setup and multi‑stage builds to reduce final image size.
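Readiness, liveness, and graceful shutdown fit together as in this Go sketch (assuming Go 1.19+ for atomic.Bool): /healthz reports that the process is alive, while /readyz fails once shutdown begins so the pod is removed from Service endpoints before it exits. The endpoint paths and the ready flag are conventions assumed here; they must match the probe paths configured in the pod spec.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready is false until startup finishes and again once shutdown begins.
var ready atomic.Bool

// healthz is the liveness endpoint: the process is up and serving HTTP.
func healthz(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readyz is the readiness endpoint: 503 takes the pod out of Service endpoints.
func readyz(w http.ResponseWriter, _ *http.Request) {
	if !ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", healthz)
	mux.HandleFunc("/readyz", readyz)
	return mux
}

// probe calls a handler in-process and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	ready.Store(true) // startup work done (caches warmed, connections open)
	fmt.Println(probe(mux, "/readyz"))
	ready.Store(false) // shutdown begins: stop advertising readiness, then drain
	fmt.Println(probe(mux, "/readyz"), probe(mux, "/healthz"))
}
```

In a long-running service, main would typically derive a context from signal.NotifyContext(ctx, syscall.SIGTERM), set ready to false when it fires, and then call http.Server.Shutdown so in-flight requests finish; the sketch drives the handlers in-process with httptest instead of binding a port.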
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
