Why Kubernetes Leads Container Orchestration: Architecture and Core Components Explained
This article introduces Kubernetes as the leading open‑source container orchestration platform, outlines its primary features and architecture, and describes its core components (etcd, the API server, scheduler, controller manager, kubelet, and kube‑proxy), including their roles, mechanisms, and common configurations.
Kubernetes Overview
Kubernetes is an open‑source container cluster management system created by Google and inspired by its large‑scale Borg system. It provides a unified platform for deploying, maintaining, and performing rolling updates of containerized applications. Its main features include:
Container‑based application deployment, maintenance and rolling upgrades
Load balancing and service discovery
Cross‑machine and cross‑region cluster scheduling
Automatic scaling
Support for both stateless and stateful workloads
Extensive volume support
Pluggable architecture for extensibility
Architecture and Core Components
Core Components
etcd – distributed key‑value store that holds the entire cluster state
kube‑apiserver – the sole entry point for all resource operations; handles authentication, authorization, admission control, API registration and discovery, and is the only component that reads from and writes to etcd directly
kube‑controller‑manager – runs a set of controllers that continuously reconcile the desired state with the actual state (e.g., DeploymentController, DaemonSetController, NodeController, ServiceController, etc.)
kube‑scheduler – watches for unscheduled Pods and assigns them to Nodes based on scheduling policies
kubelet – daemon on each Node that registers the Node with the API server, creates and manages Pods and their containers, and reports node resource usage
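container runtime – executes containers on behalf of the kubelet through the Container Runtime Interface (CRI); Docker was the default in many older installations, while containerd and CRI‑O are common today (the built‑in Docker integration, dockershim, was removed in Kubernetes 1.24)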
kube‑proxy – runs on every Node and provides internal service discovery and load balancing
Recommended add‑ons (plugins)
Helm – package manager for Kubernetes
CoreDNS/kube‑dns – DNS service for the cluster
Ingress controller – external entry point for Services
Heapster – resource monitoring (deprecated and since retired; Metrics Server is the usual replacement)
Dashboard – web UI for cluster inspection
Federation – multi‑cluster coordination
Fluentd‑Elasticsearch – log collection, storage and query
Component Details
etcd
etcd is a distributed key‑value store built on the Raft consensus algorithm. It provides strong consistency and is commonly used for service discovery, shared configuration, and coordination tasks such as leader election and distributed locking.
Main functions:
Basic key‑value storage
Watch mechanism for change notifications
TTL (time‑to‑live) for keys, enabling automatic expiration and renewal
Atomic compare‑and‑swap (CAS) operations for distributed locks and leader election
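The four functions above can be illustrated with a small in‑memory model. This is only a sketch: TinyKV, FakeClock, and the /lock/leader key are hypothetical names made up for illustration, and none of this is the etcd client API. It shows how a TTL plus compare‑and‑swap is enough to build a simple lock that frees itself when its holder stops renewing it.

```python
import time


class TinyKV:
    """Toy in-memory store (hypothetical; NOT the etcd client API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}      # key -> (value, expires_at or None)
        self._watchers = {}  # key -> list of callbacks

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily expire a key whose TTL has passed
            return None
        return value

    def put(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)
        for callback in self._watchers.get(key, []):
            callback(key, value)  # notify watchers of the change

    def watch(self, key, callback):
        self._watchers.setdefault(key, []).append(callback)

    def compare_and_swap(self, key, expected, new, ttl=None):
        """Write `new` only if the current value equals `expected`."""
        if self.get(key) != expected:
            return False
        self.put(key, new, ttl)
        return True


class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
kv = TinyKV(clock)
events = []
kv.watch("/lock/leader", lambda k, v: events.append((k, v)))

# Two nodes race for the same lock key; only the first CAS succeeds.
won_a = kv.compare_and_swap("/lock/leader", None, "node-a", ttl=5)
won_b = kv.compare_and_swap("/lock/leader", None, "node-b", ttl=5)

# node-a never renews, so the key expires and node-b can take over.
clock.now = 6.0
won_b_later = kv.compare_and_swap("/lock/leader", None, "node-b", ttl=5)
```

In this model the holder would renew the lock by calling put again before the TTL runs out; real etcd v3 attaches keys to leases and performs the comparison inside a transaction.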
Raft‑based leader election process:
On startup a node is a follower with an election timeout. If no heartbeat is received from a leader, the node becomes a candidate and requests votes from other nodes.
When a candidate receives votes from a majority of nodes, it becomes the leader, accepts client writes and replicates log entries to followers. If no candidate reaches a majority, each candidate waits for a fresh randomized election timeout (for example 150‑300 ms) and starts a new election.
The elected leader maintains its role by periodically sending heartbeats to followers.
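The voting step described above can be sketched as a single election round. This is a deliberately reduced model: the Node class and run_election function are hypothetical, timeouts and heartbeats are left out, and real Raft additionally refuses a vote when the candidate's log is less up to date than the voter's.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    term: int = 0
    role: str = "follower"
    voted_for: Optional[str] = None

    def handle_vote_request(self, candidate: str, term: int) -> bool:
        if term < self.term:
            return False  # stale candidate
        if term > self.term:
            # A newer term always wins: adopt it and fall back to follower.
            self.term = term
            self.role = "follower"
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate  # at most one vote per term
            return True
        return False


def run_election(candidate: Node, peers: List[Node], reachable: Set[str]) -> bool:
    """One round: bump the term, vote for self, ask every reachable peer."""
    candidate.term += 1
    candidate.role = "candidate"
    candidate.voted_for = candidate.name
    votes = 1
    for peer in peers:
        if peer.name in reachable and peer.handle_vote_request(
            candidate.name, candidate.term
        ):
            votes += 1
    cluster_size = len(peers) + 1
    if votes > cluster_size // 2:  # strict majority of the whole cluster
        candidate.role = "leader"
        return True
    return False


nodes = [Node(n) for n in "abcde"]
# Node "a" times out first and can reach only "b" and "c": 3 of 5 votes.
a_won = run_election(nodes[0], nodes[1:], reachable={"b", "c"})
```

Because each node grants at most one vote per term, two candidates cannot both collect a majority in the same term.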
Failure handling:
Leader failure – remaining nodes trigger a new election; a recovered former leader with a lower term steps down to follower.
Follower unavailability – the node can re‑join the cluster by copying the latest log from the current leader.
Multiple candidates – candidates use random back‑off before re‑election attempts to avoid split votes.
kube‑apiserver
The API server exposes a RESTful interface for all cluster operations. It performs authentication, authorization, admission control, data validation and writes state changes to etcd. All other components (kubelet, scheduler, controller‑manager, etc.) interact with the cluster exclusively through the API server.
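The order of those stages can be made concrete with a toy request handler. Everything here is an assumption made for illustration: the token table, the rule tuples, the dict standing in for etcd, and the function names are hypothetical and do not correspond to real kube‑apiserver code or configuration.

```python
class RequestRejected(Exception):
    """Raised when any stage of the toy pipeline refuses the request."""


def authenticate(request, tokens):
    # Stage 1: who is calling? (hypothetical static token table)
    user = tokens.get(request.get("token"))
    if user is None:
        raise RequestRejected("401 Unauthorized")
    return user


def authorize(user, request, rules):
    # Stage 2: may this user perform this verb on this resource?
    if (user, request["verb"], request["resource"]) not in rules:
        raise RequestRejected("403 Forbidden")


def admit(obj):
    # Stage 3: admission control; here it only fills in a default namespace.
    admitted = dict(obj)
    admitted.setdefault("namespace", "default")
    return admitted


def validate(obj):
    # Stage 4: data validation; here only the name is checked.
    if not obj.get("name"):
        raise RequestRejected("422 Invalid: name is required")


def handle_create(request, tokens, rules, store):
    user = authenticate(request, tokens)
    authorize(user, request, rules)
    obj = admit(request["object"])
    validate(obj)
    # Stage 5: persist; `store` is a plain dict standing in for etcd.
    key = f"/{request['resource']}/{obj['namespace']}/{obj['name']}"
    store[key] = obj
    return key


tokens = {"t-alice": "alice", "t-bob": "bob"}
rules = {("alice", "create", "pods")}
store = {}

key = handle_create(
    {"token": "t-alice", "verb": "create", "resource": "pods",
     "object": {"name": "web"}},
    tokens, rules, store,
)
```

A request that fails at any stage never reaches the store, which is why the other components can trust whatever they read back from the API server.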
kube‑scheduler
The scheduler watches the API server for Pods that have no Node assigned and selects a suitable Node based on scheduling policies. A Pod can constrain that choice in several ways:
nodeSelector – simple label‑based selection; the Pod is scheduled only on Nodes whose labels match the selector.
nodeAffinity – richer expression language supporting required/preferred rules and set operations.
podAffinity / podAntiAffinity – place a Pod on Nodes that already run Pods with matching labels (affinity), or keep it away from such Nodes (anti‑affinity).
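A minimal sketch of how such constraints narrow the candidate Nodes, assuming plain dicts for Pods and Nodes. The avoidPodsWithLabels field is a hypothetical stand‑in for a podAntiAffinity rule, and "fewest Pods wins" is an invented scoring rule; the real scheduler filters and scores Nodes through a much richer set of plugins.

```python
def matches_node_selector(pod, node):
    """True when every nodeSelector pair is present in the node's labels."""
    selector = pod.get("nodeSelector", {})
    labels = node.get("labels", {})
    return all(labels.get(k) == v for k, v in selector.items())


def violates_anti_affinity(pod, node):
    """True when the node already runs a Pod the new Pod wants to avoid."""
    avoid = pod.get("avoidPodsWithLabels", {})  # hypothetical simplified field
    if not avoid:
        return False
    return any(
        all(running.get(k) == v for k, v in avoid.items())
        for running in node["pods"]
    )


def schedule(pod, nodes):
    # Filter: drop nodes that break a hard constraint.
    feasible = [
        n for n in nodes
        if matches_node_selector(pod, n) and not violates_anti_affinity(pod, n)
    ]
    if not feasible:
        return None  # the Pod stays unscheduled
    # Score: prefer the node with the fewest Pods (name breaks ties).
    best = min(feasible, key=lambda n: (len(n["pods"]), n["name"]))
    return best["name"]


nodes = [
    {"name": "node-1", "labels": {"disktype": "ssd"}, "pods": [{"app": "web"}]},
    {"name": "node-2", "labels": {"disktype": "ssd"},
     "pods": [{"app": "db"}, {"app": "cache"}]},
    {"name": "node-3", "labels": {"disktype": "hdd"}, "pods": []},
]

spread_web = schedule(
    {"nodeSelector": {"disktype": "ssd"}, "avoidPodsWithLabels": {"app": "web"}},
    nodes,
)
plain_ssd = schedule({"nodeSelector": {"disktype": "ssd"}}, nodes)
needs_gpu = schedule({"nodeSelector": {"gpu": "true"}}, nodes)
```

Here node‑3 is filtered out by the label selector and node‑1 by the anti‑affinity rule, so the first Pod lands on node‑2 even though node‑2 is busier.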
kube‑controller‑manager
The controller manager runs a collection of controllers that continuously monitor the cluster state via the API server and act to drive the actual state toward the desired state.
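The reconcile idea can be sketched with a toy controller that keeps a fixed number of Pods alive. ToyReplicaController and its pod naming are hypothetical and operate on a plain list; a real controller observes and changes state only through the API server.

```python
import itertools


class ToyReplicaController:
    """Hypothetical controller that drives a pod list toward `desired`."""

    def __init__(self, desired):
        self.desired = desired
        self._ids = itertools.count(1)

    def reconcile(self, pods):
        """Compare actual with desired state and return the corrected list."""
        pods = list(pods)
        while len(pods) < self.desired:
            pods.append(f"pod-{next(self._ids)}")  # too few: create
        while len(pods) > self.desired:
            pods.pop()                             # too many: delete
        return pods


controller = ToyReplicaController(desired=3)
pods = controller.reconcile([])          # initial rollout
after_rollout = list(pods)

pods.remove("pod-2")                     # simulate a crashed Pod
pods = controller.reconcile(pods)        # the loop replaces it
after_crash = list(pods)

controller.desired = 1                   # the user scales down
pods = controller.reconcile(pods)
after_scale_down = list(pods)
```

The controller never needs to know why the actual state drifted; it only compares and corrects, which is why the same loop handles crashes and scaling alike.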
Key controllers (must be enabled):
DeploymentController
DaemonSetController
NamespaceController
ReplicationController
ReplicaSetController
JobController
Controllers started by default:
NodeController
ServiceController
PVBinderController
Controllers disabled by default (can be enabled if needed):
BootstrapSignerController
TokenCleanerController
kubelet
Each Node runs a kubelet daemon (listening on port 10250). It registers the Node with the API server, receives Pod specifications, creates containers via the container runtime, mounts volumes, configures networking, and periodically reports node health and resource usage.
Node registration details:
Use the --register-node flag to enable or disable self‑registration with the API server.
If self‑registration is disabled, the user must manually create a Node object and configure the API server endpoint for the kubelet.
Upon startup, the kubelet registers the Node, then continuously sends status updates that the API server stores in etcd.
Container Health Checks
LivenessProbe – determines whether a container is healthy. If the probe fails, the kubelet kills the container and restarts it according to the Pod’s restart policy.
ReadinessProbe – determines whether a container is ready to receive traffic. Failure removes the Pod’s IP from the Service endpoints, preventing traffic from being sent to an unready container.
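The difference between the two probes can be summarized as a small decision function. kubelet_decision is a hypothetical helper, not kubelet code, and it ignores details such as failure thresholds and initial delays.

```python
def kubelet_decision(liveness_ok, readiness_ok, restart_policy="Always"):
    """Map probe results to the two outcomes described above."""
    if not liveness_ok:
        # An unhealthy container is killed; whether it is started again
        # depends on the Pod's restart policy.
        restart = restart_policy in ("Always", "OnFailure")
        return {"kill": True, "restart": restart, "in_endpoints": False}
    # A healthy container keeps running; readiness only controls traffic.
    return {"kill": False, "restart": False, "in_endpoints": readiness_ok}


healthy = kubelet_decision(liveness_ok=True, readiness_ok=True)
warming_up = kubelet_decision(liveness_ok=True, readiness_ok=False)
deadlocked = kubelet_decision(liveness_ok=False, readiness_ok=False)
deadlocked_never = kubelet_decision(False, False, restart_policy="Never")
```

The warming_up case is the important one: a container that is alive but not ready is left running and simply receives no Service traffic.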
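Pod Startup Flow
A client submits a Pod specification to the API server, which authenticates and validates the request and stores the object in etcd. The scheduler notices the unscheduled Pod and assigns it to a Node. The kubelet on that Node then picks up the assignment, starts the containers through the container runtime, and reports the Pod status back to the API server.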
kube‑proxy
kube‑proxy runs on every Node, watches Service and Endpoints objects through the API server, and implements Service load balancing in one of three proxy modes:
userspace – the earliest mode. iptables rules redirect traffic addressed to a Service to a port on which kube‑proxy listens in user space; kube‑proxy then selects an endpoint and proxies the connection itself. Passing every connection through a user‑space process adds noticeable latency.
iptables – implements the entire proxy logic within iptables rules, eliminating the user‑space round‑trip. When a large number of Services/Endpoints exist, the iptables rule set can become very large, causing latency when adding or removing rules.
ipvs – leverages Linux Virtual Server (LVS) for high‑performance load balancing, scaling better than iptables when many Services/Endpoints are present.
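Whatever the mode, the underlying job is the same: keep a per‑Service list of endpoints in sync with the API server and pick one backend per connection. The sketch below models only that logic in user space; ToyServiceProxy and its method names are hypothetical, and round‑robin is used here simply as one easy selection policy.

```python
class ToyServiceProxy:
    """Hypothetical model of endpoint tracking and backend selection."""

    def __init__(self):
        self._endpoints = {}  # service name -> list of "ip:port" strings
        self._cursors = {}    # service name -> next index to hand out

    def on_endpoints_update(self, service, addresses):
        # Called whenever the watched endpoint set of a Service changes.
        self._endpoints[service] = list(addresses)
        self._cursors[service] = 0

    def pick_backend(self, service):
        addresses = self._endpoints.get(service)
        if not addresses:
            return None  # unknown Service or no ready endpoints
        index = self._cursors[service] % len(addresses)
        self._cursors[service] = index + 1
        return addresses[index]


proxy = ToyServiceProxy()
proxy.on_endpoints_update("web", ["10.0.0.1:80", "10.0.0.2:80"])
first_three = [proxy.pick_backend("web") for _ in range(3)]

# One Pod fails its readiness probe and drops out of the endpoint list.
proxy.on_endpoints_update("web", ["10.0.0.2:80"])
after_update = proxy.pick_backend("web")
```

In iptables and ipvs modes this selection happens inside the kernel rather than in a process like this one, which is where their performance advantage over userspace mode comes from.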
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
