Inside Kubernetes: What Happens When You Run `kubectl run nginx`?

This article walks through the complete internal journey of a `kubectl run nginx --image=nginx --replicas=3` command, detailing how the request is validated, authenticated, authorized, processed by the API server, passed through initializers, scheduled, and finally materialized as running pods by kubelet, with code excerpts from Kubernetes v1.21.

Liangxu Linux

Aug 22, 2021

Overview

The article reproduces a 2019 deep‑dive on Kubernetes internals, updated to v1.21. It explains step by step what the control plane does when a user runs `kubectl run nginx --image=nginx --replicas=3`, from client‑side validation to the final pod state on a node.

1. Core Component Startup

Before any request can be processed, the main control‑plane components start:

Kube‑apiserver : builds the generic API server, registers API groups and creates the handler chain (authentication → authorization → admission).

Controller‑manager : runs the Deployment, ReplicaSet and other controllers that reconcile desired state.

Kubelet : runs on each node, watches the pods assigned to that node and drives the container runtime.

Typical call stacks (simplified) are shown in the source, e.g. the main function of kube‑apiserver creates the server, registers the WithAuthentication and WithAuthorization filters, and finally starts the HTTP listener.
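The filter ordering described above can be illustrated with ordinary Go HTTP middleware. This is a minimal sketch, not the apiserver's code: withAuthentication, withAuthorization and the "admin" token are hypothetical stand-ins, the checks are deliberately trivial, and the admission stage is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withAuthentication rejects requests that carry no credentials.
// Checking for an Authorization header is a stand-in for real authentication.
func withAuthentication(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// withAuthorization lets only a hypothetical "admin" token perform writes.
func withAuthorization(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet && r.Header.Get("Authorization") != "Bearer admin" {
			http.Error(w, "Forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// buildChain wraps the final handler so that authentication runs first
// and authorization runs second.
func buildChain(final http.Handler) http.Handler {
	return withAuthentication(withAuthorization(final))
}

// statusFor sends one request through the chain and returns the status code.
func statusFor(method, token string) int {
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusCreated)
	})
	req := httptest.NewRequest(method, "/apis/apps/v1/namespaces/default/deployments", nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	buildChain(final).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("POST", ""))      // no credentials: stopped by authentication
	fmt.Println(statusFor("POST", "guest")) // authenticated, but stopped by authorization
	fmt.Println(statusFor("POST", "admin")) // reaches the final handler
}
```

Because each filter wraps the next one, a request that fails an outer filter never reaches the inner ones, which is the property the handler chain relies on.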

2. kubectl Command Flow

The client performs several phases before an HTTP request reaches the API server:

Argument validation : checks required flags (e.g. image name) and aborts early on obvious errors.

Generator execution : the BasicPod generator creates a v1.Pod object from the supplied flags.

Version negotiation : kubectl discovers the appropriate API group/version (e.g. apps/v1 for Deployments) by querying /apis and caches the OpenAPI schema.

HTTP request construction : the generated object is serialized, the correct REST mapper is selected, and a POST /apis/apps/v1/namespaces/default/deployments request is built.

Client authentication : credentials are read from the kubeconfig (command‑line --kubeconfig, KUBECONFIG env var, or default ~/.kube location) and attached as TLS client certificates, Bearer tokens, or basic auth headers.

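// Simplified generator call (error handling elided)
obj, err := generator.Generate(params)
mapper, err := f.ToRESTMapper()
// Resolve the object's group/version/kind to a REST resource
mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
client, err := f.ClientForMapping(mapping)
// POST the object to the apiserver
created, err := resource.NewHelper(client, mapping).Create(namespace, false, obj)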
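The kubeconfig lookup order from the client authentication step can be sketched as a small precedence function. This is an illustration only: resolveKubeconfig is a hypothetical helper, and it returns just the first path from KUBECONFIG, whereas kubectl itself merges every file in that list.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveKubeconfig picks a kubeconfig path by precedence: the explicit
// --kubeconfig flag, then the KUBECONFIG environment variable, then the
// default file under the user's home directory.
func resolveKubeconfig(flagValue, envValue, homeDir string) string {
	if flagValue != "" {
		return flagValue
	}
	if envValue != "" {
		// KUBECONFIG may hold a path list; this sketch keeps only the first entry.
		return filepath.SplitList(envValue)[0]
	}
	return filepath.Join(homeDir, ".kube", "config")
}

func main() {
	home, _ := os.UserHomeDir()
	fmt.Println(resolveKubeconfig("", os.Getenv("KUBECONFIG"), home))
}
```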

3. kube‑apiserver Processing

When the request arrives, the apiserver runs a pipeline:

Authentication : evaluates X.509 client certificates, bearer tokens, or basic auth handlers. The first successful handler short‑circuits the chain.

Authorization : checks the request against webhook, ABAC, RBAC, or Node authorizers. A failure returns Forbidden and stops further processing.

Admission control : runs a series of plugins (e.g. NamespaceLifecycle, ResourceQuota, PodSecurityPolicy) that may mutate the object or reject it.

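Persistence : the object is stored in etcd via the appropriate storage provider. The key is built from the namespace and name (<namespace>/<name>) under the resource's storage prefix; with the default prefix, a Deployment named nginx in the default namespace is stored at /registry/deployments/default/nginx.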

Initializers : if the resource has pending initializers, the object is marked as uninitialized and not yet visible to clients.

Relevant code snippets include the createHandler in pkg/apiserver/endpoints/handlers/create.go and the storage write at line 401 of the same file.
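The short‑circuit behavior of the authentication stage can be sketched as a loop over handlers. The request type, the handler functions and the token table below are hypothetical and far simpler than the apiserver's authenticators; the point is only that the first handler to succeed decides the user.

```go
package main

import (
	"errors"
	"fmt"
)

// request carries the credentials a client may present (hypothetical type).
type request struct {
	certCN      string // common name from an X.509 client certificate, if any
	bearerToken string // bearer token, if any
}

// authenticator returns the user name and true when it recognizes the credentials.
type authenticator func(r request) (user string, ok bool)

// x509Auth accepts any request that presents a certificate common name.
func x509Auth(r request) (string, bool) { return r.certCN, r.certCN != "" }

// tokenAuth builds an authenticator backed by a static token table.
func tokenAuth(tokens map[string]string) authenticator {
	return func(r request) (string, bool) {
		user, ok := tokens[r.bearerToken]
		return user, ok
	}
}

var errUnauthorized = errors.New("401 Unauthorized")

// authenticate tries each handler in order; the first success ends the loop.
func authenticate(r request, chain []authenticator) (string, error) {
	for _, a := range chain {
		if user, ok := a(r); ok {
			return user, nil
		}
	}
	return "", errUnauthorized
}

func main() {
	chain := []authenticator{x509Auth, tokenAuth(map[string]string{"s3cr3t": "alice"})}
	fmt.Println(authenticate(request{bearerToken: "s3cr3t"}, chain))
	fmt.Println(authenticate(request{}, chain))
}
```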

4. Initializers

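Initializers are per‑resource controllers that run before an object becomes visible. They are declared in an InitializerConfiguration resource. The apiserver exposes the query parameter ?includeUninitialized so that controllers can see objects that are still waiting for their initializers. Initializers were an alpha feature and were removed in Kubernetes 1.14 in favor of admission webhooks, so this stage applies only to older clusters.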

5. Control Loops (Controllers)

After the object is persisted, a series of controllers reconcile the desired state:

Deployment controller : watches Deployments, creates a new ReplicaSet, and drives the rollout logic (recreate or rolling update).

ReplicaSet controller : ensures the number of Pods matches the replicas field, creating or deleting Pods as needed.

Informer framework : each controller uses an informer to cache API objects locally, reducing load on the apiserver.

Scheduler : watches unscheduled Pods, runs predicate filters (node name, taints, node affinity) and priority scoring plugins, then creates a Binding object that assigns spec.nodeName.
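The ReplicaSet controller's core decision from the list above reduces to comparing two counts. The sketch below is an assumption‑laden simplification: reconcile is a hypothetical function, and the real controller also handles ownership, batching and deletion ordering.

```go
package main

import "fmt"

// reconcile compares the desired replica count with the number of Pods
// currently observed and returns how many Pods to create or delete.
func reconcile(desired, observed int) (create, remove int) {
	switch {
	case observed < desired:
		return desired - observed, 0
	case observed > desired:
		return 0, observed - desired
	}
	return 0, 0
}

func main() {
	// A fresh ReplicaSet with replicas=3 and no Pods yet.
	create, remove := reconcile(3, 0)
	fmt.Println(create, remove)
}
```

Running the same comparison on every change event is what makes the loop level‑triggered: it does not matter how the counts drifted apart, only what they are now.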

// Scheduler call stack (simplified)
Run → scheduleOne → bind → RunBindPlugins → Bind → POST /api/v1/namespaces/.../bindings
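The filter‑then‑score idea behind the scheduler can be sketched in a few lines. Everything here is hypothetical (the node and pod types, the single taint flag, and the "most free CPU" scoring rule); the real scheduler runs many plugins for each phase.

```go
package main

import "fmt"

// node is a heavily reduced view of a cluster node.
type node struct {
	name         string
	tainted      bool // stand-in for a taint the pod does not tolerate
	freeMilliCPU int
}

// pod carries only the CPU request used by this sketch.
type pod struct {
	requestMilliCPU int
}

// filter keeps the nodes that are able to run the pod at all.
func filter(p pod, nodes []node) []node {
	var feasible []node
	for _, n := range nodes {
		if !n.tainted && n.freeMilliCPU >= p.requestMilliCPU {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// score ranks a feasible node by the CPU left after placing the pod.
func score(p pod, n node) int { return n.freeMilliCPU - p.requestMilliCPU }

// schedule returns the best-scoring feasible node, or "" if none is feasible.
func schedule(p pod, nodes []node) string {
	best, bestScore := "", -1
	for _, n := range filter(p, nodes) {
		if s := score(p, n); s > bestScore {
			best, bestScore = n.name, s
		}
	}
	return best
}

func main() {
	nodes := []node{
		{name: "a", tainted: true, freeMilliCPU: 4000},
		{name: "b", freeMilliCPU: 500},
		{name: "c", freeMilliCPU: 2000},
	}
	fmt.Println(schedule(pod{requestMilliCPU: 250}, nodes))
}
```

The chosen node name is what the scheduler would then write into the Binding object.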

6. kubelet – Pod Sync

The kubelet on the target node continuously watches Pods whose spec.nodeName matches the node. For each pod it runs syncPod:

Generate a v1.PodStatus (phase, conditions, container statuses).

Update the status manager, which asynchronously writes back to the apiserver.

If the pod is not runnable (failed admission, deletion timestamp, etc.) it is killed.

Ensure the network is ready; if not, abort with NetworkNotReady.

Create cgroups if cgroups-per-qos is enabled.

Prepare pod data directories (under the kubelet root directory, by default /var/lib/kubelet/pods/<uid>).

Wait for volumes to attach and mount.

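Fetch the image pull secrets referenced in spec.imagePullSecrets.

Call the SyncPod method of the kubelet's runtime manager, which drives the container runtime through the CRI.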

func (m *kubeGenericRuntimeManager) SyncPod(pod *v1.Pod, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, backOff *flowcontrol.Backoff) (result kubecontainer.PodSyncResult) {
    // 1. compute sandbox and container changes
    // 2. kill the sandbox if it changed
    // 3. create the sandbox (pause container)
    // 4. start init containers
    // 5. start regular containers
    return result
}

6.1 CRI SyncPod

The kubelet talks to the container runtime (Docker, containerd, etc.) through the Container Runtime Interface (CRI) using protobuf over gRPC. SyncPod performs:

Compute sandbox and container changes.

Kill and recreate the pod sandbox if needed.

Start the pause sandbox (the pod’s network namespace holder).

Create init containers, then the regular containers.
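The ordering of these steps can be demonstrated against a fake runtime. This is a sketch under stated assumptions: runtimeService here is a cut‑down, hypothetical interface that borrows three CRI method names but not their real protobuf signatures, and it does not wait for init containers to finish before starting the regular ones, which the kubelet does.

```go
package main

import "fmt"

// runtimeService is a reduced, hypothetical stand-in for the CRI runtime client.
type runtimeService interface {
	RunPodSandbox(podName string) (sandboxID string, err error)
	CreateContainer(sandboxID, name string) (containerID string, err error)
	StartContainer(containerID string) error
}

// recordingRuntime records every call so the order can be inspected.
type recordingRuntime struct{ calls []string }

func (r *recordingRuntime) RunPodSandbox(podName string) (string, error) {
	r.calls = append(r.calls, "RunPodSandbox "+podName)
	return "sandbox-" + podName, nil
}

func (r *recordingRuntime) CreateContainer(sandboxID, name string) (string, error) {
	r.calls = append(r.calls, "CreateContainer "+name)
	return sandboxID + "/" + name, nil
}

func (r *recordingRuntime) StartContainer(containerID string) error {
	r.calls = append(r.calls, "StartContainer "+containerID)
	return nil
}

// syncPod creates the sandbox first, then init containers, then regular containers.
func syncPod(rt runtimeService, podName string, initContainers, containers []string) error {
	sandboxID, err := rt.RunPodSandbox(podName)
	if err != nil {
		return err
	}
	// Copy into a fresh slice so the caller's slices are never modified.
	ordered := append(append([]string{}, initContainers...), containers...)
	for _, name := range ordered {
		id, err := rt.CreateContainer(sandboxID, name)
		if err != nil {
			return err
		}
		if err := rt.StartContainer(id); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	rt := &recordingRuntime{}
	if err := syncPod(rt, "nginx", []string{"init"}, []string{"web"}); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	for _, c := range rt.calls {
		fmt.Println(c)
	}
}
```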

6.2 Pause Container (Pod Sandbox)

For Docker‑based runtimes the sandbox is implemented as a special pause container that holds the pod’s network, IPC and PID namespaces. The sandbox is created via RunPodSandbox, which pulls the pause image, creates a container with a network namespace, starts the container, and invokes CNI plugins to set up networking.

func (ds *dockerService) RunPodSandbox(ctx context.Context, r *RunPodSandboxRequest) (*RunPodSandboxResponse, error) {
    // pull pause image
    // create container with network namespace
    // start container
    // invoke CNI plugins (ADD)
    return &runtimeapi.RunPodSandboxResponse{PodSandboxId: id}, nil
}

6.3 CNI Networking – Plugin Manager

The kubelet calls the CNI plugin manager to configure the pod’s network. The manager reads JSON configuration files from /etc/cni/net.d, locates the binary in /opt/cni/bin, and executes it with ADD (or DEL) via stdin/stdout.

func (pm *PluginManager) SetUpPod(podNamespace, podName string, id ContainerID, annotations map[string]string, options map[string]interface{}) error {
    // locate plugin binary
    // build CNI args (CNI_COMMAND=ADD, CNI_CONTAINERID, CNI_NETNS, etc.)
    // exec plugin, parse result (IP, routes, DNS)
    return nil
}

Example bridge plugin configuration assigns an IP from the host‑local IPAM range and creates a veth pair attached to a Linux bridge.

{
    "cniVersion": "0.3.1",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {"type": "host-local", "ranges": [[{"subnet": "10.244.0.0/16"}]]}
}

The noop plugin is a minimal implementation that does nothing; it is useful on nodes that only run host‑network Pods.

func cmdAdd(args *skel.CmdArgs) error {
    return debugBehavior(args, "ADD")
}

func cmdDel(args *skel.CmdArgs) error {
    return debugBehavior(args, "DEL")
}

6.4 Init Containers and Main Containers

After networking is ready, the CRI creates the actual containers:

Image pull (using the secrets from spec.imagePullSecrets).

Container creation via runtimeService.CreateContainer (protobuf request).

Container start via runtimeService.StartContainer.

Optional post‑start hooks (exec or HTTP) are run.

func (m *kubeGenericRuntimeManager) startContainer(podSandboxID string, podConfig *ContainerConfig, spec *startSpec, pod *v1.Pod, ...) (string, error) {
    // pull image if needed
    // create container via CRI
    // start container
    // create legacy log symlink
    // run post‑start hooks
    return containerID, nil
}

7. Result

When all steps succeed, the three Pods created by the original kubectl run command are scheduled onto nodes, have their pause sandbox and network set up, run any init containers, and finally start the user containers. Their status progresses from Pending → Running → Succeeded / Failed, which can be observed with kubectl get pods.
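The phase progression mentioned above can be approximated from container states. The sketch below is a simplification, not the kubelet's logic: containerState and podPhase are hypothetical, and the code assumes containers are never restarted, so a pod whose containers have all exited is reported as Succeeded or Failed.

```go
package main

import "fmt"

// containerState is a reduced view of one container's status.
// A zero value means the container has not started yet.
type containerState struct {
	running    bool
	terminated bool
	exitCode   int
}

// podPhase derives a pod phase from its containers, assuming no restarts.
func podPhase(containers []containerState) string {
	if len(containers) == 0 {
		return "Pending"
	}
	allTerminated, anyFailed := true, false
	for _, c := range containers {
		switch {
		case c.terminated:
			if c.exitCode != 0 {
				anyFailed = true
			}
		case c.running:
			allTerminated = false
		default:
			// At least one container has not started, so the pod is not running yet.
			return "Pending"
		}
	}
	if !allTerminated {
		return "Running"
	}
	if anyFailed {
		return "Failed"
	}
	return "Succeeded"
}

func main() {
	fmt.Println(podPhase([]containerState{{}}))
	fmt.Println(podPhase([]containerState{{running: true}}))
	fmt.Println(podPhase([]containerState{{terminated: true, exitCode: 1}}))
}
```

For a long‑running workload such as nginx the pods normally stay in Running; the terminal phases matter mainly for containers that are expected to exit.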
