
How OpenKruise Accelerates Large‑Scale Cluster Deployments with Image Pre‑warming

This article explains why image pre‑warming is essential for large Kubernetes clusters, describes OpenKruise’s architecture and custom resources (NodeImage, ImagePullJob) that enable cluster‑wide, sidecar, and resource‑pool image pre‑pulling, and outlines upcoming enhancements that combine pre‑warming with in‑place upgrades.

Alibaba Cloud Native

OpenKruise is an open‑source suite for automating cloud‑native applications and a CNCF Sandbox project. Version v0.8.0 adds an image pre‑warming capability that addresses the long image pull times seen in large‑scale clusters.

Why Image Pre‑warming Is Needed

Docker images simplify container deployment, but pulling large images can dominate pod creation time, especially when scaling out across thousands of nodes. In elastic clusters, image pulls can account for more than 70% of total startup time, which undercuts expectations of near‑instant scaling and rapid releases.

OpenKruise Architecture for Pre‑warming

After installing OpenKruise, two components run in the kruise-system namespace:

kruise-manager: a controller, run as a Deployment, that watches custom resources.

kruise-daemon: a DaemonSet that runs on every node and interacts with the CRI to perform extended actions such as image pulling.

For each node, OpenKruise creates a NodeImage custom resource that lists the images to pre‑pull. The daemon on that node reads the NodeImage and executes the pull tasks according to the specified policies (timeout, retry, TTL, etc.).
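
The timeout and retry policies can be pictured with a small Python sketch. This is not kruise-daemon's code (the daemon talks to the CRI directly); PullPolicy, pull_with_policy, and the pull_once callback are hypothetical names. Treating backoffLimit as the number of retries allowed after the first attempt is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PullPolicy:
    # Hypothetical mirror of the pullPolicy fields used in this article.
    backoff_limit: int = 3      # assumed: retries allowed after the first attempt
    timeout_seconds: int = 300  # budget handed to each single pull attempt


def pull_with_policy(image: str,
                     pull_once: Callable[[str, int], bool],
                     policy: PullPolicy) -> int:
    """Try to pull `image`; return the number of attempts used on success.

    `pull_once(image, timeout_seconds)` stands in for the real CRI call and
    returns True on success. Raises RuntimeError once retries are exhausted.
    """
    max_attempts = policy.backoff_limit + 1
    for attempt in range(1, max_attempts + 1):
        if pull_once(image, policy.timeout_seconds):
            return attempt
    raise RuntimeError(f"pull of {image} failed after {max_attempts} attempts")


# Usage: a fake puller that fails twice, then succeeds on the third call.
calls = []


def flaky_pull(image: str, timeout_seconds: int) -> bool:
    calls.append(image)
    return len(calls) >= 3


attempts = pull_with_policy("xxx/base-image:latest", flaky_pull, PullPolicy())
```

In this sketch a pull that keeps failing gives up after backoff_limit + 1 attempts, and each attempt receives the same per-attempt timeout.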

Higher‑Level Custom Resource: ImagePullJob

Managing NodeImage objects individually does not scale to thousands of nodes. OpenKruise therefore provides ImagePullJob, which lets users declare a target image, a selector for matching nodes, and pull policies. The imagepulljob‑controller expands the job into NodeImage objects for all matching nodes.
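
The expansion step can be sketched in a few lines of Python. This is an illustration of matchLabels-style selection, not the controller's implementation; matches, expand_job, and the node names are hypothetical, and the per-node result is simplified to a plain list of images rather than a real NodeImage object.

```python
from typing import Dict, List, Optional


def matches(labels: Dict[str, str],
            match_labels: Optional[Dict[str, str]]) -> bool:
    """matchLabels semantics: every selector key must be present on the node
    with an equal value. A missing or empty selector matches every node."""
    if not match_labels:
        return True
    return all(labels.get(key) == value for key, value in match_labels.items())


def expand_job(image: str,
               nodes: Dict[str, Dict[str, str]],
               match_labels: Optional[Dict[str, str]] = None) -> Dict[str, List[str]]:
    """Return {node_name: [images to pre-pull]} for every selected node."""
    return {name: [image]
            for name, labels in nodes.items()
            if matches(labels, match_labels)}


# Usage with three made-up nodes.
nodes = {
    "node-a": {"resource-pool": "serverless"},
    "node-b": {"resource-pool": "general"},
    "node-c": {},
}
everywhere = expand_job("xxx/base-image:latest", nodes)
pool_only = expand_job("xxx/serverless-image:latest", nodes,
                       {"resource-pool": "serverless"})
```

Here everywhere covers all three nodes, while pool_only contains only node-a.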

Typical Usage Scenarios

1. Cluster‑wide Base Image Pre‑warming

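apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: base-image-job
spec:
  image: xxx/base-image:latest
  parallelism: 10          # at most 10 nodes pull at the same time
  completionPolicy:
    type: Never            # long-running job that never completes
  pullPolicy:
    backoffLimit: 3        # retries for a failed pull
    timeoutSeconds: 300    # timeout for a single pull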

With no selector, this job targets every node in the cluster. The Never completion policy keeps it running indefinitely, and the pull is re‑triggered on each node roughly every 24 hours, so the image stays cached.

2. Sidecar Image Pre‑warming

apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: sidecar-image-job
spec:
  image: xxx/sidecar-image:latest
  parallelism: 20
  completionPolicy:
    type: Always                   # one-time job that runs to completion
    activeDeadlineSeconds: 1800    # the whole job must finish within 30 minutes
    ttlSecondsAfterFinished: 300   # the job is cleaned up 5 minutes after finishing
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300

Sidecar images are also pre‑pulled cluster‑wide. The Always policy makes this a one‑time job: it must finish within 30 minutes (activeDeadlineSeconds: 1800) and is cleaned up automatically 5 minutes after it finishes (ttlSecondsAfterFinished: 300).
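
A quick back-of-the-envelope check shows whether a deadline like this leaves room for the whole cluster. The Python sketch below assumes every pull takes the same time and that all parallelism slots stay busy, so it gives a lower bound only; min_rollout_seconds is a hypothetical helper, and the node count and per-pull time are made-up numbers.

```python
import math


def min_rollout_seconds(node_count: int,
                        parallelism: int,
                        seconds_per_pull: float) -> float:
    """Lower bound on the time needed to pull on every node when at most
    `parallelism` nodes pull at once and each pull takes the same time."""
    if node_count < 0 or parallelism <= 0:
        raise ValueError("node_count must be >= 0 and parallelism must be > 0")
    waves = math.ceil(node_count / parallelism)
    return waves * seconds_per_pull


# Illustrative numbers: 500 nodes, parallelism 20 (as in sidecar-image-job),
# and an assumed 60 seconds per pull.
needed = min_rollout_seconds(500, 20, 60)
fits = needed <= 1800  # compare against activeDeadlineSeconds: 1800
```

With these numbers the job needs at least 25 waves of 60 seconds, or 1500 seconds, which fits inside the 30‑minute deadline. Doubling the node count to 1000 would not fit unless parallelism or the deadline is raised.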

3. Resource‑Pool Specific Image Pre‑warming

apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: serverless-job
spec:
  image: xxx/serverless-image:latest
  parallelism: 10
  completionPolicy:
    type: Never
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300
  selector:                         # only nodes carrying this label are targeted
    matchLabels:
      resource-pool: serverless

This job limits pre‑warming to nodes labeled resource-pool=serverless, using a long‑running Never policy.

Future Direction: Combining Pre‑warming with In‑place Upgrade

In the upcoming v0.9.0, OpenKruise’s CloneSet will automatically pre‑warm the new image on the nodes that host the remaining pods while the first batch of pods is going through its gray release. This “publish + pre‑warm” flow removes image pull latency for subsequent batches. It applies only to OpenKruise’s in‑place upgrade, because pods stay on their current nodes and the nodes that will need the new image are therefore known in advance.
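
The node selection behind this flow can be sketched in Python. This is an illustration of the idea, not CloneSet's implementation; nodes_to_prewarm, the pod names, and the node names are hypothetical.

```python
from typing import Dict, List, Set


def nodes_to_prewarm(pod_to_node: Dict[str, str],
                     first_batch: List[str]) -> Set[str]:
    """Nodes hosting pods that are NOT in the first batch.

    With in-place upgrade a pod stays on its node, so these are the nodes
    where later batches will need the new image.
    """
    first = set(first_batch)
    return {node for pod, node in pod_to_node.items() if pod not in first}


# Usage: four made-up pods, with web-0 upgraded in the first batch.
pod_to_node = {
    "web-0": "node-a",
    "web-1": "node-b",
    "web-2": "node-b",
    "web-3": "node-c",
}
targets = nodes_to_prewarm(pod_to_node, first_batch=["web-0"])
```

Here targets is the set containing node-b and node-c. With a recreate-style rolling update the same calculation is not possible, since the replacement pods have not been scheduled yet.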

In‑place upgrade itself avoids pod deletion and recreation, saving scheduling, network, and storage allocation time, and dramatically reducing image pull overhead because only the new layers need to be downloaded.
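
The layer saving amounts to a set difference, as the following Python sketch shows. The digests are shortened placeholders rather than real ones, and layers_to_download is a hypothetical helper; a real container runtime performs this check itself during a pull.

```python
from typing import Iterable, List


def layers_to_download(new_image_layers: Iterable[str],
                       cached_layers: Iterable[str]) -> List[str]:
    """Layers of the new image that are not already present on the node,
    in the order the new image lists them."""
    cached = set(cached_layers)
    return [layer for layer in new_image_layers if layer not in cached]


# Usage: the old and new versions share their base and runtime layers.
old_image = ["sha256:base", "sha256:runtime", "sha256:app-v1"]
new_image = ["sha256:base", "sha256:runtime", "sha256:app-v2"]
missing = layers_to_download(new_image, cached_layers=old_image)
```

Only the application layer differs between the two versions, so missing contains a single entry.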

For more detail on OpenKruise’s in‑place upgrade, see the article “揭秘:如何为 Kubernetes 实现原地升级?” (“Revealed: How to Implement In‑place Upgrade for Kubernetes”).
