How Kubernetes 1.28 Improves Batch Jobs with Pod Replacement Policy and Per‑Index Backoff Limits
Kubernetes 1.28 adds two alpha features for batch jobs: Pod Replacement Policy and per‑index backoff limits. The first lets a Job wait until a terminating pod has actually failed before creating its replacement; the second caps retries for each index of an Indexed Job independently. Together they reduce resource waste and improve reliability for machine‑learning workloads.
Pod Replacement Policy
By default, when a pod enters a terminating state (e.g., due to preemption or eviction), Kubernetes immediately creates a replacement pod, so both pods run concurrently. This can cause problems for frameworks such as TensorFlow or JAX that allow only one pod per index to run at a time, leading to duplicate‑task errors.
How to enable and use
Pod Replacement Policy is an alpha feature in Kubernetes 1.28. Enable it by turning on the JobPodReplacementPolicy feature gate in the cluster configuration.
After enabling the gate, create a Job and set the podReplacementPolicy field in the job spec. The field accepts TerminatingOrFailed (the default) or Failed:

kind: Job
metadata:
  name: new
spec:
  podReplacementPolicy: Failed
  ...

When the policy is set to Failed, a replacement pod is created only after the original pod reaches the Failed phase, not while it is merely terminating. The job's .status.terminating field reports the number of pods that are currently terminating, which you can query with kubectl:
kubectl get jobs/new -o=jsonpath='{.status.terminating}'

This behavior is especially useful for external queue controllers (e.g., Kueue) that count the quota used by a job's running pods and need to wait until the resources held by terminating pods are actually reclaimed.
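The difference between the two policy values can be sketched with a small model. This is a simplified illustration, not the Job controller's actual code: the Pod class and the replacements_needed function are hypothetical names, and the model assumes the job still has enough completions left to fill every parallel slot.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    phase: str                 # "Pending", "Running", "Succeeded", or "Failed"
    terminating: bool = False  # True once the pod has been asked to shut down


def replacements_needed(pods, parallelism, policy):
    """Return how many new pods a simplified controller would create now."""
    if policy not in ("TerminatingOrFailed", "Failed"):
        raise ValueError(f"unknown podReplacementPolicy: {policy}")
    occupied = 0
    for pod in pods:
        if pod.phase in ("Succeeded", "Failed"):
            continue  # finished pods no longer occupy a slot
        if pod.terminating and policy == "TerminatingOrFailed":
            continue  # default policy: replace as soon as the pod is terminating
        occupied += 1  # under "Failed", a terminating pod still holds its slot
    return max(0, parallelism - occupied)


# One healthy pod and one pod that is shutting down after a preemption.
pods = [Pod("Running"), Pod("Running", terminating=True)]
print(replacements_needed(pods, 2, "TerminatingOrFailed"))  # 1
print(replacements_needed(pods, 2, "Failed"))               # 0
```

Under Failed, the replacement only becomes due once the terminating pod's phase changes to Failed, so two pods for the same index never run side by side in this model.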
Per‑Index Backoff Limit
Normally, pod failures for indexed jobs count toward the global .spec.backoffLimit. If a particular index keeps failing, the whole job may be marked as failed before other indexes finish. The per‑index backoff limit lets you cap retries for each index independently.
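The two accounting schemes can be compared with a minimal sketch. It assumes a simplified rule in which a limit is exceeded once the number of failures is greater than the limit; the function names are hypothetical and the code only models the counting, not the controller itself.

```python
from collections import Counter


def job_fails_globally(failures, backoff_limit):
    """Global accounting: every pod failure counts toward one shared limit."""
    return len(failures) > backoff_limit


def failed_indexes(failures, backoff_limit_per_index):
    """Per-index accounting: each index has its own failure budget."""
    counts = Counter(failures)  # failures per completion index
    return sorted(i for i, n in counts.items() if n > backoff_limit_per_index)


# Each entry is the completion index of a pod that failed.
failures = [1, 2, 1, 2]

# With a shared limit of 1, the second failure already fails the whole job.
print(job_fails_globally(failures, backoff_limit=1))       # True

# With a per-index limit of 1, only indexes 1 and 2 are given up on.
print(failed_indexes(failures, backoff_limit_per_index=1))  # [1, 2]
```

In the per-index case the other indexes keep their own untouched budgets, so a single misbehaving index cannot use up the retries of the rest.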
How to enable and use
Enable the alpha feature gate JobBackoffLimitPerIndex. Then add the backoffLimitPerIndex field to the job spec:
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-limit-per-index-execute-all
spec:
  completions: 8
  parallelism: 2
  completionMode: Indexed
  backoffLimitPerIndex: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: example
        image: python
        command:
        - python3
        - -c
        - |
          import os, sys, time
          id = int(os.environ.get("JOB_COMPLETION_INDEX"))
          if id == 1 or id == 2:
            sys.exit(1)
          time.sleep(1)

This job runs eight indexed completions with a parallelism of two. Indexes 1 and 2 always exit with an error; because backoffLimitPerIndex is set to 1, each of them is retried once and then marked as failed, while the remaining indexes still run to completion.
After the job finishes, you can list the pods to see which indexes succeeded or failed:
kubectl get pods -l job-name=job-backoff-limit-per-index-execute-all

Typical output shows a mix of Completed and Error pods. To view the job's overall status, run:
kubectl get jobs job-backoff-limit-per-index-execute-all -o yaml

The YAML includes status fields such as completedIndexes, failedIndexes, succeeded, and failed. With the per‑index limit enabled, each failing index stops after its allowed retry and is recorded in failedIndexes, so a single problematic index cannot stop the job before the other indexes finish. The job as a whole is still reported as failed if at least one index failed.
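The completedIndexes and failedIndexes fields are strings that list indexes as comma-separated values and ranges. The following sketch expands such a string into a list of integers for further processing; parse_indexes is a hypothetical helper, and the sample values are only what one would expect from the job above, where indexes 1 and 2 always fail.

```python
def parse_indexes(value):
    """Expand a string such as "0,3-7" into [0, 3, 4, 5, 6, 7]."""
    indexes = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # an empty field means no indexes
        if "-" in part:
            first, last = part.split("-", 1)
            indexes.extend(range(int(first), int(last) + 1))  # inclusive range
        else:
            indexes.append(int(part))
    return indexes


# Illustrative values, not captured output.
print(parse_indexes("0,3-7"))  # [0, 3, 4, 5, 6, 7]
print(parse_indexes("1,2"))    # [1, 2]
```

A script that resubmits only the failed work could feed the parsed failedIndexes list into whatever tooling creates the follow-up job.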
Further Reading
For complete details, consult the official Kubernetes documentation and the corresponding KEPs (Kubernetes Enhancement Proposals) for Pod Replacement Policy, Per‑Index Backoff Limit, and Pod Failure Policy.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
