Limitations and Challenges of Kubernetes in Cluster Management and Application Scenarios
The article examines Kubernetes' widespread adoption, outlines its scalability and multi‑cluster management constraints, discusses practical application scenarios such as deployment models, batch scheduling, and hard multi‑tenancy, and highlights the gaps that still limit its use in large‑scale production environments.
Kubernetes, released in 2014, has become the de‑facto standard for container orchestration, with most developers and about 75% of production environments now using it.
Despite its popularity, Kubernetes has several limitations, and understanding them is essential to using it effectively. This article analyzes these limits from two perspectives: cluster management and application scenarios.
Cluster Management
A cluster is a set of computers that work together as a single resource pool for scheduling containers. The following discusses complex issues Kubernetes faces in managing clusters.
Horizontal Scalability
Cluster size is a key metric for evaluating resource management systems, yet Kubernetes can manage far fewer nodes than comparable systems. Scale matters economically: an 8‑CPU, 16‑GB VM on AWS costs about $150 per month (≈¥1,000), so a 5,000‑node cluster of such machines costs roughly $750,000 per month (≈¥5,000,000), and a 1% improvement in utilization saves about $7,500 (≈¥50,000) every month.
Kubernetes officially supports up to 5,000 nodes, 150,000 Pods, 300,000 containers, and 110 Pods per node, an order of magnitude smaller than Apache Mesos (tens of thousands of nodes) or Hadoop YARN (50,000 nodes). Even with community optimizations, scaling beyond a few thousand nodes often hits bottlenecks in etcd, the API server, the scheduler, and the controllers.
Large enterprises must limit certain Kubernetes features or add caches to the API server to achieve stable scaling; community‑driven changes are possible but require coordinated effort.
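For illustration only, the kind of API‑server tuning involved can be sketched as kube-apiserver flags in a static‑Pod manifest; the flag values below are assumptions for a large cluster, not recommendations, and must be sized against real workloads:

```yaml
# Fragment of a static-Pod manifest for kube-apiserver (illustrative values).
# --watch-cache-sizes enlarges the per-resource watch cache so heavy
# LIST/WATCH traffic from kubelets and controllers is served from memory
# rather than hitting etcd; the inflight limits bound concurrent requests.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      image: registry.k8s.io/kube-apiserver:v1.28.0
      command:
        - kube-apiserver
        - --watch-cache-sizes=pods#10000,nodes#5000
        - --max-requests-inflight=3000
        - --max-mutating-requests-inflight=1000
```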
Multi‑Cluster Management
Even a massive single cluster cannot solve all enterprise problems; managing multiple clusters is essential. The SIG Multi‑Cluster group works on solutions, but challenges include resource imbalance, cross‑cluster access difficulty, and higher operational cost.
kubefed
kubefed provides a federated control plane for managing resources and networking across clusters. It creates federated objects such as FederatedDeployment that are translated into regular Deployment objects in each member cluster.
```yaml
kind: FederatedDeployment
...
spec:
  ...
  overrides:
    # Apply overrides to cluster1
    - clusterName: cluster1
      clusterOverrides:
        # Set the replicas field to 5
        - path: "/spec/replicas"
          value: 5
        # Set the image of the first container
        - path: "/spec/template/spec/containers/0/image"
          value: "nginx:1.17.0-alpine"
        # Ensure the annotation "foo: bar" exists
        - path: "/metadata/annotations"
          op: "add"
          value:
            foo: bar
        # Ensure an annotation with key "foo" does not exist
        - path: "/metadata/annotations/foo"
          op: "remove"
        # Add an argument `-q` at index 0 of the args list;
        # this shifts the existing arguments, if any
        - path: "/spec/template/spec/containers/0/args/0"
          op: "add"
          value: "-q"
```

kubefed also supports more advanced strategies via ReplicaSchedulingPreference, which distributes replicas across clusters based on weight and capacity.
```yaml
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: test-deployment
  namespace: test-ns
spec:
  targetKind: FederatedDeployment
  totalReplicas: 9
  clusters:
    A:
      minReplicas: 4
      maxReplicas: 6
      weight: 1
    B:
      minReplicas: 4
      maxReplicas: 8
      weight: 2
```

Cluster API (SIG Cluster Lifecycle) offers a declarative API for provisioning, updating, and operating multiple clusters. Its key resource, Machine, represents a node that is created, updated, or deleted by provider‑specific controllers.
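To make the Machine abstraction concrete, here is a hedged sketch of a Cluster API Machine object, assuming the AWS infrastructure provider; all names and versions are illustrative:

```yaml
# A Machine declares a desired node; the referenced bootstrap and
# infrastructure objects are reconciled by provider-specific controllers.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: worker-0
  namespace: default
spec:
  clusterName: my-cluster
  version: v1.28.0
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfig
      name: worker-0-bootstrap
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachine      # depends on the provider in use
    name: worker-0-infra
```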
Application Scenarios
The following sections discuss interesting Kubernetes application scenarios, including deployment models, batch scheduling, and hard multi‑tenancy, which are current community focus areas and also notable blind spots.
Application Distribution
Kubernetes core provides three basic workload resources: Deployment (stateless services), StatefulSet (stateful services), and DaemonSet (node‑level daemons). While they cover ~90% of cases, more complex workloads rely on CRDs and SIG Apps contributions.
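As a quick illustration of the simplest of the three, a minimal Deployment for a stateless service; the name, labels, and image are placeholders:

```yaml
# Minimal stateless workload: 3 identical, interchangeable replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.17.0-alpine
          ports:
            - containerPort: 80
```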
Batch Scheduling
Machine‑learning, batch, and streaming workloads have never been Kubernetes' strong suit; many organizations still use Hadoop YARN for batch processing. The scheduler framework now supports advanced concepts like PodGroup for co‑scheduling, useful for Spark or TensorFlow jobs.
```yaml
# PodGroup CRD spec
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: nginx
spec:
  scheduleTimeoutSeconds: 10
  minMember: 3
---
# Label a Pod to mark that it belongs to the group
labels:
  pod-group.scheduling.sigs.k8s.io: nginx
```

Volcano, a Kubernetes‑native batch system, supports frameworks such as TensorFlow, Spark, PyTorch, and MPI, but Kubernetes still lags behind dedicated batch systems like YARN.
Hard Multi‑Tenancy
Hard multi‑tenancy—isolating tenants so they do not affect each other—is still difficult for Kubernetes. Namespaces provide logical separation, but they cannot guarantee resource isolation for CPU, I/O, network, or cache. The community’s multi‑tenancy working group has produced limited results so far.
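Namespaces can at best bound a tenant's aggregate consumption. A sketch using a standard ResourceQuota (values are illustrative) shows why this is soft rather than hard isolation: it caps CPU and memory totals but does nothing for I/O, network, or cache contention:

```yaml
# Per-namespace quota for one tenant; caps totals, not interference.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "100"
```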
Kubernetes incurs high overhead for small clusters because a stable control plane requires at least three etcd nodes.
Containers share underlying hosts, leading to potential interference when CPU, memory, I/O, or network resources are not fully isolated.
Conclusion
Every technology has a lifecycle; lower‑level technologies tend to last longer. Kubernetes dominates container orchestration today, but its limitations mean that future tools may eventually replace it. Understanding both strengths and weaknesses helps practitioners use it wisely and stay prepared for the next generation of orchestration platforms.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.