
Why Kubernetes Feels Too Complex and How Alibaba Tackles It

In an interview, Alibaba technical expert Sun Jianbo explains the inherent complexity of Kubernetes, the challenges of managing stateful applications and Operators, and how Alibaba’s large‑scale practice, a four‑layer delivery model, and the Open Application Model (OAM) provide concrete solutions for cloud‑native application management.

Alibaba Cloud Native

Kubernetes Complexity and Platform Role

Kubernetes is intentionally designed as a platform for platforms. Its primary users are platform builders (infrastructure or PaaS engineers), not end‑application developers or operators. When developers or ops teams work directly with low‑level API objects (e.g., Deployment, StatefulSet), the experience is like asking a Java developer to invoke Linux kernel syscalls directly. This mismatch creates a steep learning curve and drives the frequent complaint that “Kubernetes is too complex”.

Operator Model and Heavy‑Client Design

Operators are advanced Kubernetes clients that encapsulate domain‑specific logic for stateful workloads. The API server follows a “heavy‑client” model: the client must manage reflectors, caches, and informers. Consequently, Operator developers must understand both the target application and intricate Kubernetes internals, a double burden that few teams can carry. Ideally, domain experts (e.g., TiDB engineers) would write Operators without needing deep Kubernetes expertise, which points to the need for a higher‑level Operator framework.
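To make the “heavy‑client” burden concrete, here is a minimal sketch of the list‑then‑watch pattern with a client‑side cache. It is a toy simulation in plain Python, not client‑go or any real Kubernetes library: `ToyInformer`, the event tuples, and the `reconcile` callback are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (event kind, object name, object spec).
Event = Tuple[str, str, dict]


@dataclass
class ToyInformer:
    """Toy stand-in for the reflector + cache + informer trio (illustrative only)."""

    on_change: Callable[[str, dict], None]
    cache: Dict[str, dict] = field(default_factory=dict)

    def list_and_watch(self, initial: Dict[str, dict], events: List[Event]) -> None:
        # "List": seed the local cache with the current state.
        for name, spec in initial.items():
            self.cache[name] = spec
            self.on_change(name, spec)
        # "Watch": apply incremental events to the cache.
        for kind, name, spec in events:
            if kind == "DELETED":
                self.cache.pop(name, None)
                continue
            if self.cache.get(name) == spec:
                continue  # duplicate delivery, nothing to reconcile
            self.cache[name] = spec
            self.on_change(name, spec)


reconciled: List[str] = []


def reconcile(name: str, spec: dict) -> None:
    # Domain logic would live here; the sketch only records the desired state.
    reconciled.append(f"{name}: want {spec['replicas']} replicas")


informer = ToyInformer(on_change=reconcile)
informer.list_and_watch(
    initial={"tidb-a": {"replicas": 3}},
    events=[
        ("MODIFIED", "tidb-a", {"replicas": 3}),  # unchanged, skipped
        ("MODIFIED", "tidb-a", {"replicas": 5}),
        ("ADDED", "tidb-b", {"replicas": 1}),
        ("DELETED", "tidb-b", {}),
    ],
)
print(reconciled)
print(informer.cache)
```

Even in this stripped‑down form, the domain logic in `reconcile` is a small fraction of the code; the rest is cache bookkeeping that a higher‑level framework could plausibly hide.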

Four‑Layer Application Delivery Model

Alibaba proposes a four‑layer model to structure cloud‑native application delivery:

Application Definition – declarative description of intent (e.g., storage size, replica count) independent of the underlying cluster.

Application Delivery – pipelines that translate definitions into concrete Kubernetes resources and perform continuous delivery.

Application Operations & Automation – runtime management, scaling, health‑checking, and automated policies.

Platform Layer – the underlying Kubernetes cluster and its core control plane.

This separation enables each stakeholder to focus on its responsibility while collaborating across layers.
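The hand‑off between the definition and delivery layers can be sketched as a pure function that turns a cluster‑independent description into a concrete Kubernetes resource. The `app` dictionary schema and the `render_deployment` helper below are assumptions made for illustration; only the shape of the resulting `apps/v1` Deployment manifest follows the standard Kubernetes API.

```python
import json
from typing import Any, Dict


def render_deployment(app: Dict[str, Any]) -> Dict[str, Any]:
    """Delivery layer: translate a declarative definition into a Deployment manifest."""
    labels = {"app": app["name"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app["name"], "labels": labels},
        "spec": {
            "replicas": app.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{"name": app["name"], "image": app["image"]}]
                },
            },
        },
    }


# Definition layer: intent only, no cluster-specific details (hypothetical schema).
app_definition = {"name": "web", "image": "shop/web:1.4", "replicas": 3}

manifest = render_deployment(app_definition)
print(json.dumps(manifest, indent=2))
```

A real pipeline would then submit the manifest to the cluster and let the operations layer take over; the sketch stops at rendering to keep the layer boundary visible.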

Large‑Scale Kubernetes Practice at Alibaba

Alibaba operates dozens of clusters, the largest exceeding 10,000 nodes and serving tens of thousands of applications, including those behind the 618 and Double‑11 shopping events. The migration path from a home‑grown LXC‑based container system (started in 2011) to Kubernetes involved three concrete steps:

Containerize existing workloads using Kubernetes‑native container patterns.

Adopt an application‑definition model such as the Open Application Model (OAM) or Helm to express high‑level intents.

Build a full delivery pipeline that integrates continuous integration/continuous delivery (CI/CD) tooling and can interoperate with development, operations, and PaaS layers.

These steps allow gradual replacement of legacy infrastructure while preserving service continuity.

Open Application Model (OAM) and Decoupling of Development & Operations

OAM provides a standard, declarative schema for describing an application’s desired state without embedding cluster‑specific details. Developers declare intents (e.g., storage: 5Gi) while operators supply the concrete implementation (e.g., volume mounts, storage class). This separation reduces the learning burden on developers and prevents “authoritarian” control by ops teams.
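The split between developer intent and operator‑supplied implementation might look like the following sketch. It does not use the actual OAM schema: `DeveloperIntent`, `OpsImplementation`, and the two render helpers are hypothetical, and the storage class name is made up. The output follows the standard Kubernetes `PersistentVolumeClaim` and container `volumeMounts` shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeveloperIntent:
    """What the developer declares: no cluster-specific details."""

    name: str
    storage: str  # e.g. "5Gi"


@dataclass(frozen=True)
class OpsImplementation:
    """What the operator supplies for a given cluster (hypothetical fields)."""

    storage_class: str
    access_mode: str = "ReadWriteOnce"
    mount_path: str = "/data"


def render_pvc(intent: DeveloperIntent, impl: OpsImplementation) -> dict:
    # Combine the developer's size request with the operator's storage choices.
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{intent.name}-data"},
        "spec": {
            "storageClassName": impl.storage_class,
            "accessModes": [impl.access_mode],
            "resources": {"requests": {"storage": intent.storage}},
        },
    }


def render_volume_mount(intent: DeveloperIntent, impl: OpsImplementation) -> dict:
    # The mount path is an operator decision; the developer never sees it.
    return {"name": f"{intent.name}-data", "mountPath": impl.mount_path}


intent = DeveloperIntent(name="orders", storage="5Gi")
impl = OpsImplementation(storage_class="fast-ssd")  # assumed class name

pvc = render_pvc(intent, impl)
mount = render_volume_mount(intent, impl)
print(pvc)
print(mount)
```

The same `intent` could be paired with a different `OpsImplementation` per cluster, which is the reuse the text describes.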

By abstracting the application definition from the underlying cluster, OAM enables:

Reusable component libraries that can be consumed across multiple clusters.

Clear ownership of fields that appear in all‑in‑one API objects (e.g., the replicas field in Deployment): developers can express desired replica counts, while autoscalers or ops policies can adjust them without conflict.

Standardized hand‑off between development and operations, facilitating automated delivery pipelines.

All‑in‑One API Design and Field Ownership

Kubernetes API objects combine concerns of developers, operators, and platform services. For example, a Deployment spec contains fields relevant to code rollout, resource limits, and scaling policies. When multiple actors need to modify the same field (e.g., replicas), ownership ambiguity arises. Alibaba’s model recommends explicit ownership contracts and automation to resolve such conflicts.
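One way to picture an explicit ownership contract is a spec in which every field has exactly one declared writer, and writes from anyone else are rejected. This is an illustrative sketch only, not Alibaba's mechanism or a Kubernetes API: `OwnedSpec`, `OwnershipConflict`, and the actor names are hypothetical.

```python
from typing import Any, Dict


class OwnershipConflict(Exception):
    """Raised when an actor writes a field it does not own."""


class OwnedSpec:
    """A spec whose fields each have exactly one declared writer."""

    def __init__(self, owners: Dict[str, str]) -> None:
        self._owners = dict(owners)
        self.values: Dict[str, Any] = {}

    def set(self, actor: str, field: str, value: Any) -> None:
        owner = self._owners.get(field)
        if owner != actor:
            raise OwnershipConflict(
                f"{actor} cannot write {field!r}; owner is {owner}"
            )
        self.values[field] = value


# The contract: who may write which field (assumed assignment for the example).
spec = OwnedSpec(
    {"image": "developer", "resources": "operator", "replicas": "autoscaler"}
)
spec.set("developer", "image", "shop/web:1.4")
spec.set("autoscaler", "replicas", 8)

try:
    spec.set("developer", "replicas", 3)  # violates the contract
    rejected = False
except OwnershipConflict:
    rejected = True

print(spec.values, rejected)
```

Rejecting the write outright is the simplest policy; a real system might instead treat the developer's value as an initial default that the autoscaler is allowed to override.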

Ecosystem Layering Beyond the Core Cluster

The broader cloud‑native ecosystem can be visualized as stacked layers:

Layer 1 – Application Definition: Helm, Kustomize, CNAB, etc.

Layer 1.5 – Packaging & Distribution: Tools that bundle Helm charts or OCI images.

Layer 2 – Application Delivery: Tekton, Flagger, Keptn, etc., which orchestrate CI/CD and progressive delivery.

Layer 3 – Workload Controllers: Operators, Deployment, StatefulSet, which encode domain‑specific logic.

Layer 4 – Core Kubernetes: Scheduler, kubelet, and the API server that manage containers and provide the underlying primitives.

This hierarchy clarifies where new capabilities (e.g., a higher‑level Operator framework) should be introduced.

Operational Challenges at Scale

When scaling to thousands of nodes and tens of thousands of applications, Alibaba encountered practical problems beyond raw performance:

Coordinating dozens of teams that each contribute custom Controllers.

Handling ~10,000 daily production deployments with heterogeneous release and scaling policies.

Integrating dozens of higher‑level platforms (e.g., mixed‑workload schedulers, custom PaaS) while maintaining high resource utilization.

To address these, Alibaba standardizes on OAM for application description, builds unified delivery pipelines, and develops internal plugins that expose platform capabilities as reusable components.

Future Direction

The next focus is extending OAM‑based management to cloud‑native ISVs and software distributors, making Kubernetes‑centric application delivery the default model for the broader cloud era.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
