Designing Highly Available Cloud‑Native Applications on Alibaba Cloud ACK
This article explains how to build robust, highly available cloud‑native applications on Alibaba Cloud Container Service for Kubernetes (ACK) by covering architecture principles, multi‑zone cluster design, Kubernetes HA features such as topology spread constraints and pod anti‑affinity, storage strategies, load‑balancing, virtual nodes, health probes, monitoring, and multi‑cluster deployment patterns.
Introduction
With the rapid growth of cloud‑native technologies, ensuring high availability (HA) for applications has become critical for enterprise services in terms of reliability, stability, and security. Alibaba Cloud Container Service for Kubernetes (ACK) provides the foundation for building HA architectures.
Application HA Design Principles
Designing an HA architecture for cloud‑native applications should consider the following aspects:
Cluster design : Deploy control‑plane and data‑plane components across multiple nodes and zones. ACK Pro offers multi‑zone control‑plane HA with SLA 99.95% (≥3 zones) or 99.50% (≤2 zones).
Container design : Use Deployments, StatefulSets, or OpenKruise CRDs to run multiple replicas and configure auto‑scaling policies.
Resource scheduling : Leverage Kubernetes scheduler, node/zone affinity, and topology spread constraints to distribute Pods across nodes, zones, and topology domains.
Storage design : Attach persistent volumes (PV/PVC) to avoid data loss; use StatefulSets for stateful workloads.
Failure recovery : Enable liveness probes and automatic restart/re‑scheduling.
Network design : Expose services via Service and Ingress.
Monitoring & alerting : Use Prometheus, Thanos, Alertmanager, etc., to detect and react to failures.
Full‑stack HA : Ensure every component—from infrastructure to application code—has redundancy and fault‑tolerance.
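As a rough sketch of the container and storage design points above, a StatefulSet can run several replicas, each bound to its own volume claim. The names web and web-data, the image, the mount path, the requested size, and the storage class are placeholder assumptions for illustration, not values prescribed by ACK.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                    # hypothetical name
spec:
  serviceName: web             # a headless Service with this name is assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: app-image     # placeholder image
          volumeMounts:
            - name: web-data
              mountPath: /data
  volumeClaimTemplates:        # one PVC is created per replica
    - metadata:
        name: web-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: alicloud-disk-topology-essd   # assumed storage class
        resources:
          requests:
            storage: 20Gi
```

Because each replica keeps its own claim, a rescheduled Pod reattaches to the same volume instead of starting from empty storage.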
Kubernetes HA Techniques and ACK Implementations
Multi‑zone Control‑plane and Data‑plane
Kubernetes clusters achieve HA by deploying control‑plane components (etcd, kube‑apiserver, controller‑manager, scheduler) and data‑plane nodes in different availability zones (AZs). ACK automates this deployment and provides SLA guarantees.
Topology Spread Constraints
TopologySpreadConstraints ensure Pods are evenly spread across topology domains (e.g., zones). Key fields:
maxSkew : Maximum allowed difference in Pod count between domains.
topologyKey : Label that identifies the domain (e.g., topology.kubernetes.io/zone).
whenUnsatisfiable : Action when constraints cannot be met (e.g., DoNotSchedule).
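To make the fields concrete, here is a minimal Pod template fragment that combines a hard zone-level constraint with a soft node-level one. The label app: my-app is a hypothetical placeholder; labelSelector decides which Pods are counted in each domain.

```yaml
# Fragment of spec.template.spec in a Deployment
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule     # hard: leave the Pod Pending rather than exceed the skew
    labelSelector:
      matchLabels:
        app: my-app                      # hypothetical label
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway    # soft: prefer balance, but schedule regardless
    labelSelector:
      matchLabels:
        app: my-app
```

For example, with matching Pods spread 2/1/1 across three zones the skew is 1. Placing another Pod in the first zone would give 3/1/1 (skew 2), so with maxSkew: 1 and DoNotSchedule the scheduler only considers the other two zones. The complete Deployment that follows uses the zone-level constraint alone.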
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-run-per-zone
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-run-per-zone
  template:
    metadata:
      labels:
        app: app-run-per-zone
    spec:
      containers:
        - name: app-container
          image: app-image
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: "topology.kubernetes.io/zone"
          whenUnsatisfiable: DoNotSchedule
          # Pods matching this selector are the ones counted per zone
          labelSelector:
            matchLabels:
              app: app-run-per-zone
Pod Anti‑Affinity
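Pod anti‑affinity keeps Pods that match a label selector out of the same topology domain. With topologyKey set to kubernetes.io/hostname, replicas land on different nodes, which improves fault isolation. Two policies are available: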
requiredDuringSchedulingIgnoredDuringExecution : Hard rule.
preferredDuringSchedulingIgnoredDuringExecution : Soft preference.
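The soft form might look like the fragment below. It is a sketch only: the label app: my-app is a hypothetical placeholder, and the weight is an arbitrary choice within the allowed range.

```yaml
# Fragment of spec.template.spec in a Deployment
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                # 1-100; higher weights count more in node scoring
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app          # hypothetical label
          topologyKey: kubernetes.io/hostname
```

With the soft rule, replicas can still be scheduled when there are fewer nodes than replicas. The hard rule, used in the full manifest that follows, would leave the extra replicas Pending in that case.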
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-run-per-node
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-run-per-node
  template:
    metadata:
      labels:
        app: app-run-per-node
    spec:
      containers:
        - name: app-container
          image: app-image
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - app-run-per-node
              topologyKey: "kubernetes.io/hostname"
Multi‑Replica Strategies
Applications can adopt:
Active‑active (multi‑active) : All replicas receive traffic; scale via HPA.
Active‑standby (master‑slave) : One primary replica handles traffic; others standby.
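For the active‑active case, scaling via HPA could be sketched as follows. The target Deployment name, the replica bounds, and the 70% CPU target are illustrative assumptions. CPU-based scaling also assumes the containers declare CPU requests and that a metrics source such as metrics-server is installed.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-run-per-zone       # assumed target Deployment
  minReplicas: 3                 # keep at least one replica per zone in a three-zone cluster
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out when average CPU exceeds 70% of requests
```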
Pod Disruption Budget (PDB)
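A PodDisruptionBudget (PDB) limits how many replicas can be taken down at once by voluntary disruptions, such as node drains during maintenance, so that a minimum number stay available. It does not protect against involuntary failures such as node crashes.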
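A budget is expressed with either minAvailable or maxUnavailable, but not both in the same PDB. The variant below is a sketch using maxUnavailable; the object name is a hypothetical placeholder.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-max-unavailable     # hypothetical name
spec:
  maxUnavailable: 1             # at most one matching Pod may be unavailable due to evictions
  selector:
    matchLabels:
      app: app-with-pdb
```

With three replicas this is equivalent to minAvailable: 2, but it keeps the same meaning if the replica count later changes. The manifests that follow use minAvailable.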
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-pdb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-with-pdb
  template:
    metadata:
      labels:
        app: app-with-pdb
    spec:
      containers:
        - name: app-container
          image: app-container-image
---
# policy/v1 is the stable API; policy/v1beta1 was removed in Kubernetes 1.25
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-for-app
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: app-with-pdb
Health Probes & Restart Policies
Kubernetes supports three probe types to monitor container health:
Liveness probe : Restarts a container when it fails.
Readiness probe : Removes the Pod from Service endpoints, so it receives no traffic, until the probe passes.
Startup probe : Delays other probes until the container has started.
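For a slow-starting application, the startup and liveness probes might be combined as in this container fragment. The /health path, port 8080, and the thresholds are assumptions chosen for illustration.

```yaml
# Fragment of a container spec
startupProbe:
  httpGet:
    path: /health        # assumed endpoint
    port: 8080           # assumed port
  periodSeconds: 10
  failureThreshold: 30   # 30 x 10s: up to 300s to start before the container is restarted
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5
  failureThreshold: 3    # once started, restart after about 15s of consecutive failures
```

The liveness probe only begins after the startup probe has succeeded, so a long startup window does not force a slow liveness check.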
apiVersion: v1
kind: Pod
metadata:
  name: app-with-probe
spec:
  containers:
    - name: app-container
      image: app-image
      livenessProbe:
        httpGet:
          path: /health
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 5
      readinessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
      startupProbe:
        exec:
          command:
            - cat
            - /tmp/ready
        initialDelaySeconds: 20
        periodSeconds: 15
  restartPolicy: Always
Storage & Data Decoupling
Use PersistentVolume (PV) and PersistentVolumeClaim (PVC) to abstract storage. Choose appropriate storage class, capacity, and access mode based on workload requirements. Example of a topology‑aware cloud disk storage class and PVC:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-topology-essd
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_essd
  fstype: ext4
  zoneId: "cn-hangzhou-a,cn-hangzhou-b,cn-hangzhou-c"
  performanceLevel: PL1
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topology-disk-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: alicloud-disk-topology-essd
Load‑Balancing
Specify master and slave zones for SLB/CLB via Service annotations to keep traffic within the same zone as the node pool.
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-master-zoneid: "cn-hangzhou-a"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-slave-zoneid: "cn-hangzhou-b"
  name: nginx
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer
Virtual Nodes (Serverless)
ACK Serverless provides virtual nodes backed by Elastic Container Instance (ECI). Multi‑zone virtual nodes are configured via the eci-profile ConfigMap, allowing pod requests to be spread across vSwitches in different AZs.
kubectl -n kube-system edit cm eci-profile

apiVersion: v1
kind: ConfigMap
metadata:
  name: eci-profile
  namespace: kube-system
data:
  vswitchIds: vsw-xxx,vsw-yyy,vsw-zzz
  regionId: cn-hangzhou
  securitygroupId: sg-xxx
  vpcId: vpc-xxx
Monitoring & Alerting
Use kube‑state‑metrics, Prometheus, and Alertmanager to monitor replica health, node health per zone, and other HA metrics. Example alerts for unavailable replicas and low healthy‑node percentage:
# Alert for Deployment with unavailable replicas
- alert: SystemPodReplicasUnavailable
  expr: kube_deployment_status_replicas_unavailable{namespace=~"kube-system|monitoring"} > 0
  for: 1m
  labels:
    severity: L1
  annotations:
    summary: "Deployment {{ $labels.deployment }} has unavailable replicas"
# Alert when healthy node percentage in a zone drops below 80%
- alert: HealthyNodePercentagePerZoneLessThan80
  expr: node_collector_zone_health <= 80
  for: 5m
  labels:
    severity: L1
  annotations:
    summary: "Zone {{ $labels.zone }} healthy node percentage <= 80%"
Single‑Cluster and Multi‑Cluster HA
Within a single ACK cluster, the techniques above provide HA. For higher resilience, deploy multiple clusters across zones or regions, expose services via SLB, and use DNS or Global Traffic Manager (GTM) for traffic routing. ACK One can centrally manage multi‑region clusters, offering unified observability, security, and deployment pipelines.
Alibaba Cloud Native
