How to Build High‑Availability Kubernetes Clusters with Volcengine VKE & VCI
This guide explains how Volcengine's VKE (Kubernetes Engine) and VCI (Elastic Container Instance) enable high‑availability, multi‑AZ deployments, covering cluster creation, control‑plane distribution, virtual node configuration, inventory‑aware scheduling, and practical YAML examples for resilient cloud‑native workloads.
Over the past decade, digital transformation has driven industries such as finance, retail, manufacturing, telecom, healthcare, and automotive to rely on digital services and infrastructure, making continuous availability critical for business continuity and societal stability.
Volcengine's cloud‑native products, built on ByteDance's extensive experience, provide high‑elasticity, high‑availability, and seamless operation for large‑scale traffic events. The Volcengine Kubernetes Engine (VKE) offers a container‑centric, high‑performance managed Kubernetes service, while the Elastic Container Instance (VCI) integrates Serverless capabilities for on‑demand, pay‑as‑you‑go resource consumption.
Common concerns include improving cluster availability, leveraging multi‑AZ deployments, and ensuring rapid provisioning of Serverless elastic resources.
Building a High‑Availability VKE Cluster
Distribute the control plane across multiple Availability Zones (AZs) by selecting a high‑availability cluster and creating subnets in at least three AZs.
Deploy business Pods across these AZs using corresponding Pod subnets.
This architecture prevents single‑AZ failures from causing service interruptions.
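As a sketch of how a workload can follow the control plane across AZs (the Deployment name, labels, and image below are illustrative placeholders, not from the Volcengine documentation), a `topologySpreadConstraints` entry on the standard `topology.kubernetes.io/zone` key asks the scheduler to spread replicas evenly across zones:

```yaml
# Illustrative sketch: spread replicas across AZs.
# Name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-az-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: multi-az-app
  template:
    metadata:
      labels:
        app: multi-az-app
    spec:
      containers:
      - name: app
        image: nginx
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone   # standard well-known zone label
        whenUnsatisfiable: ScheduleAnyway          # prefer, but do not block, spreading
        labelSelector:
          matchLabels:
            app: multi-az-app
```

With maxSkew: 1, per-zone replica counts stay within one of each other, so a single-AZ failure affects only a fraction of the replicas.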
Step‑by‑Step Configuration
Log in to the Container Service console.
Navigate to the Cluster section.
Click “Create Cluster” and configure parameters.
Select the latest Kubernetes version.
Choose control‑plane subnets in three different AZs.
Choose Pod subnets in three or more AZs (ensure sufficient IP capacity).
Optionally create node pools; VKE supports node, Serverless, and hybrid pool types.
For node pools, select subnets across three AZs and enable a balanced strategy so nodes are spread evenly.

VCI Virtual Node High‑Availability Configuration
Install the vci-virtual-kubelet component to enable virtual nodes.
Optionally add CSI, Ingress Nginx, logging, and monitoring components.
Virtual nodes carry the taint vci.vke.volcengine.com/node-type=vci:NoSchedule and the label node.kubernetes.io/instance-type=virtual-node. To schedule Pods onto virtual nodes, add the annotation vke.volcengine.com/burst-to-vci: enforce, which the webhook maps to the corresponding node selector and toleration.
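A minimal Pod sketch (the name and image are placeholders) showing the annotation in context; because the webhook injects the matching toleration and node selector, neither needs to be written by hand:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vci-demo   # placeholder name
  annotations:
    vke.volcengine.com/burst-to-vci: enforce   # route this Pod to VCI virtual nodes
spec:
  containers:
  - name: app
    image: nginx
```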
Existing Cluster HA Refactoring
If a cluster was created without subnets in three AZs, add control‑plane subnets for the missing AZs; the API server then performs a rolling restart, after which control‑plane components are distributed across the AZs.
For VCI virtual nodes, configure subnets in each AZ; missing subnets appear as “Pending Pod Subnet” and can be added via the console.
General‑Purpose Compute Spec (u1)
VCI offers a “u1” spec that abstracts CPU generation differences and provides inventory‑aware scheduling based on actual resource levels, improving performance for workloads insensitive to CPU generations.
Specify the spec via the Pod annotation vci.vke.volcengine.com/preferred-instance-family: vci.u1.
YAML Example for VCI Deployment
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-vci
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test-vci
  template:
    metadata:
      annotations:
        vci.vke.volcengine.com/preferred-instance-family: vci.u1
        vci.volcengine.com/tls-enable: "true"
        vke.volcengine.com/burst-to-vci: enforce
      labels:
        app: test-vci
    spec:
      containers:
      - name: test
        image: nginx
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: test-vci
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
</code>To distribute Pods across subnets, add the annotation vke.volcengine.com/preferred-subnet-ids with a comma‑separated list of subnet IDs.
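For example (the subnet IDs below are placeholders), the annotation sits alongside the others in the Pod template metadata:

```yaml
metadata:
  annotations:
    # Placeholder subnet IDs; replace with your own subnet IDs.
    vke.volcengine.com/preferred-subnet-ids: subnet-aaaa,subnet-bbbb,subnet-cccc
    vke.volcengine.com/burst-to-vci: enforce
```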
Inventory‑Aware Scheduling
Enable the vci-virtual-kubelet and scheduler-plugin components and turn on the inventory‑aware scheduling switch. The scheduler then queries VCI resource stock in each AZ and places Pods in zones with sufficient capacity; if no zone has stock, Pods remain Pending until resources become available.
For more virtual node usage, refer to the product documentation.
In conclusion, as cloud adoption continues globally, ensuring business continuity through robust high‑availability designs remains a long‑term challenge. Volcengine’s cloud‑native team aims to provide reliable solutions that help enterprises fully leverage cloud resources.
ByteDance Cloud Native
Sharing ByteDance's cloud-native technologies, technical practices, and developer events.