How Knative and ECI Virtual Nodes Supercharged AI Model Deployment on Alibaba Cloud
Shuhe Technology leveraged Alibaba Cloud Container Service with Knative and ECI virtual nodes to achieve auto‑scaling, multi‑version management, and up to 60% cost reduction for AI model services, dramatically improving resource efficiency and stability under burst traffic.
Background
Risk‑management workflows in retail finance require batch processing of large user datasets to adjust credit limits, pricing, and loan terms. Model inference must be fast, accurate, and near‑real‑time.
Challenges
Underlying compute resources could not scale automatically with request volume, leading to idle capacity and high maintenance cost.
Proliferation of model services made version management and operational overhead difficult.
Solution Architecture
Adopted a serverless platform built on Knative running on Alibaba Cloud Container Service (ACK). Key components:
Knative provides request‑driven auto‑scaling, pod scale‑to‑zero, and built‑in multi‑version traffic splitting.
ACK supplies a managed Kubernetes cluster with pre‑installed Knative operators, enabling one‑click deployment of Knative services.
Elastic Compute Instance (ECI) virtual nodes are attached to the ACK node pool. When Knative scales out, pods are scheduled onto ECI nodes, providing on‑demand burst capacity while keeping the baseline node pool idle.
Deployment Steps
Provision an ACK cluster (Kubernetes 1.24+ recommended).
Enable the Knative add‑on in the ACK console; the control plane (serving, eventing) is installed automatically.
Create an ECI node pool and bind it to the cluster as a virtual node resource.
Package each model as a container image and push to a registry (e.g., Alibaba Cloud Container Registry).
Define a Service manifest that specifies the image, traffic split for versions, and resource limits.
Apply the manifest with kubectl apply -f service.yaml. Knative creates Revision objects and automatically scales pods from zero to the required count.
Results
All new models are deployed as Knative services on ACK; multi‑version support enables gray‑release and parallel execution.
Automatic scaling aligns pod count with request volume, eliminating idle resources.
Cost analysis shows approximately 60 % reduction in compute spend, especially during off‑peak periods.
Monitoring dashboards confirm pod count tracks request traffic closely, indicating high resource‑usage efficiency.
Technical Context
Serverless abstracts server management; developers deliver containerized code and the platform handles provisioning. Knative, an open‑source serverless framework on Kubernetes, standardizes container orchestration, traffic routing, and scaling. Since its 1.0 release (Nov 2021) and CNCF graduation (Mar 2022), Alibaba Cloud has integrated Knative into ACK, offering production‑grade capabilities.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
