
How Knative and ECI Virtual Nodes Supercharged AI Model Deployment on Alibaba Cloud

Shuhe Technology leveraged Alibaba Cloud Container Service with Knative and ECI virtual nodes to achieve auto‑scaling, multi‑version management, and up to 60% cost reduction for AI model services, dramatically improving resource efficiency and stability under burst traffic.

Alibaba Cloud Native

Background

Risk‑management workflows in retail finance require batch processing of large user datasets to adjust credit limits, pricing, and loan terms. Model inference must be fast, accurate, and near‑real‑time.

Challenges

Underlying compute resources could not scale automatically with request volume, leading to idle capacity and high maintenance cost.

The growing number of model services made version management difficult and drove up operational overhead.

Solution Architecture

The team adopted a serverless platform built on Knative running on Alibaba Cloud Container Service for Kubernetes (ACK). Key components:

Knative provides request‑driven auto‑scaling, scale‑to‑zero for idle services, and built‑in traffic splitting across multiple versions (revisions).

ACK supplies a managed Kubernetes cluster and offers Knative as a managed component, so Knative can be deployed from the console with one click.

Elastic Container Instance (ECI) virtual nodes are added to the ACK cluster. When Knative scales out, pods are scheduled onto the ECI virtual nodes, which provide on‑demand burst capacity without over‑provisioning the baseline node pool.
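Knative's request‑driven scaling can be pictured as a simple calculation: divide the observed concurrency by what one pod should handle, round up, and clamp to the configured bounds. The sketch below is not the Knative autoscaler's code. It assumes stable‑mode behaviour only (no panic window, scale‑rate limits, or activator buffering) and borrows Knative's documented defaults of a soft target of 100 concurrent requests per pod at 70% target utilization, which a real cluster may override in its config-autoscaler ConfigMap. The function name and parameters are hypothetical.

```python
import math
from typing import Optional


def desired_pods(
    avg_concurrency: float,
    target: float = 100.0,
    target_utilization: float = 0.7,
    min_scale: int = 0,
    max_scale: Optional[int] = None,
) -> int:
    """Hypothetical, simplified stable-mode estimate of the replica count a
    concurrency-based autoscaler would request.

    avg_concurrency: in-flight requests averaged over the stable window.
    target: soft target of concurrent requests per pod.
    target_utilization: fraction of `target` to aim for, leaving headroom.
    """
    if avg_concurrency <= 0:
        # No traffic: fall back to the floor, which is zero when
        # scale-to-zero is allowed (min_scale == 0).
        return min_scale
    per_pod = target * target_utilization  # effective requests per pod
    pods = math.ceil(avg_concurrency / per_pod)
    pods = max(pods, min_scale)  # never drop below the configured floor
    if max_scale is not None:
        pods = min(pods, max_scale)  # never exceed the configured ceiling
    return pods


if __name__ == "__main__":
    print(desired_pods(300))               # 300 / 70, rounded up
    print(desired_pods(300, max_scale=3))  # capped by max_scale
    print(desired_pods(0))                 # idle service scales to zero
```

With these assumptions, `desired_pods(300)` returns 5, `max_scale=3` caps it at 3, and zero traffic returns 0, which is the scale‑to‑zero case. Setting `min_scale=1` keeps one warm pod to avoid cold starts at the cost of some idle capacity.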

Deployment Steps

Provision an ACK cluster (Kubernetes 1.24+ recommended).

Enable the Knative add‑on in the ACK console; the control plane (serving, eventing) is installed automatically.

Deploy ECI virtual nodes in the cluster (through ACK's virtual node component) so that pods can be scheduled onto ECI.

Package each model as a container image and push to a registry (e.g., Alibaba Cloud Container Registry).

Define a Knative Service manifest that specifies the container image, resource limits, and the traffic split across model versions.

Apply the manifest with kubectl apply -f service.yaml. Knative creates Revision objects and automatically scales pods from zero to the required count.
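The manifest described in the steps above can be sketched as a small Python generator that emits JSON, which kubectl apply -f also accepts. This is a minimal sketch assuming a Service named risk-model, a placeholder image under registry.example.com, and an already existing revision risk-model-v1; all of these names are hypothetical. It uses only standard serving.knative.dev/v1 fields and the autoscaling.knative.dev/min-scale and max-scale annotations. ECI‑specific scheduling settings are left out because they depend on how virtual nodes are configured in the cluster.

```python
import json
from typing import Dict


def knative_service(
    name: str,
    image: str,
    revision_suffix: str,
    traffic: Dict[str, int],
    cpu: str = "2",
    memory: str = "4Gi",
    min_scale: int = 0,
    max_scale: int = 50,
) -> dict:
    """Build a serving.knative.dev/v1 Service manifest as a dict.

    traffic maps revision suffixes (e.g. "v1") to percentages; Knative
    requires the percentages in a traffic block to sum to 100.
    """
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "metadata": {
                    # Revision names must be prefixed with the Service name.
                    "name": f"{name}-{revision_suffix}",
                    "annotations": {
                        "autoscaling.knative.dev/min-scale": str(min_scale),
                        "autoscaling.knative.dev/max-scale": str(max_scale),
                    },
                },
                "spec": {
                    "containers": [
                        {
                            "image": image,
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory},
                                "limits": {"cpu": cpu, "memory": memory},
                            },
                        }
                    ]
                },
            },
            # One traffic target per revision; the tag gives each its own URL.
            "traffic": [
                {"revisionName": f"{name}-{suffix}", "percent": pct, "tag": suffix}
                for suffix, pct in traffic.items()
            ],
        },
    }


if __name__ == "__main__":
    manifest = knative_service(
        name="risk-model",  # hypothetical service name
        image="registry.example.com/models/risk-model:v2",  # hypothetical image
        revision_suffix="v2",
        traffic={"v1": 90, "v2": 10},  # gray release: 10% to the new version
    )
    print(json.dumps(manifest, indent=2))
```

Usage under these assumptions: python make_service.py > service.json, then kubectl apply -f service.json. Checking the percentages locally fails fast instead of waiting for Knative to reject the traffic block, and shifting the gray release forward is a matter of regenerating with new weights.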

Results

All new models are deployed as Knative services on ACK; multi‑version support enables gray‑release and parallel execution.

Automatic scaling aligns pod count with request volume, largely eliminating idle resources.

Cost analysis shows an approximately 60% reduction in compute spend, with the largest savings during off‑peak periods.

Monitoring dashboards confirm pod count tracks request traffic closely, indicating high resource‑usage efficiency.
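The link between scaling and cost in these results comes down to pod‑hours: a fleet sized for peak load pays for its peak every hour, while an autoscaled fleet pays only for what each hour needs. The sketch below compares the two. The load profile, the capacity of 100 concurrent requests per pod, and the function name are hypothetical and do not reproduce the source's data. Per‑unit prices for on‑demand ECI capacity and for reserved nodes also differ, so pod‑hour savings do not translate one‑to‑one into cost savings.

```python
import math
from typing import List, Tuple


def pod_hours(hourly_load: List[int], capacity_per_pod: int) -> Tuple[int, int]:
    """Compare pod-hours for a fleet sized for peak vs one that follows load.

    hourly_load: average concurrent requests in each hour (hypothetical).
    capacity_per_pod: concurrent requests one pod can serve (hypothetical).
    Returns (fixed_pod_hours, autoscaled_pod_hours).
    """
    per_hour = [math.ceil(load / capacity_per_pod) for load in hourly_load]
    fixed = max(per_hour) * len(hourly_load)  # static fleet provisioned for the peak
    autoscaled = sum(per_hour)  # pods follow load; zero pods in idle hours
    return fixed, autoscaled


if __name__ == "__main__":
    # Hypothetical day: idle overnight, steady daytime traffic, a 2-hour batch peak.
    profile = [0] * 8 + [200] * 8 + [800] * 2 + [200] * 6
    fixed, autoscaled = pod_hours(profile, capacity_per_pod=100)
    print(f"{fixed} vs {autoscaled} pod-hours")  # prints: 192 vs 44 pod-hours
```

The gap grows with the ratio of peak to average load, which is why bursty batch workloads such as periodic risk re‑scoring benefit most from scale‑to‑zero.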

Technical Context

Serverless abstracts server management: developers deliver containerized code and the platform handles provisioning. Knative, an open‑source serverless framework for Kubernetes, standardizes how containerized workloads are deployed, routed, and scaled. Since its 1.0 release (Nov 2021) and its acceptance into the CNCF as an incubating project (Mar 2022), Alibaba Cloud has integrated Knative into ACK, offering production‑grade capabilities.

Figure: Pod count vs. request volume
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
