Author

Alibaba Cloud Infrastructure

For uninterrupted computing services

357 Articles · 1.4k Views

Latest from Alibaba Cloud Infrastructure

Dec 27, 2025 · Cloud Native

How to Safely Deploy AI Inference Models Across Multi‑Cluster Environments with ACK One Fleet

This article explains why AI inference services require multi‑cluster gray‑release, outlines the risks of traditional updates, and details how ACK One Fleet combined with Kruise Rollout provides a controlled, observable, and rollback‑capable solution for deploying large AI models across hybrid cloud clusters.

ACK One · AI · Gray Release

0 likes · 10 min read

Dec 23, 2025 · Cloud Native

How Knative Serverless Cuts AI Inference Costs in Half and Doubles Efficiency

This article explains how the cloud‑native Knative serverless framework reduces GPU waste, enables request‑driven autoscaling to zero, accelerates AI model versioning and startup with Fluid, and integrates protocols like MCP and A2A to deliver cost‑effective, high‑performance AI inference services.

AI inference · Cloud Native · GPU

0 likes · 17 min read

Dec 22, 2025 · Artificial Intelligence

Boost LLM Inference with KV‑Cache‑Aware Routing on Alibaba Cloud ACK GIE

This article explains why KV‑Cache hit rate is critical for large‑model inference, describes vLLM's automatic prefix caching, outlines the distributed cache challenges, and provides a step‑by‑step guide to deploying Alibaba Cloud ACK Gateway with Inference Extension's precise‑mode prefix‑cache‑aware routing, backed by benchmark results.

Alibaba Cloud · KV cache · Kubernetes

0 likes · 18 min read

Dec 19, 2025 · Cloud Native

How Argo Workflows Tame Unpredictable AI Agents for Scalable Production

At KubeCon NA, experts showed that combining deterministic Argo Workflows with large‑model AI agents lets teams orchestrate smart, flexible agents in a predictable, observable, and auditable way, enabling large‑scale CVE remediation and self‑healing operations on Kubernetes.

Argo Workflows · Kubernetes · Platform Engineering

0 likes · 8 min read

Dec 17, 2025 · Cloud Native

AI Training Revives Gang Scheduling in Kubernetes for Elastic Resource Orchestration

The article examines how the rise of large‑model AI training reintroduces the need for gang scheduling in Kubernetes, contrasting the rigid resource requirements of HPC‑style workloads with cloud‑native elasticity, and outlines the historical evolution, current implementations, and future directions for achieving more flexible, high‑throughput compute orchestration.

AI training · Cloud Native · Gang Scheduling

0 likes · 22 min read

Dec 15, 2025 · Artificial Intelligence

Deploy Multi‑Agent AI Applications on Alibaba Cloud with AgentScope

This guide explains how to build, containerise, and deploy multi‑agent AI applications using the open‑source AgentScope framework on Alibaba Cloud's ACK Pro and ACS services, covering architecture, key features, step‑by‑step deployment, sandbox usage, and testing procedures.

AI agents · AgentScope · Cloud Native

0 likes · 19 min read

Dec 9, 2025 · Cloud Native

How to Detect and Resolve Kernel Memory & CPU Latency in Kubernetes Clusters

In cloud‑native Kubernetes environments, resource over‑commit and mixed deployments can cause kernel‑level memory reclaim and CPU scheduling delays that manifest as application jitter. This article explains how to visualize, diagnose, and remediate those delays using the SysOM exporter and related metrics.

CPU scheduling · Kubernetes · Memory reclaim

0 likes · 13 min read

Dec 8, 2025 · Cloud Native

Optimizing AI GPU Utilization with Multi‑Cluster Priority Scheduling on ACK One

In the era of large AI models, ACK One’s multi‑cluster fleet provides inventory‑aware elastic scheduling, cluster‑level priority dispatch, and hybrid‑cloud strategies to maximize GPU utilization, ensure business continuity, and reduce costs across regions and on‑premise data centers.

ACK One · AI workload · Cloud Native

0 likes · 11 min read

Dec 5, 2025 · Information Security

How to Verify Cross‑Cloud SLSA Attestations for Secure Kubernetes Deployments

This article explains how to strengthen Kubernetes supply‑chain security by using SLSA Source Track, the Notary Project’s Ratify tool, and policy engines like Gatekeeper to automatically generate, attach, and verify attestation proofs for OCI images before they are deployed to production clusters.

Attestation · CI/CD · Gatekeeper

0 likes · 18 min read

Nov 25, 2025 · Operations

How to Uncover Hidden Java Memory Leaks in Kubernetes Pods

This article explains why Java applications in cloud containers often encounter OOMKilled pods, details the hidden memory consumption from JNI, libc, and Transparent Huge Pages, and demonstrates step‑by‑step how to use Alibaba Cloud OS Console's memory panorama analysis to identify and mitigate the root causes.

JNI · Kubernetes · Memory Leak

0 likes · 11 min read