Alibaba Cloud Infrastructure
Author

For uninterrupted computing services

353 articles · 0 likes · 936 views · 0 comments
Recent Articles

Latest from Alibaba Cloud Infrastructure

Alibaba Cloud Infrastructure
Jun 19, 2025 · Cloud Native

How to Pick the Best Storage for Kubernetes Workflows: Artifacts vs Volumes

This article examines the storage challenges of Kubernetes‑based Argo Workflows, comparing artifact mechanisms and native volumes, evaluating integrated versus separated compute‑storage architectures, and presenting performance‑oriented optimization techniques for object and file storage in AI and big‑data pipelines.

Argo Workflows · Artifacts · Kubernetes
0 likes · 16 min read
Alibaba Cloud Infrastructure
Jun 11, 2025 · Cloud Computing

How Alibaba’s Qi Tian Platform Secures Large-Scale Cloud Networks

This article examines Alibaba Cloud’s Qi Tian integrated operation‑management platform, detailing the challenges of massive cloud network management and the innovative data‑fusion, automated change, intent‑aware monitoring, and multi‑plane self‑healing technologies that enable secure, high‑performance operation at million‑device scale.

AI · Data Management · cloud computing
0 likes · 11 min read
Alibaba Cloud Infrastructure
Jun 3, 2025 · Artificial Intelligence

Deploying and Managing Ray on Alibaba Cloud ACK with KubeRay: Architecture, Code Samples, and Scheduling Strategies

This article explains how to build a flexible machine‑learning infrastructure on Alibaba Cloud ACK using Ray and KubeRay. It covers Ray's core components and AI libraries, deployment options on VMs and Kubernetes, code examples for data processing and model serving, and advanced scheduling and quota management techniques.

AI · Alibaba Cloud · Distributed Computing
0 likes · 17 min read
Alibaba Cloud Infrastructure
May 14, 2025 · Artificial Intelligence

How Mooncake’s KVCache Boosts Large‑Model Inference Efficiency and Cost

Mooncake, an open‑source large‑model inference platform, introduces a KVCache‑centric architecture that dramatically improves throughput, reduces latency and cuts inference costs by up to 20%, while integrating with frameworks like SGLang and vLLM and leveraging Alibaba Cloud’s eRDMA and GPUDirect technologies for scalable, high‑performance deployments.

AI performance · Alibaba Cloud · KVCache
0 likes · 7 min read
Alibaba Cloud Infrastructure
May 12, 2025 · Cloud Native

Transform a Single‑Cluster CD Pipeline into a Multi‑Cluster System with ACK One

This guide explains how to leverage Alibaba Cloud's ACK One multi‑cluster application distribution together with the Cloud Effect DevOps platform to convert an existing single‑cluster continuous delivery pipeline into a resilient, multi‑region, multi‑cluster CD solution without modifying original YAML resources.

ACK One · Cloud Effect · Continuous delivery
0 likes · 9 min read
Alibaba Cloud Infrastructure
May 1, 2025 · Artificial Intelligence

Fine-grained Profiling of Online AI Workloads on Kubernetes Using ACK AI Profiling

This article demonstrates how to use ACK AI Profiling, built on eBPF and dynamic process injection, to perform non-intrusive, low‑overhead profiling of Kubernetes‑deployed large‑language‑model inference services, identify GPU memory growth causes, and apply optimization recommendations to prevent OOM issues.

AI profiling · GPU memory · Kubernetes
0 likes · 10 min read
Alibaba Cloud Infrastructure
Apr 30, 2025 · Artificial Intelligence

Exploring and Practicing a Unified Compute Network for AI at Zuoyebang: Building an Innovation Engine for the AI Era

This article summarizes a presentation by Dong Xiaocong, Zuoyebang's infrastructure lead, on the supply and demand challenges of AI inference. It describes the design and implementation of a unified compute network, including trusted networking, multi‑region container scheduling, and traffic routing, built to serve large‑scale AI models efficiently.

AI · Compute Network · Infrastructure
0 likes · 9 min read