
Exploring and Practicing a Unified Compute Network for AI at Zuoyebang: Building an Innovation Engine for the AI Era

This article summarizes a presentation by Dong Xiaocong, head of infrastructure at Zuoyebang, on the mismatch between AI inference demand and supply, and describes the design and implementation of a unified compute network (trusted networking, multi-region container scheduling, and traffic routing) to serve large-scale AI models efficiently.

Alibaba Cloud Infrastructure

Dong Xiaocong, head of infrastructure at Zuoyebang, presented at the Alibaba Cloud AI Power Conference, introducing the concept of a "compute network" that goes beyond traditional data‑center hardware to a logical network spanning multiple machine types and regions.

Zuoyebang, founded in 2015, offers AI‑enhanced education products and launched a large AI model in 2023 that powers features such as AI‑based problem solving and writing assistance, creating new demand for high‑performance inference resources.

The AI inference surge brings two main challenges: unpredictable, cost‑sensitive demand from innovative services, and limited supply of GPU resources across availability zones, regions, and instance types.

To address these challenges, Dong proposes building a unified compute network around three core layers: the IaaS layer (trusted networking), the container layer (optimal resource scheduling), and the application layer (traffic scheduling).

Network layer: Various cross‑region communication needs—control commands, large model distribution, service calls, and observability—are handled using a mix of dedicated lines, IPSec VPN, TLS over the public internet, and SD‑WAN where appropriate, balancing latency, security, and cost.
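The mapping from traffic class to transport can be sketched as a simple policy table. This is an illustrative sketch only: the traffic classes come from the article, but the specific link assignments and the `pick_link` function are assumptions, not Zuoyebang's actual routing policy.

```python
# Hypothetical policy: map each cross-region traffic class to a transport,
# trading off latency, security, and cost as described above.
def pick_link(traffic: str) -> str:
    routes = {
        "control": "dedicated-line",         # control commands: low latency, high reliability
        "model-distribution": "public-tls",  # bulk model transfer: cost matters most
        "service-call": "ipsec-vpn",         # service RPC: encrypted site-to-site tunnel
        "observability": "sd-wan",           # metrics/logs: flexible aggregation
    }
    # Default to TLS over the public internet for unclassified traffic.
    return routes.get(traffic, "public-tls")
```

In practice such a policy would also consider bandwidth quotas and failover between links; the table form just makes the latency/security/cost trade-off explicit.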

Container layer: Custom Kubernetes scheduler plugins filter and score nodes based on topology snapshots, enabling pod placement that balances stability (spreading pods across nodes) against cost (high utilization). Additional mechanisms address pod fragmentation, perform periodic re-scheduling, distribute models across regions via OSS cross-region replication, and run CI/CD pipelines that store model images in NAS-backed Harbor registries and push them to ACK clusters.

These optimizations have raised average cluster utilization above 90% and saved thousands of GPU cards per month.

Application layer: Traffic routing uses an AI gateway to abstract whether large‑model capabilities come from third‑party APIs or internal models. For cross‑region access, a self‑developed K8S Mesh component provides trusted, high‑performance links, while an LLM proxy performs pod‑level load balancing.
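The gateway abstraction above can be sketched as a routing table plus a least-connections pick at the pod level. Everything here, including the backend names, the example URL, and the `route` function, is a hypothetical illustration of the pattern rather than the actual gateway implementation.

```python
# Hypothetical backend registry: a model capability resolves either to a
# third-party API endpoint or to an internal deployment's pods.
BACKENDS = {
    "vendor-llm": {"kind": "external", "url": "https://api.example.com/v1"},
    "zyb-llm": {"kind": "internal", "pods": {"pod-a": 3, "pod-b": 1}},
}

def route(model: str) -> str:
    backend = BACKENDS[model]
    if backend["kind"] == "external":
        return backend["url"]              # pass straight through to the vendor API
    pods = backend["pods"]                 # pod name -> in-flight request count
    target = min(pods, key=pods.get)       # least-connections: pick the idlest pod
    pods[target] += 1                      # account for the request we just routed
    return target
```

Callers only name a capability; whether it lands on a third-party API or an internal pod is the gateway's decision, which is what lets the backing implementation change without touching application code.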

Looking ahead, Zuoyebang plans to explore cost‑reduction inference technologies, centralized KVCache solutions (e.g., eRDMA, Tair), and deeper integration of large‑model capabilities into the infrastructure stack, moving from information provision to actionable agents.

Tags: AI, Infrastructure, multi-region, Compute Network, container scheduling, Model Distribution