How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat’s Astra platform uses the Ray distributed framework to run million‑node AI workloads. It covers the challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines the architecture that unifies Ray services across multiple Kubernetes clusters.

DataFunSummit

Background

WeChat has become part of daily life for its users, and as AI has matured it has come to offer many AI computing services, such as voice‑to‑text, AIGC in video channels, and image recognition. With a user base this large, the AI workloads are correspondingly huge.

Why Ray?

To handle large‑scale AI tasks we built the Astra platform, which now runs many AI algorithm services, including LLM and multimedia processing workloads. Our main use case is Ray Serve. As a backend‑focused team, we needed to bridge AI algorithm services and traditional micro‑services.

Key challenges

Scale: Traditional micro‑services run on a few thousand nodes, but AI services require tens of thousands of nodes and millions of CPU cores.

Resource diversity: AI services need GPUs from various vendors (NVIDIA, ZhiXiao, Ascend), and each vendor requires its own adapter.

Operations complexity: AI algorithms are pure compute services without business logic, and they often need a separate cluster per use case.

Cost: GPU hardware is expensive, so reducing inference cost and improving utilization is critical.
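
The "specific adapters" point under resource diversity can be pictured as a small registry that maps each vendor to the settings a scheduler would need. This is only an illustrative sketch, not Astra's implementation: the class `AcceleratorSpec`, the functions `register_adapter` and `resolve`, and the resource keys and image tags are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AcceleratorSpec:
    vendor: str         # vendor name, e.g. "nvidia" or "ascend"
    resource_key: str   # hypothetical scheduler resource name
    runtime_image: str  # hypothetical container image tag


# Registry mapping a vendor name to a factory that builds its spec.
_ADAPTERS: Dict[str, Callable[[], AcceleratorSpec]] = {}


def register_adapter(vendor: str):
    """Decorator that registers a spec factory under the given vendor name."""
    def decorator(factory: Callable[[], AcceleratorSpec]):
        _ADAPTERS[vendor] = factory
        return factory
    return decorator


@register_adapter("nvidia")
def _nvidia() -> AcceleratorSpec:
    # Placeholder values, not real resource names or images.
    return AcceleratorSpec("nvidia", "example.com/nvidia-gpu", "runtime:nvidia")


@register_adapter("ascend")
def _ascend() -> AcceleratorSpec:
    # Placeholder values, not real resource names or images.
    return AcceleratorSpec("ascend", "example.com/ascend-npu", "runtime:ascend")


def resolve(vendor: str) -> AcceleratorSpec:
    """Return the spec for a vendor, or raise ValueError if none is registered."""
    try:
        return _ADAPTERS[vendor]()
    except KeyError:
        raise ValueError(f"no adapter registered for vendor {vendor!r}") from None
```

With a registry like this, supporting another accelerator vendor means adding one factory function rather than touching the callers of `resolve`.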

Choosing Ray

Ray provides a unified distributed platform that integrates multiple compute models into one ecosystem, which simplifies both development and resource management.

Adoption timeline

We have followed Ray since 2022. Its advantages, together with successful cases like ChatGPT, led us to invest heavily in extending single‑machine applications to distributed environments.

Architecture

The Astra‑Ray architecture treats each Ray‑based application as a basic unit. It runs on a federated cluster that spans several internal Kubernetes clusters. Each K8s node runs our Starlink management agent, a P2P network‑penetration component, and the TFCC AI runtime.
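
One way to picture "each Ray‑based application as a basic unit" on a federated cluster is a placement step that chooses which member Kubernetes cluster hosts a given application. The sketch below is a guess at the general shape, not the Astra scheduler: `Cluster`, `RayApp`, and `place` are hypothetical names, and the policy (pick the matching cluster with the most free accelerators) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str               # member Kubernetes cluster in the federation
    accelerator: str        # accelerator vendor available in this cluster
    free_accelerators: int  # accelerators not yet assigned


@dataclass
class RayApp:
    name: str                 # one Ray-based application, the unit of placement
    accelerator: str          # accelerator vendor the application needs
    accelerators_needed: int  # how many accelerators it needs


def place(app: RayApp, clusters: List[Cluster]) -> Optional[Cluster]:
    """Assign the app to the matching cluster with the most free accelerators.

    Returns the chosen cluster (with its free count reduced), or None when
    no cluster has the right accelerator type and enough free capacity.
    """
    candidates = [
        c for c in clusters
        if c.accelerator == app.accelerator
        and c.free_accelerators >= app.accelerators_needed
    ]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda c: c.free_accelerators)
    chosen.free_accelerators -= app.accelerators_needed  # reserve capacity
    return chosen


clusters = [
    Cluster("k8s-a", "nvidia", 4),
    Cluster("k8s-b", "nvidia", 16),
    Cluster("k8s-c", "ascend", 8),
]
first = place(RayApp("speech-to-text", "nvidia", 8), clusters)
second = place(RayApp("image-recognition", "ascend", 16), clusters)
```

Here `first` is the `k8s-b` cluster, since it is the only NVIDIA cluster with eight free accelerators, and `second` is `None` because no Ascend cluster has sixteen free. A real federated scheduler would also weigh factors such as locality, quotas, and preemption, which this sketch ignores.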

Figures: WeChat AI platform overview; Ray integration diagram; performance chart.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
