How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat's Astra platform uses the Ray distributed framework to run AI workloads spanning tens of thousands of nodes and millions of CPU cores. It covers the challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines the architecture that unifies Ray services across multiple Kubernetes clusters.

DataFunSummit

Sep 20, 2025

Background

WeChat has become part of everyday life, and as AI has advanced it has added many AI computing services, such as speech-to-text, AIGC (AI-generated content) in WeChat Channels, and image recognition. With such a massive user base, the resulting AI workloads are enormous.

Why Ray?

To handle large-scale AI tasks, we built the Astra platform, which now hosts many AI algorithm services, including LLMs and multimedia processing. Our main use case is Ray Serve. As a backend-focused team, we needed a way to bridge AI algorithm services and traditional microservices.

Key challenges

Scale: Traditional microservices run on a few thousand nodes, whereas AI services require tens of thousands of nodes and millions of CPU cores.

Resource diversity: AI services run on GPUs and accelerators from several vendors (NVIDIA, ZhiXiao, Ascend), each of which requires its own adapter.

Operational complexity: AI algorithm services are pure compute with no business logic, and each use case often needs its own cluster.

Cost: GPU hardware is expensive, so reducing inference cost and improving utilization is critical.
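To make the resource-diversity and cost points concrete, here is a minimal pure-Python sketch of how a scheduler might place model replicas only on nodes whose accelerator family the model's adapter supports, while letting several replicas share one device through fractional requests. It loosely mirrors the idea behind Ray's custom resources and fractional GPU requests, but the `Node` and `Replica` classes, the accelerator labels, and the first-fit policy are hypothetical illustrations, not Astra's actual scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical worker node exposing one accelerator family."""
    name: str
    accelerator: str  # illustrative labels such as "nvidia" or "ascend"
    free: list[float] = field(default_factory=list)  # unused fraction of each device


@dataclass
class Replica:
    """A hypothetical model replica and the hardware family its adapter supports."""
    name: str
    accelerator: str
    gpu_fraction: float  # share of one device the replica needs, 0 < f <= 1


def place(replicas: list[Replica], nodes: list[Node]) -> dict[str, tuple[str, int]]:
    """First-fit placement: put each replica on the first device of a matching
    accelerator family that still has enough unused capacity."""
    placement: dict[str, tuple[str, int]] = {}
    for replica in replicas:
        for node in nodes:
            if node.accelerator != replica.accelerator:
                continue  # this model's adapter cannot run on this hardware
            for i, free in enumerate(node.free):
                # Small tolerance so accumulated float error does not block a fit.
                if free + 1e-9 >= replica.gpu_fraction:
                    node.free[i] = free - replica.gpu_fraction
                    placement[replica.name] = (node.name, i)
                    break
            if replica.name in placement:
                break
        else:
            # No node of the right family had room for this replica.
            raise RuntimeError(f"no {replica.accelerator} capacity for {replica.name}")
    return placement


def utilization(nodes: list[Node]) -> float:
    """Average fraction of device capacity that is allocated across all nodes."""
    devices = [free for node in nodes for free in node.free]
    return sum(1.0 - free for free in devices) / len(devices)
```

For example, two replicas that each request half of an NVIDIA device end up sharing one card, while a replica that needs a full card lands on the next idle one. Packing fractional requests this way is one common lever for raising utilization on expensive hardware.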

Choosing Ray

Ray is a unified distributed framework that supports multiple compute models, such as model serving, data processing, and training, within a single ecosystem. This simplifies both development and resource management.

Adoption timeline

We began tracking Ray's advantages in 2022. Inspired by successes such as ChatGPT, we invested heavily in extending single-machine applications to distributed environments.

Architecture

The Astra-Ray architecture treats each Ray-based application as its basic unit. Applications run on a federated cluster that spans several internal Kubernetes clusters. Every Kubernetes node runs three components: our Starlink management agent, a P2P network-penetration component, and the TFCC AI runtime.

[Figure: Astra-Ray platform layout]
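The article does not describe how the federation layer assigns capacity. As a rough illustration only, the following sketch shows one way a federation could reserve devices for a Ray application across member Kubernetes clusters. It assumes, as the P2P network-penetration component suggests but the article does not state, that one application's nodes may live in different member clusters. The `MemberCluster` and `RayApp` classes, the largest-pool-first policy, and the all-or-nothing rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemberCluster:
    """Hypothetical capacity summary of one Kubernetes cluster in the federation."""
    name: str
    free_devices: dict[str, int]  # accelerator family -> idle device count


@dataclass
class RayApp:
    """A Ray-based application, treated as the basic unit of scheduling."""
    name: str
    accelerator: str
    devices: int


def allocate(app: RayApp, clusters: list[MemberCluster]) -> dict[str, int] | None:
    """Reserve devices for an app across member clusters, largest pools first.

    Returns a {cluster name: device count} plan, or None (reserving nothing)
    if the federation as a whole lacks enough idle devices of the right type.
    """
    def idle(c: MemberCluster) -> int:
        return c.free_devices.get(app.accelerator, 0)

    if sum(idle(c) for c in clusters) < app.devices:
        return None  # all-or-nothing: never start a partially placed app

    plan: dict[str, int] = {}
    remaining = app.devices
    # Draining the largest pools first keeps each app on as few clusters as possible.
    for cluster in sorted(clusters, key=idle, reverse=True):
        take = min(remaining, idle(cluster))
        if take > 0:
            plan[cluster.name] = take
            cluster.free_devices[app.accelerator] = idle(cluster) - take
            remaining -= take
        if remaining == 0:
            break
    return plan
```

The all-or-nothing check matters for distributed applications: a Ray application that receives only part of the devices it needs typically cannot make progress, so it is better to reject it up front than to strand capacity.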


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
