How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how Tencent's WeChat team leveraged the Ray distributed computing framework within the Astra platform to tackle massive AI workloads, addressing challenges of scale, GPU diversity, operational complexity, and cost while outlining their architecture and practical insights.

DataFunSummit

Overview: This article covers WeChat's large-scale use of Ray for AI computing in six parts: background, million-node cluster management, efficient use of low-priority resources, reducing deployment complexity, Astra-Ray usage examples, and a Q&A.

Background: WeChat has become an essential part of daily life, and its AI-enabled features, such as voice-to-text, AIGC in WeChat Channels, and image recognition, generate massive demand for AI compute.

To meet these demands, Tencent built the Astra platform, which now hosts numerous AI algorithm services covering LLMs and multimedia processing.

The team mainly uses Ray Serve on Astra. Because its members come from a pure backend development background, they paid close attention to how AI algorithm services differ from traditional micro-services. Four differences stand out:

Scale Challenges: A traditional micro-service typically runs on a few thousand nodes and tens of thousands of cores, whereas an AI algorithm service may require hundreds of thousands of nodes and millions of cores. That scale puts extreme pressure on module management and on the underlying Kubernetes clusters.

GPU Diversity: AI services need specific accelerator types (e.g., NVIDIA, Zixiao, Ascend). Each model requires its own dedicated adaptation, which greatly increases deployment complexity.
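One way to picture the per-accelerator adaptation cost is a small registry that every deployment must consult. The sketch below is illustrative only and is not Astra's implementation: the `AcceleratorProfile` type, the `resolve` helper, and every resource key and image name are assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorProfile:
    """Per-accelerator settings a deployment has to get right (illustrative)."""
    resource_key: str   # name used to request the device from the scheduler
    runtime_image: str  # container image built against that vendor's stack


# Hypothetical values: every key and image name here is made up for the sketch.
PROFILES = {
    "nvidia": AcceleratorProfile("example.com/nvidia-gpu", "runtime:cuda"),
    "ascend": AcceleratorProfile("example.com/ascend-npu", "runtime:cann"),
}


def resolve(accelerator: str) -> AcceleratorProfile:
    """Return the adaptation profile, or fail loudly if none has been written."""
    try:
        return PROFILES[accelerator]
    except KeyError:
        raise ValueError(f"no adaptation for accelerator {accelerator!r}") from None


print(resolve("ascend").runtime_image)
```

Each new accelerator family adds another entry that has to be built, tested, and maintained, which is where the extra deployment complexity comes from.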

Operational Complexity: Unlike micro-services, which are built around business logic, AI algorithm services are pure computation and share no business logic with one another. Different use cases therefore often need separate clusters, which makes operations harder.

Cost Considerations: GPU hardware is expensive, so lowering cost and raising resource utilization are critical goals.

Given these factors, the team chose Ray because it offers a unified distributed platform that integrates multiple compute models into a cohesive ecosystem.

Since 2022, after observing successful Ray deployments at leading companies (including OpenAI, which used Ray in training ChatGPT), the team has invested heavily in Ray to simplify scaling from a single machine to a distributed environment, streamline development, and improve resource management.

The combined Astra‑Ray architecture treats each Ray‑based application as a basic unit, supported by a custom federated cluster that spans multiple internal Kubernetes clusters. Each K8s node runs a Starlink management agent, a P2P network‑penetration component, and the TFCC AI runtime.
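The idea of treating an application as the unit that a federation places across several Kubernetes clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the Astra scheduler: the `Cluster` and `RayApp` types, the greedy largest-pool-first policy, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One member Kubernetes cluster of the hypothetical federation."""
    name: str
    free_nodes: int


@dataclass
class RayApp:
    """A Ray-based application, the basic scheduling unit in this sketch."""
    name: str
    nodes_needed: int
    placement: dict = field(default_factory=dict)  # cluster name -> node count


def place(app: RayApp, clusters: list) -> RayApp:
    """Greedily spread an app's nodes over clusters, largest free pool first."""
    remaining = app.nodes_needed
    if remaining > sum(c.free_nodes for c in clusters):
        raise RuntimeError(f"federation cannot fit {app.name}")
    for cluster in sorted(clusters, key=lambda c: c.free_nodes, reverse=True):
        if remaining == 0:
            break
        take = min(cluster.free_nodes, remaining)
        if take == 0:
            continue
        cluster.free_nodes -= take          # reserve capacity in this cluster
        app.placement[cluster.name] = take  # record where the app landed
        remaining -= take
    return app


clusters = [Cluster("k8s-a", 300), Cluster("k8s-b", 500), Cluster("k8s-c", 100)]
app = place(RayApp("speech-to-text", 700), clusters)
print(app.placement)
```

In this example the application needs more nodes than any single cluster has free, so it spans two of them. A production scheduler would also have to account for accelerator type, network locality, and priority, none of which are modeled here.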

Excerpt from the e‑book “Technology Fusion Driven by Large Models: Innovative Practices for AIGC Deployment”.

Speaker: Chen Guomin, Tencent, WeChat Astra Platform Lead; Su Wenhao, Tencent, Senior WeChat Engineer.
