How Ray Powers Massive AI Computing on WeChat: Lessons from Tencent
This article examines how Tencent leverages the Ray distributed framework within the Astra platform to handle WeChat's massive AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost while outlining the architecture and practical benefits.
Overview: This article covers how Ray is used at large scale for AI computing in WeChat.
Key Discussion Points
Background
Cluster management for million‑node scale
Efficient use of low‑priority resources
Reducing deployment complexity
Astra‑Ray usage examples
Q&A
Background
WeChat has become an essential part of daily life, and AI‑enabled services such as voice‑to‑text, AIGC video recommendations, and image recognition are now offered at massive scale. The user base drives an equally massive demand for AI computation.
Why Ray?
To meet this demand, Tencent built the Astra platform, which now hosts many AI algorithm services, including LLM and multimedia-processing workloads. The primary use case is Ray Serve. Because the team had previously worked purely on backend systems, it first had to work out how AI algorithm services differ from traditional micro-services.
Traditional micro‑services typically involve a few thousand nodes and tens of thousands of CPU cores. In contrast, AI algorithm services require hundreds of thousands of nodes and millions of cores, imposing extreme requirements on module management and Kubernetes clusters.
GPU resources add further complexity: various GPU types (NVIDIA, ZhiXiao, Ascend, etc.) each need specific adapters, increasing deployment effort.
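One common way to contain this per-hardware effort is a small adapter registry, so that service code asks for an accelerator family by name and never touches vendor-specific setup. The sketch below is illustrative only: the `AcceleratorAdapter` class and the `register` and `device_env` helpers are assumptions, not Astra's actual interfaces, and the variable name used for ZhiXiao is a placeholder because the source does not give it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AcceleratorAdapter:
    """Describes how to expose one accelerator family to a worker process."""
    name: str
    visible_devices_env: str  # env var that restricts which devices a process sees


# Registry keyed by accelerator family (hypothetical structure).
_ADAPTERS: Dict[str, AcceleratorAdapter] = {}


def register(adapter: AcceleratorAdapter) -> None:
    """Add or replace the adapter for one accelerator family."""
    _ADAPTERS[adapter.name] = adapter


def device_env(accelerator: str, device_ids: List[int]) -> Dict[str, str]:
    """Return the environment a worker needs in order to see only `device_ids`."""
    try:
        adapter = _ADAPTERS[accelerator]
    except KeyError:
        raise ValueError(f"no adapter registered for {accelerator!r}") from None
    return {adapter.visible_devices_env: ",".join(str(i) for i in device_ids)}


register(AcceleratorAdapter("nvidia", "CUDA_VISIBLE_DEVICES"))
register(AcceleratorAdapter("ascend", "ASCEND_RT_VISIBLE_DEVICES"))
# Placeholder name: the real ZhiXiao runtime setting is not given in the source.
register(AcceleratorAdapter("zhixiao", "ZHIXIAO_VISIBLE_DEVICES"))
```

With a registry like this, supporting a new accelerator means adding one `register(...)` call rather than editing every service that launches workers.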
Operationally, AI services are pure algorithm services without business logic, often requiring separate clusters for different use cases, which raises maintenance difficulty.
Cost is also a major concern because GPU hardware is expensive; reducing inference cost and improving resource utilization are critical goals.
Adoption of Ray
Ray offers a unified distributed platform that integrates multiple compute models and forms a complete ecosystem. Tencent took note of Ray’s advantages as early as 2022, prompted by high-profile cases such as ChatGPT, whose training reportedly used Ray. The team invested in Ray for three reasons: to simplify the move from single-machine to distributed execution, to streamline development, and to manage resources more efficiently.
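The single-machine-to-distributed step can be sketched with Ray's core task API. The function below is ordinary Python; wrapping it with `ray.remote` lets the same code run as parallel tasks on a Ray cluster. `score_text` and its toy scoring rule are placeholders invented for illustration, not one of WeChat's services, and Ray is imported lazily so that the local path runs without it installed.

```python
from typing import List


def score_text(text: str) -> int:
    """Stand-in for a model inference call (hypothetical toy logic)."""
    return len(text.split())


def run_local(texts: List[str]) -> List[int]:
    """Single-machine version: a plain loop over the inputs."""
    return [score_text(t) for t in texts]


def run_distributed(texts: List[str]) -> List[int]:
    """Distributed version: the same function, scheduled as Ray tasks."""
    import ray  # requires `pip install ray`

    # Starts a local Ray instance unless an existing cluster is found.
    ray.init(ignore_reinit_error=True)
    remote_score = ray.remote(score_text)  # turn the function into a remote task
    futures = [remote_score.remote(t) for t in texts]  # schedule without blocking
    return ray.get(futures)  # gather results in submission order


if __name__ == "__main__":
    print(run_local(["hello world", "ray scales python"]))
```

The point of the comparison is that `score_text` itself is unchanged between the two paths; only the call site differs.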
Architecture Integration
The Astra platform treats Ray-based applications as its basic unit. Underneath, a custom federated architecture connects multiple internal Kubernetes clusters so that a single application can be deployed across them. Each node runs three components: the Starlink cluster-management agent, a P2P network-traversal component, and the TFCC AI runtime.
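To make the idea of deploying across federated clusters concrete, here is a minimal placement sketch. The source does not describe Astra's actual placement policy, so the greedy most-free-first rule, the `Cluster` type, and the `place_replicas` function are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cluster:
    """One member Kubernetes cluster in the federation (hypothetical model)."""
    name: str
    free_gpus: int


def place_replicas(
    clusters: List[Cluster], replicas: int, gpus_per_replica: int = 1
) -> Dict[str, int]:
    """Greedily assign replicas to the clusters with the most free GPUs."""
    placement: Dict[str, int] = {}
    remaining = replicas
    for cluster in sorted(clusters, key=lambda c: c.free_gpus, reverse=True):
        if remaining == 0:
            break
        # How many whole replicas this cluster can still host.
        fit = min(remaining, cluster.free_gpus // gpus_per_replica)
        if fit > 0:
            placement[cluster.name] = fit
            remaining -= fit
    if remaining > 0:
        raise RuntimeError(f"insufficient capacity: {remaining} replicas unplaced")
    return placement
```

A production scheduler would also weigh locality, priority, and preemption; this sketch only shows how one application's replicas can be split over several clusters.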
These excerpts are taken from the e‑book “Technology Fusion under Large Models: Innovative Practices for AIGC Deployment”.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
