How Ray Powers Massive AI Computing on WeChat: Lessons from Tencent

This article examines how Tencent leverages the Ray distributed framework within the Astra platform to handle WeChat's massive AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost while outlining the architecture and practical benefits.

DataFunSummit

Overview – The article explores the large‑scale practice of Ray in WeChat AI computing.

Key Discussion Points

Background

Cluster management for million‑node scale

Efficient use of low‑priority resources

Reducing deployment complexity

Astra‑Ray usage examples

Q&A

Background

WeChat has become an essential part of daily life, and AI‑enabled services such as voice‑to‑text, AIGC video recommendations, and image recognition are now offered at massive scale. The user base drives an equally massive demand for AI computation.

Why Ray?

To meet this demand, Tencent built the Astra platform, which now hosts many AI algorithm services covering LLMs and multimedia processing. Ray is used on the platform mainly through Ray Serve. Because the team came from a pure backend background, the engineers first had to work out how AI algorithm services differ from traditional micro‑services.
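As a rough illustration of what serving a model through Ray Serve looks like, the sketch below wraps a toy class in a deployment. It is not Astra code: `ToyTranscriber`, `serve_with_ray`, and the `prefix` parameter are hypothetical names, and the snippet assumes a recent Ray 2.x release in which `serve.run` returns a `DeploymentHandle`.

```python
class ToyTranscriber:
    """Hypothetical stand-in for a model; a real service would load weights in __init__."""

    def __init__(self, prefix: str = "text:"):
        self.prefix = prefix

    def __call__(self, audio_id: str) -> str:
        # Pretend to transcribe the clip identified by audio_id.
        return f"{self.prefix}{audio_id}"


def serve_with_ray(num_replicas: int = 2):
    """Wrap ToyTranscriber in a Ray Serve deployment and return a handle to it."""
    from ray import serve  # lazy import: only needed when actually serving

    deployment = serve.deployment(num_replicas=num_replicas)(ToyTranscriber)
    # bind() supplies the constructor arguments used by every replica.
    return serve.run(deployment.bind(prefix="text:"))


if __name__ == "__main__":
    handle = serve_with_ray()
    # Each call is routed to one of the replicas.
    print(handle.remote("clip-001").result())
```

The model class stays ordinary Python, so it can be unit-tested without a cluster; the replica count is the only thing that changes when the service needs more capacity.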

Traditional micro‑services typically involve a few thousand nodes and tens of thousands of CPU cores. In contrast, AI algorithm services require hundreds of thousands of nodes and millions of cores, imposing extreme requirements on module management and Kubernetes clusters.

GPU resources add further complexity: the fleet mixes several accelerator types (NVIDIA, Zixiao, Ascend, etc.), and each needs its own adapter, which increases deployment effort.
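One common way to contain this kind of per-vendor work is an adapter registry, sketched below. This is an illustrative assumption, not a description of Astra's adapter layer: the registry, the `register` decorator, and `device_env` are hypothetical, and only two vendors are shown, using the device-visibility environment variables their runtimes are known to read.

```python
from typing import Callable, Dict

# An adapter maps a device index to the environment a worker process needs.
Adapter = Callable[[int], Dict[str, str]]
_ADAPTERS: Dict[str, Adapter] = {}


def register(vendor: str) -> Callable[[Adapter], Adapter]:
    """Decorator that records an adapter under a vendor name."""
    def wrap(fn: Adapter) -> Adapter:
        _ADAPTERS[vendor] = fn
        return fn
    return wrap


@register("nvidia")
def nvidia_adapter(device_index: int) -> Dict[str, str]:
    return {"CUDA_VISIBLE_DEVICES": str(device_index)}


@register("ascend")
def ascend_adapter(device_index: int) -> Dict[str, str]:
    return {"ASCEND_RT_VISIBLE_DEVICES": str(device_index)}


def device_env(vendor: str, device_index: int) -> Dict[str, str]:
    """Look up the vendor's adapter and build the worker environment."""
    try:
        adapter = _ADAPTERS[vendor]
    except KeyError:
        raise ValueError(f"no adapter registered for {vendor!r}") from None
    return adapter(device_index)
```

With this shape, supporting a new accelerator means adding one registered function rather than touching every deployment script.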

Operationally, AI services are pure algorithm services without business logic, often requiring separate clusters for different use cases, which raises maintenance difficulty.

Cost is also a major concern because GPU hardware is expensive; reducing inference cost and improving resource utilization are critical goals.

Adoption of Ray

Ray offers a unified distributed platform that integrates multiple compute models and has grown a complete ecosystem around them. Tencent took note of Ray as early as 2022, encouraged by high‑profile success stories such as its use in building ChatGPT. The team invested in Ray to simplify scaling from single‑machine to distributed environments, streamline development, and manage resources more efficiently.
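The single-machine-to-distributed point can be made concrete with a small sketch. The function names here (`square`, `run_local`, `run_on_ray`) are hypothetical; the Ray calls used (`ray.init`, `ray.remote`, `ray.get`) are part of Ray's core API. The same plain function runs locally or as parallel Ray tasks without being rewritten.

```python
from typing import List


def square(x: int) -> int:
    """Ordinary Python function with no Ray-specific code."""
    return x * x


def run_local(values: List[int]) -> List[int]:
    # Single-machine version: a plain loop.
    return [square(v) for v in values]


def run_on_ray(values: List[int]) -> List[int]:
    # Distributed version: the same function, submitted as Ray tasks.
    import ray  # lazy import so the local path has no Ray dependency

    ray.init(ignore_reinit_error=True)
    remote_square = ray.remote(square)
    refs = [remote_square.remote(v) for v in values]  # schedules tasks, returns futures
    return ray.get(refs)  # blocks until all results are ready


if __name__ == "__main__":
    print(run_local([1, 2, 3]))
    print(run_on_ray([1, 2, 3]))
```

Both paths return results in input order, so callers can switch between them without other changes.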

Architecture Integration

The Astra platform treats Ray‑based applications as its basic unit. Underneath, a custom federated cluster architecture connects multiple internal Kubernetes clusters, allowing deployments across them. Each node runs the Starlink cluster‑management agent, a P2P network‑penetration component, and the TFCC AI runtime.
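To give a feel for what deploying one application across several federated Kubernetes clusters involves, here is a minimal placement sketch. It is an assumption for illustration only: the article does not describe Astra's scheduler, and `Cluster`, `place_replicas`, and the greedy most-free-first policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cluster:
    name: str
    free_gpus: int


def place_replicas(
    clusters: List[Cluster], replicas: int, gpus_per_replica: int = 1
) -> Dict[str, int]:
    """Greedily assign replicas to clusters, filling the emptiest cluster first."""
    placement: Dict[str, int] = {}
    remaining = replicas
    for cluster in sorted(clusters, key=lambda c: c.free_gpus, reverse=True):
        if remaining == 0:
            break
        fit = min(remaining, cluster.free_gpus // gpus_per_replica)
        if fit > 0:
            placement[cluster.name] = fit
            remaining -= fit
    if remaining:
        raise RuntimeError(f"{remaining} replicas could not be placed")
    return placement


if __name__ == "__main__":
    fleet = [Cluster("k8s-a", 4), Cluster("k8s-b", 10), Cluster("k8s-c", 2)]
    print(place_replicas(fleet, replicas=12))
```

A production scheduler would also weigh priority, locality, and preemption; the sketch only shows why a federation layer lets one application exceed the capacity of any single cluster.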

Architecture diagram

These excerpts are taken from the e‑book “Technology Fusion under Large Models: Innovative Practices for AIGC Deployment”.

