How We Scaled AI Computing in WeChat with Ray: From Challenges to AstraRay

This article details the AI computing challenges faced by WeChat, explains why the Ray distributed engine was chosen, and describes the design and large‑scale deployment of the AstraRay platform—including scheduling, resource management, and multi‑model support—to achieve low‑cost, high‑efficiency AI services.

WeChat Backend Team

1. Background

WeChat hosts numerous AI computing scenarios, primarily in traffic distribution, product operation, and content creation. AI is used for search, advertising, recommendation feature generation, product function optimization, and AIGC tasks such as text‑to‑image generation. Existing backend infrastructure struggles with the high compute intensity, heterogeneous hardware requirements, complex deployment, and low observability of AI workloads, creating a need for a low‑cost, high‑efficiency AI computing platform.

2. Why Introduce Ray?

Ray is an open‑source, general‑purpose distributed computing engine from UC Berkeley RISELab (2016). It simplifies distributed programming: developers add a decorator to any Python function or class, and Ray handles scheduling, state management, and fault tolerance. The following example shows a simple OCR pipeline written as plain, single‑process Python.

def detect(image_data):
    # Run the text-detection model (load_detect_model is a placeholder).
    model = load_detect_model()
    return model(image_data)

def recognize(det_result):
    # Run the text-recognition model on the detected regions.
    model = load_recognize_model()
    return model(det_result)

def ocr(image_data):
    det_result = detect(image_data)
    return recognize(det_result)

image_data = load_image_data()
ocr_result = ocr(image_data)

By adding @ray.remote and specifying resource requirements, the same pipeline can be deployed as distributed micro‑services, reducing deployment complexity and improving efficiency.

import ray

ray.init()

@ray.remote(num_gpus=1, num_cpus=16)
def detect(image_data):
    model = load_detect_model()
    return model(image_data)

@ray.remote(num_gpus=2, num_cpus=16)
def recognize(det_result):
    model = load_recognize_model()
    return model(det_result)

@ray.remote(num_cpus=4)
def ocr(image_data):
    # detect.remote() returns an ObjectRef; Ray resolves it to the actual
    # value before recognize() runs, so the two stages form a pipeline.
    det_result = detect.remote(image_data)
    return ray.get(recognize.remote(det_result))

image_data = load_image_data()
ocr_result = ray.get(ocr.remote(image_data))

Using Ray, the OCR inference workload achieves at least an order‑of‑magnitude improvement in deployment speed.

3. Large‑Scale Practice of Ray in WeChat AI Computing

Existing platforms such as P6n (Kubernetes‑based micro‑service platform) and Gemini (Kubernetes big‑data platform) cannot simultaneously satisfy the low‑cost, high‑throughput, and low‑latency requirements of AI workloads. AstraRay, built on Ray, addresses three core challenges:

Support heterogeneous resource expansion for low cost.

Enable massive‑scale resource scheduling for high throughput.

Reduce multi‑model deployment complexity.

3.1 AstraRay Architecture

3.2 Supporting Million‑Scale Pods

A shared‑state scheduler architecture (Starlink) is adopted to provide a global resource view and optimistic concurrency control, allowing a single Ray application to manage millions of pods across heterogeneous platforms.
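The optimistic concurrency control behind a shared‑state scheduler can be sketched as follows. This is an illustrative toy, not AstraRay's actual API: each scheduler reads a versioned snapshot of a node, and a commit only succeeds if no competing scheduler changed that node in the meantime.

```python
import threading

class SharedClusterView:
    """Global resource view shared by all schedulers (simplified sketch)."""
    def __init__(self, nodes):
        # node -> free GPU count plus a version number for optimistic commits
        self.nodes = {n: {"free_gpus": g, "version": 0} for n, g in nodes.items()}
        self.lock = threading.Lock()

    def snapshot(self, node):
        s = self.nodes[node]
        return s["free_gpus"], s["version"]

    def try_commit(self, node, gpus, expected_version):
        # The commit succeeds only if no other scheduler touched the node
        # since the snapshot was taken; otherwise the caller retries.
        with self.lock:
            s = self.nodes[node]
            if s["version"] != expected_version or s["free_gpus"] < gpus:
                return False
            s["free_gpus"] -= gpus
            s["version"] += 1
            return True

def schedule(view, node, gpus, retries=3):
    """Optimistically place a task; retry with a fresh snapshot on conflict."""
    for _ in range(retries):
        free, version = view.snapshot(node)
        if free < gpus:
            return False
        if view.try_commit(node, gpus, version):
            return True
    return False
```

Because schedulers never hold a global lock while deciding, many of them can plan placements in parallel against the shared view; conflicts are detected cheaply at commit time, which is what makes million‑pod scales tractable.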

3.3 Building Stable Services on Unstable Resources

Starlink uses PreStop hooks for graceful pod termination and a fast broadcast mechanism to synchronize node status, achieving node eviction within 4 seconds and dramatically reducing failure rates.
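A minimal sketch of the eviction side of this design, assuming a push model where nodes broadcast heartbeats and routers drop any node not heard from within the ~4‑second deadline (class and method names are illustrative):

```python
class NodeStatusTable:
    """Routers keep the last broadcast timestamp per node and evict
    any node whose status has gone stale past a short deadline."""
    EVICT_AFTER = 4.0  # seconds, matching the ~4 s eviction target

    def __init__(self):
        self.last_seen = {}

    def on_broadcast(self, node, timestamp):
        # Status updates are pushed (broadcast) by nodes, not polled.
        self.last_seen[node] = timestamp

    def healthy_nodes(self, now):
        # Nodes past the deadline are treated as evicted and receive
        # no further traffic until they broadcast again.
        return [n for n, t in self.last_seen.items()
                if now - t <= self.EVICT_AFTER]
```

With a PreStop hook, a terminating pod can additionally broadcast its departure before shutdown, so routers stop sending it traffic immediately rather than waiting out the deadline.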

3.4 Reducing Deployment Complexity

AstraRay transforms multi‑model, multi‑card, and multi‑module expansion (originally O(n³) complexity) into O(1) by providing unified runtime environments, P2P‑accelerated model distribution, and federated Ray clusters.
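One way to picture this collapse: instead of authoring a deployment config per (model, card type, module) combination, every combination is resolved from a single parameterized template backed by the unified runtime and a P2P model source. All names and the `p2p://` URI scheme below are hypothetical illustrations, not AstraRay internals:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class DeployTemplate:
    """One parameterized template replaces per-combination configs."""
    runtime_image: str  # unified runtime environment for every module
    model_source: str   # models fetched at startup via P2P distribution

    def render(self, model, card, module):
        # Adding a model, card type, or module no longer multiplies the
        # number of hand-written configs; it is just a new parameter value.
        return {"image": self.runtime_image, "model": model,
                "gpu_type": card, "module": module,
                "model_uri": f"{self.model_source}/{model}"}

template = DeployTemplate("unified-ray-runtime:latest", "p2p://models")
configs = [template.render(m, c, mod)
           for m, c, mod in product(["llm-a", "ocr-b"],
                                    ["A100", "T4"],
                                    ["detect"])]
```

The operator maintains one template (O(1)); the full cross product of concrete deployments is generated mechanically rather than maintained by hand.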

4. Summary

The rise of AI imposes significant challenges on WeChat’s backend infrastructure. By adopting Ray as the distributed foundation and extending it with AstraRay, WeChat achieves low‑cost, high‑efficiency AI computing, simplifies cluster management, and leverages idle resources, laying a solid groundwork for future AI services.

