How We Scaled AI Computing in WeChat with Ray: From Challenges to AstraRay
This article details the AI computing challenges faced by WeChat, explains why the Ray distributed engine was chosen, and describes the design and large‑scale deployment of the AstraRay platform—including scheduling, resource management, and multi‑model support—to achieve low‑cost, high‑efficiency AI services.
1. Background
WeChat hosts numerous AI computing scenarios, primarily in traffic distribution, product operations, and content creation. AI is used for search, advertising, recommendation feature generation, product function optimization, and AIGC tasks such as text-to-image generation. Existing backend infrastructure struggles with the high compute intensity, heterogeneous hardware requirements, complex deployment, and low observability of AI workloads, creating the need for a low-cost, high-efficiency AI computing platform.
2. Why Introduce Ray?
Ray is an open-source, general-purpose distributed computing engine from UC Berkeley RISELab (2016). It simplifies distributed programming: developers add a decorator to any Python function or class, and Ray handles scheduling, state management, and fault tolerance. The following example shows a simple OCR pipeline, first written as ordinary single-process Python.
```python
def detect(image_data):
    model = load_detect_model()
    return model(image_data)

def recognize(det_result):
    model = load_recognize_model()
    return model(det_result)

def ocr(image_data):
    det_result = detect(image_data)
    return recognize(det_result)

image_data = load_image_data()
ocr_result = ocr(image_data)
```

By adding @ray.remote and specifying resource requirements, the same pipeline can be deployed as distributed micro-services, reducing deployment complexity and improving efficiency.
```python
import ray

@ray.remote(num_gpus=1, num_cpus=16)
def detect(image_data):
    model = load_detect_model()
    return model(image_data)

@ray.remote(num_gpus=2, num_cpus=16)
def recognize(detect_result):
    model = load_recognize_model()
    return model(detect_result)

@ray.remote(num_cpus=4)
def ocr(image_data):
    det_result = detect.remote(image_data)
    # Block on the nested task so callers receive the value, not an ObjectRef
    return ray.get(recognize.remote(det_result))

image_data = load_image_data()
ocr_result = ray.get(ocr.remote(image_data))
```

Using Ray, the OCR inference workload achieves at least an order-of-magnitude improvement in deployment speed.
3. Large‑Scale Practice of Ray in WeChat AI Computing
Existing platforms such as P6n (Kubernetes‑based micro‑service platform) and Gemini (Kubernetes big‑data platform) cannot simultaneously satisfy low‑cost, high‑throughput, and low‑latency AI workloads. AstraRay, built on Ray, addresses three core challenges:
Support heterogeneous resource expansion for low cost.
Enable massive‑scale resource scheduling for high throughput.
Reduce multi‑model deployment complexity.
3.1 AstraRay Architecture
3.2 Supporting Million‑Scale Pods
A shared‑scheduler architecture (Starlink) is adopted to provide a global resource view and optimistic concurrency control, allowing a single Ray application to manage millions of pods across heterogeneous platforms.
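The core of a shared-scheduler design is that every scheduler works from a (possibly stale) global snapshot and commits placements optimistically, retrying on conflict. The source does not show Starlink's internals, so the following is a minimal pure-Python sketch of that idea; all names (`SharedState`, `try_commit`, `schedule`) are hypothetical illustrations, not AstraRay APIs.

```python
import threading

class Node:
    """A pod/node record in the shared cluster state (hypothetical schema)."""
    def __init__(self, node_id, free_gpus):
        self.node_id = node_id
        self.free_gpus = free_gpus
        self.version = 0  # bumped on every successful commit

class SharedState:
    """Global resource view shared by all schedulers."""
    def __init__(self, nodes):
        self._lock = threading.Lock()
        self.nodes = {n.node_id: n for n in nodes}

    def try_commit(self, node_id, expected_version, gpus_needed):
        """Optimistic commit: succeeds only if no other scheduler
        changed this node since the caller took its snapshot."""
        with self._lock:
            node = self.nodes[node_id]
            if node.version != expected_version or node.free_gpus < gpus_needed:
                return False  # conflict or insufficient capacity; caller retries
            node.free_gpus -= gpus_needed
            node.version += 1
            return True

def schedule(state, gpus_needed, max_retries=10):
    """Pick a node from a snapshot of the global view; on commit
    conflict, take a fresh snapshot and retry."""
    for _ in range(max_retries):
        snapshot = [(n.node_id, n.version, n.free_gpus)
                    for n in state.nodes.values()]
        for node_id, version, free in snapshot:
            if free >= gpus_needed and state.try_commit(node_id, version, gpus_needed):
                return node_id
    return None
```

Because commits validate a per-node version rather than locking the whole cluster, many schedulers can place work concurrently, which is what lets this pattern scale to millions of pods.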
3.3 Building Stable Services on Unstable Resources
Starlink uses PreStop hooks for graceful pod termination and a fast broadcast mechanism to synchronize node status, achieving node eviction within 4 seconds and dramatically reducing failure rates.
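Fast eviction hinges on pushing node-health transitions to every router the moment they happen, instead of waiting for each router to poll. The article does not detail Starlink's broadcast protocol, so this is a hedged single-process sketch of the pattern; `NodeStatusBroadcaster` and `Router` are invented names for illustration.

```python
class NodeStatusBroadcaster:
    """Fans out node health transitions to all subscribed routers
    immediately, so an unhealthy node is evicted cluster-wide
    without waiting for a polling interval."""
    def __init__(self):
        self.subscribers = []   # router callbacks
        self.status = {}        # node_id -> "healthy" | "evicted"

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def report(self, node_id, healthy):
        new_status = "healthy" if healthy else "evicted"
        if self.status.get(node_id) == new_status:
            return  # no transition, nothing to broadcast
        self.status[node_id] = new_status
        for cb in self.subscribers:  # push, don't wait to be polled
            cb(node_id, new_status)

class Router:
    """Keeps a local routing table that the broadcaster updates."""
    def __init__(self, broadcaster):
        self.live_nodes = set()
        broadcaster.subscribe(self.on_status)

    def on_status(self, node_id, status):
        if status == "healthy":
            self.live_nodes.add(node_id)
        else:
            self.live_nodes.discard(node_id)
```

In a real deployment the fan-out crosses the network, but the shape is the same: the eviction latency is bounded by broadcast propagation, not by a polling period, which is how a bound like "eviction within 4 seconds" becomes achievable.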
3.4 Reducing Deployment Complexity
AstraRay transforms multi‑model, multi‑card, and multi‑module expansion (originally O(n³) complexity) into O(1) by providing unified runtime environments, P2P‑accelerated model distribution, and federated Ray clusters.
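The O(n³) term comes from building one deployment artifact per (model, card type, module) combination; a unified runtime collapses this to a single generic image, with the model itself fetched at pod startup, from a peer that already holds it when possible. The source does not describe the P2P protocol, so this is a toy sketch of the topology only; `ModelRegistry` and `start_pod` are hypothetical names.

```python
class ModelRegistry:
    """Toy P2P tracker: the first pod pulls a model from origin
    storage; later pods pull from peers that already hold it."""
    def __init__(self):
        self.peers_with = {}  # model_name -> list of pod ids holding it

    def locate(self, model_name):
        return self.peers_with.get(model_name, [])

    def announce(self, model_name, pod_id):
        self.peers_with.setdefault(model_name, []).append(pod_id)

def start_pod(pod_id, model_name, registry):
    """Generic runtime: every pod runs the same image regardless of
    model or hardware; only the startup download differs."""
    peers = registry.locate(model_name)
    source = peers[0] if peers else "origin-storage"
    # (actual chunk transfer from `source` elided; this sketch
    #  only tracks where each pod sourced its model)
    registry.announce(model_name, pod_id)
    return source
```

Because the image no longer encodes the model or the card type, adding a new model or hardware generation requires no new artifact, which is the sense in which deployment complexity drops to O(1).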
4. Summary
The rise of AI imposes significant challenges on WeChat’s backend infrastructure. By adopting Ray as the distributed foundation and extending it with AstraRay, WeChat achieves low‑cost, high‑efficiency AI computing, simplifies cluster management, and leverages idle resources, laying a solid groundwork for future AI services.
WeChat Backend Team
Official account of the WeChat backend development team, sharing their experience in large-scale distributed system development.