Tagged articles

GPU Management

3 articles · Page 1 of 1
DataFunSummit
Sep 20, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat’s Astra platform leverages the Ray distributed framework to manage million‑node AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines the architecture that unifies Ray services across multiple Kubernetes clusters.

AI scaling · Astra Platform · Distributed Computing
5 min read
21CTO
Aug 20, 2017 · Cloud Native

How JD Built a Scalable AI Platform on Kubernetes: Architecture, Networking, and Storage Insights

This article details JD's AI platform built on Docker and Kubernetes, covering its high‑availability architecture, network plugin choices, storage solutions like GlusterFS and SeaweedFS, GPU management, CI/CD pipelines, logging, monitoring, and native Spark on Kubernetes, illustrating how a cloud‑native stack supports large‑scale AI services.

AI platform · CI/CD · Cloud Native
14 min read