Tagged articles
3 articles
DataFunSummit
Sep 20, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat’s Astra platform uses the Ray distributed framework to run million‑node AI workloads. It covers the challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines an architecture that unifies Ray services across multiple Kubernetes clusters.

AI scaling · Astra Platform · GPU Management
5 min read
ITFLY8 Architecture Home
Dec 29, 2017 · Cloud Native

Inside JD’s ‘Moon Landing’ ML Platform: Cloud‑Native Architecture Secrets

JD’s Moon Landing machine learning platform, built on Docker and Kubernetes, uses a cloud‑native architecture that combines AI services, multi‑tenant security, GPU management, big‑data scheduling, and networking and storage solutions for high‑performance inference and training workloads.

Cloud Native · GPU Management · Kubernetes
15 min read
21CTO
Aug 20, 2017 · Cloud Native

How JD Built a Scalable AI Platform on Kubernetes: Architecture, Networking, and Storage Insights

This article details JD’s AI platform built on Docker and Kubernetes. It covers the platform’s high‑availability architecture, network plugin choices, storage solutions such as GlusterFS and SeaweedFS, GPU management, CI/CD pipelines, logging, monitoring, and native Spark on Kubernetes, illustrating how a cloud‑native stack supports large‑scale AI services.

AI Platform · Cloud Native · GPU Management
14 min read