Tagged articles

GPU Management

3 articles · Page 1 of 1

Sep 20, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat’s Astra platform uses the Ray distributed framework to manage million‑node AI workloads. It covers the challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines the architecture that unifies Ray services across multiple Kubernetes clusters.

AI scaling · Astra Platform · Distributed Computing

5 min read

ITFLY8 Architecture Home

Dec 29, 2017 · Cloud Native

Inside JD’s ‘Moon Landing’ ML Platform: Cloud‑Native Architecture Secrets

JD’s Moon Landing Machine Learning Platform, built on Docker and Kubernetes, showcases a cloud‑native architecture that integrates AI services, multi‑tenant security, GPU management, big‑data scheduling, and advanced networking and storage solutions for high‑performance inference and training workloads.

CI/CD · Cloud Native · GPU Management

15 min read

21CTO

Aug 20, 2017 · Cloud Native

How JD Built a Scalable AI Platform on Kubernetes: Architecture, Networking, and Storage Insights

This article details JD's AI platform built on Docker and Kubernetes, covering its high‑availability architecture, network plugin choices, storage solutions like GlusterFS and SeaweedFS, GPU management, CI/CD pipelines, logging, monitoring, and native Spark on Kubernetes, illustrating how a cloud‑native stack supports large‑scale AI services.

AI platform · CI/CD · Cloud Native

14 min read
