Inside JD’s ‘Moon Landing’ ML Platform: Cloud‑Native Architecture Secrets
JD’s Moon Landing Machine Learning Platform, built on Docker and Kubernetes, showcases a cloud‑native architecture that integrates AI services, multi‑tenant security, GPU management, big‑data scheduling, and advanced networking and storage solutions for high‑performance inference and training workloads.
JD recently launched the Moon Landing Machine Learning Platform on JD Cloud. The platform offers AI services and opens up JD’s AI technology, from application services down to core algorithms, as part of JD’s RaaS strategy. Engineers from the AI & Big Data team explain the platform’s architecture.
Architecture
The platform’s foundation is Docker + Kubernetes, with compute resources (CPU, GPU, FPGA), high‑speed interconnects (InfiniBand, OPA), and various file systems. On top sit ML frameworks and algorithm libraries, and finally business applications. The management center provides permission, task, workflow, monitoring, and logging services.
The design follows “Kubernetes schedules everything” and aims for high availability, load balancing, application packaging and isolation, automatic scaling, big‑data scheduling, rich hardware support, full cluster utilization, data isolation, and multi‑tenant security.
High availability and load balancing. Inference apps run in containers and must serve traffic reliably.
Application packaging and isolation. Researchers package code into images for CI/CD and transparent execution.
Auto scaling for inference and training. Resources shift between inference apps by day and training jobs by night.
Big‑data scheduling. Native support for TensorFlow, Caffe, XGBoost, MXNet, as well as Hadoop/Spark ecosystems.
Rich hardware types. Support for CPU, GPU, FPGA, InfiniBand, OPA, etc.
Maximize cluster utilization. All resources belong to a single pool regardless of app type.
Data isolation for security. Network separation provides fine‑grained data access control.
Multi‑tenant safety. Multi‑tenancy architecture isolates users at network, filesystem, and kernel levels.
Network
Kubernetes does not ship a pod network implementation of its own and delegates it to plugins, so JD evaluated Flannel, Weave, and Calico. Calico routes pod traffic with BGP and outperforms overlay networks because it avoids encapsulation and NAT.
Calico also provides Layer 3 data‑center networking: only the hosts’ physical MAC addresses are exposed to the fabric, which prevents ARP storms, and IPIP tunnels are available as an option. For multi‑tenant isolation, each user is assigned a separate Namespace, and Kubernetes NetworkPolicy together with Calico policies allows pod communication within a Namespace while blocking traffic between Namespaces. Calico extends NetworkPolicy with egress control and finer‑grained rules.
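The per-Namespace isolation described above can be sketched as a NetworkPolicy manifest. This is a minimal sketch, not JD’s actual policy: it assumes the current `networking.k8s.io/v1` API rather than whichever version JD ran, and the policy name and the `tenant-a` Namespace are hypothetical.

```python
import json


def namespace_isolation_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that admits ingress only from the same Namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "allow-same-namespace" is a hypothetical name chosen for this sketch.
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the Namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" matches pods in the policy's own
            # Namespace only, so ingress from other Namespaces is not allowed.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


# kubectl accepts JSON as well as YAML, so the output can be applied directly.
print(json.dumps(namespace_isolation_policy("tenant-a"), indent=2))
```

One such policy per tenant Namespace gives the "talk inside, blocked outside" behavior. Egress rules and Calico-specific policy types are left out of the sketch.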
To expose container IPs for external RPC services, JD adopted Cisco’s open‑source Contiv project in VLAN mode, an underlay network that runs at the same layer as physical networks, offering near‑native performance.
Storage
Kubernetes does not provide a storage backend of its own. JD uses GlusterFS for file‑level distributed storage, which offers elasticity, linear scaling, and high reliability.
GlusterFS volumes are managed via Heketi’s REST API, enabling dynamic provisioning through Kubernetes StorageClass.
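As an illustration of dynamic provisioning, the sketch below builds a StorageClass that points at Heketi and a claim that uses it. It assumes the in-tree `kubernetes.io/glusterfs` provisioner and its `resturl` parameter, which existed in older Kubernetes releases and have since been removed upstream. The class name, claim name, and Heketi URL are hypothetical.

```python
import json


def glusterfs_storage_class(name: str, heketi_url: str) -> dict:
    """StorageClass that asks Heketi (via its REST URL) to create volumes."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/glusterfs",
        "parameters": {"resturl": heketi_url},
    }


def volume_claim(name: str, storage_class: str, size: str) -> dict:
    """PersistentVolumeClaim that triggers provisioning from the class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # GlusterFS volumes can be mounted read-write by many pods.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names and URL, for illustration only.
sc = glusterfs_storage_class("gluster-fast", "http://heketi.example.internal:8080")
pvc = volume_claim("train-data", "gluster-fast", "10Gi")
print(json.dumps([sc, pvc], indent=2))
```

With the class in place, a user only submits the claim. The provisioner calls Heketi, and the volume appears without an administrator creating it by hand.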
For massive small‑file workloads (e.g., image recognition results), JD employs SeaweedFS, a system inspired by Facebook’s Haystack, which provides REST APIs, rack‑aware and datacenter‑aware placement, and currently stores millions of images daily.
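The idea behind Haystack-style stores is to pack many small files into one large volume file and keep a compact in-memory index of where each one lives, so that a read costs a single seek. The toy class below sketches that idea only. It is not SeaweedFS’s on-disk format or API, and the `Volume` class and its method names are hypothetical.

```python
import io


class Volume:
    """Toy append-only volume: many small blobs in one file, indexed in memory."""

    def __init__(self) -> None:
        self._data = io.BytesIO()  # stands in for one large volume file
        self._index = {}           # key -> (offset, size)

    def put(self, key: str, blob: bytes) -> None:
        # Always append at the end; existing bytes are never rewritten.
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(blob)
        self._index[key] = (offset, len(blob))

    def locate(self, key: str) -> tuple:
        """Return the (offset, size) pair recorded for a key."""
        return self._index[key]

    def get(self, key: str) -> bytes:
        # One index lookup, one seek, one read.
        offset, size = self._index[key]
        self._data.seek(offset)
        return self._data.read(size)


volume = Volume()
volume.put("cat.jpg", b"\xff\xd8cat")
volume.put("dog.jpg", b"\xff\xd8doggo")
print(volume.locate("dog.jpg"), volume.get("cat.jpg"))
```

Because per-file metadata shrinks to an offset and a size, the filesystem holds a handful of large files instead of millions of small ones, which is what makes this layout attractive for image workloads.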
HDFS remains essential for large‑scale batch processing. Alluxio is added in front of it as a caching layer to accelerate HDFS reads, speeding them up by tens of times.
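A caching layer of this kind behaves like a read-through cache: the first read of a block goes to the slow backing store, and repeated reads are served from memory. The sketch below shows the pattern with an LRU eviction policy. The eviction policy, class name, and capacity are assumptions for illustration and say nothing about how Alluxio is configured at JD.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the backing store on a miss."""

    def __init__(self, backing_read: Callable[[str], bytes], capacity: int) -> None:
        self._read = backing_read      # e.g. a function that reads from HDFS
        self._capacity = capacity      # maximum number of cached entries
        self._blocks = OrderedDict()   # path -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._blocks:
            self._blocks.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._blocks[path]
        self.misses += 1
        data = self._read(path)
        self._blocks[path] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used
        return data


# A stand-in for the slow store; real code would read from HDFS here.
cache = ReadThroughCache(lambda path: ("contents of " + path).encode(), capacity=2)
cache.read("/data/part-0")
cache.read("/data/part-0")
print(cache.hits, cache.misses)
```

The speedup depends on the hit ratio, so workloads that reread the same data, such as iterative training jobs, benefit most.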
Multi‑tenant isolation for storage is enforced with Kerberos authentication and Ranger authorization for HDFS, while GlusterFS volumes are mounted per container, naturally restricting access.
GPU Resource Management
JD’s clusters ran Kubernetes 1.4, which predates upstream multi‑GPU support, so JD built its own multi‑GPU management: device detection, driver mapping, health checks, and GPU‑aware scheduling that considers GPU model, memory, and availability.
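GPU-aware scheduling of this kind amounts to a filter step over the known devices. The sketch below picks GPUs on a single node that match the requested model, have enough free memory, and are healthy and idle. The `Gpu` record, its field names, and the one-node placement rule are hypothetical. JD’s scheduler is not described in enough detail to reproduce.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gpu:
    node: str
    index: int
    model: str
    free_memory_mib: int
    healthy: bool = True
    in_use: bool = False


def pick_gpus(gpus: List[Gpu], model: str, memory_mib: int,
              count: int) -> Optional[List[Gpu]]:
    """Return `count` suitable GPUs that sit on one node, or None if no node fits."""
    by_node: Dict[str, List[Gpu]] = {}
    for gpu in gpus:
        suitable = (gpu.healthy and not gpu.in_use and gpu.model == model
                    and gpu.free_memory_mib >= memory_mib)
        if suitable:
            by_node.setdefault(gpu.node, []).append(gpu)
    # Visit nodes in name order so the choice is deterministic.
    for node in sorted(by_node):
        if len(by_node[node]) >= count:
            return by_node[node][:count]
    return None


fleet = [
    Gpu("node-1", 0, "model-a", 24000),
    Gpu("node-1", 1, "model-a", 24000, healthy=False),
    Gpu("node-2", 0, "model-a", 24000),
    Gpu("node-2", 1, "model-a", 24000),
]
chosen = pick_gpus(fleet, "model-a", 16000, 2)
print([(g.node, g.index) for g in chosen])
```

The health flag is where the result of a periodic health check would feed in, so that a failed device drops out of scheduling without removing the whole node.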
Load Balancing
Inference services expose RPC and HTTP interfaces. RPC uses a service registry and client‑side load balancing; HTTP relies on the Kubernetes Ingress controller, which wraps Nginx and automatically reloads configuration when rules change.
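On the RPC path, the client itself chooses which instance to call, using the addresses it finds in the registry. The sketch below shows that pattern with an in-memory registry and round-robin selection. The article names neither the registry product nor the balancing policy, so both are assumptions, and all class names and addresses are hypothetical.

```python
from typing import Dict, List


class ServiceRegistry:
    """In-memory stand-in for a service registry."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._endpoints.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        self._endpoints[service].remove(address)

    def lookup(self, service: str) -> List[str]:
        return list(self._endpoints.get(service, []))


class RoundRobinBalancer:
    """Client-side balancer: rotates over whatever the registry lists right now."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self._registry = registry
        self._service = service
        self._next = 0

    def pick(self) -> str:
        endpoints = self._registry.lookup(self._service)
        if not endpoints:
            raise LookupError("no live endpoints for " + self._service)
        address = endpoints[self._next % len(endpoints)]
        self._next += 1
        return address


registry = ServiceRegistry()
registry.register("image-classifier", "10.0.0.11:9000")
registry.register("image-classifier", "10.0.0.12:9000")
balancer = RoundRobinBalancer(registry, "image-classifier")
print([balancer.pick() for _ in range(3)])
```

Because the lookup happens on every call, containers that start or stop are picked up without a proxy in the request path, which is the main appeal of client-side balancing for RPC.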
CI/CD
JD’s pipeline uses GitLab, Jenkins, and Harbor. Code is pushed to GitLab, Jenkins builds Docker images, which are stored in Harbor and then pulled by Kubernetes workers during deployment. Harbor mirrors accelerate image pulls across data centers.
Logging
Logging follows the EFK (Elasticsearch, Fluentd, Kibana) stack with Kafka in the middle: containers write to stdout, Docker stores the logs, Fluentd forwards them to Kafka and from there to Elasticsearch, and Kibana visualizes the data. Kafka smooths traffic spikes and lets other downstream consumers read the same log stream.
Monitoring
Monitoring uses Heapster, InfluxDB, and Grafana. Heapster collects metrics from Kubelet, aggregates them, stores them in InfluxDB, and Grafana displays them. JD extended Heapster to provide service‑level metrics.
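Service-level metrics can be derived by grouping per-pod samples by the service they belong to. The sketch below shows that aggregation step only. The sample layout and field names are hypothetical and do not reflect Heapster’s data model or JD’s extension.

```python
from collections import defaultdict
from typing import Dict, List


def service_level_cpu(pod_samples: List[dict]) -> Dict[str, dict]:
    """Roll per-pod CPU usage (millicores) up to one record per service."""
    grouped = defaultdict(list)
    for sample in pod_samples:
        grouped[sample["service"]].append(sample["cpu_millicores"])
    return {
        service: {
            "pods": len(values),
            "total_millicores": sum(values),
            "mean_millicores": sum(values) / len(values),
        }
        for service, values in grouped.items()
    }


samples = [
    {"pod": "ocr-1", "service": "ocr", "cpu_millicores": 100},
    {"pod": "ocr-2", "service": "ocr", "cpu_millicores": 300},
    {"pod": "nlp-1", "service": "nlp", "cpu_millicores": 50},
]
print(service_level_cpu(samples))
```

A record per service is what a dashboard or autoscaler usually needs, since individual pods come and go while the service persists.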
Kubernetes Scheduling for Spark
JD is evaluating Spark on Kubernetes. Running Spark Standalone inside Docker has drawbacks: two overlapping layers of scheduling, performance loss, and no multi‑tenant isolation. JD instead advocates native scheduling, in which the Driver and each Executor run in separate containers scheduled by Kubernetes, using Kubernetes namespaces for isolation and allowing multiple Spark versions to coexist.
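For orientation, the sketch below assembles a `spark-submit` command line for natively scheduled Spark. It assumes the options from upstream Spark’s Kubernetes support (Spark 2.3 and later), which may differ from the build JD evaluated. The API server address, Namespace, image, and jar path are hypothetical.

```python
from typing import List


def spark_submit_args(api_server: str, namespace: str, image: str,
                      executors: int, main_class: str, jar: str) -> List[str]:
    """Build argv for submitting a Spark job straight to Kubernetes."""
    return [
        "spark-submit",
        "--master", "k8s://" + api_server,
        "--deploy-mode", "cluster",  # the Driver itself runs in a pod
        "--class", main_class,
        # The tenant's Namespace provides the isolation boundary.
        "--conf", "spark.kubernetes.namespace=" + namespace,
        # The image pins the Spark version, so tenants can run different ones.
        "--conf", "spark.kubernetes.container.image=" + image,
        "--conf", "spark.executor.instances=" + str(executors),
        jar,
    ]


args = spark_submit_args(
    api_server="https://k8s.example.internal:6443",
    namespace="tenant-a",
    image="registry.example.internal/spark:tenant-a",
    executors=4,
    main_class="org.apache.spark.examples.SparkPi",
    jar="local:///opt/spark/examples/jars/spark-examples.jar",
)
print(" ".join(args))
```

Since the Spark version travels with the container image, two tenants can submit to the same cluster with different images and never share a Spark installation.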
Compute‑Data Separation
Separating storage and compute is increasingly viable thanks to faster networking and I/O technologies (25 GbE, RDMA, SPDK). Benefits include multi‑tenant data security, flexible deployment across data centers, and the ability to insert middleware such as Alluxio for performance guarantees.
Benchmarks show that for MLlib algorithms on a 10 GbE network, the performance loss of separating compute from storage is only about 3 %.
Kubernetes provides the cloud‑native foundation for such architectures, fostering a vibrant ecosystem and enabling large‑scale, efficient computing platforms.
The AI Infrastructure team focuses on Kubernetes, AI engineering, and virtualized big‑data systems, with support from Intel for Spark on K8s and BigDL.