How JD Built a Scalable AI Platform on Kubernetes: Architecture, Networking, and Storage Insights
This article details JD's AI platform built on Docker and Kubernetes, covering its high‑availability architecture, network plugin choices, storage solutions like GlusterFS and SeaweedFS, GPU management, CI/CD pipelines, logging, monitoring, and native Spark on Kubernetes, illustrating how a cloud‑native stack supports large‑scale AI services.
Architecture
The JD AI platform, launched in September 2016, is built around Docker and Kubernetes. Underneath sit CPU, GPU, and FPGA compute resources, high‑speed InfiniBand (IB) and Omni‑Path (OPA) networks, and several file systems; on top run machine‑learning frameworks, algorithm libraries, and business applications.
The management console provides permission, task, workflow, monitoring, and logging centers. The platform follows a "Kubernetes schedules everything" philosophy for both inference (App) and training (Job) workloads, which brings the following capabilities:
High availability and load balancing for massive inference Apps.
Application packaging and isolation via container images for CI/CD.
Automatic scaling: expand inference services during peak hours and allocate resources to training jobs at night.
Big‑data scheduling: native support for TensorFlow, Caffe, XGBoost, MXNet, as well as Hadoop and Spark ecosystems.
Rich hardware support: CPU, GPU, FPGA, InfiniBand, OPA.
Maximized cluster resource utilization without distinguishing between Apps and Jobs.
Data isolation architecture for security, leveraging network separation.
Multi‑tenant safety for public‑cloud deployments, isolating users at network, filesystem, and kernel levels.
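The time‑based sharing of one cluster between inference Apps and training Jobs can be illustrated with a minimal sketch. The peak window, the share percentages, and the function name `split_capacity` are all hypothetical; the article does not describe JD's actual scaling policy.

```python
from datetime import time

# Hypothetical peak window for inference traffic; the article gives no actual hours.
PEAK_START = time(8, 0)
PEAK_END = time(22, 0)


def split_capacity(now: time, total_nodes: int,
                   peak_inference_pct: int = 80,
                   offpeak_inference_pct: int = 30) -> dict:
    """Return how many nodes inference Apps and training Jobs may use at `now`."""
    in_peak = PEAK_START <= now < PEAK_END
    pct = peak_inference_pct if in_peak else offpeak_inference_pct
    inference = total_nodes * pct // 100
    # Whatever inference does not need is handed to training jobs.
    return {"inference": inference, "training": total_nodes - inference}


daytime = split_capacity(time(14, 0), total_nodes=100)
night = split_capacity(time(2, 0), total_nodes=100)
print(daytime, night)
```

Because both workload types are scheduled by the same cluster, capacity released by inference at night needs no migration between separate pools.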
Network
Kubernetes does not ship its own pod network implementation and relies on plugins, so JD evaluated Flannel, Weave, and Calico. Calico, which routes pod traffic with BGP, outperformed the overlay solutions because it avoids encapsulation and NAT, giving performance close to that of a physical machine.
Calico also supports IPIP tunneling and requires BGP‑enabled data‑center switches for full functionality.
For multi‑tenant isolation, Kubernetes NetworkPolicy and Calico policies are used: each user gets a dedicated Namespace; intra‑Namespace pods can communicate, inter‑Namespace traffic is blocked. Calico extends egress control and fine‑grained rules.
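The per‑Namespace isolation rule can be sketched as a NetworkPolicy manifest built in Python. This is an illustration, not JD's actual policy: the policy name and the `tenant-a` Namespace are made up, and the `networking.k8s.io/v1` API version is the current one, which is newer than the Kubernetes release the platform originally ran on.

```python
import json


def namespace_isolation_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that admits ingress only from pods in the same Namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            # An empty selector applies the policy to every pod in the Namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under `from` matches only pods in the policy's
            # own Namespace, so traffic from other Namespaces is not admitted.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


policy = namespace_isolation_policy("tenant-a")
print(json.dumps(policy, indent=2))
```

The printed JSON could be applied once per tenant Namespace; egress restrictions of the kind Calico adds would need separate rules.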
External RPC services require exposing container IPs; JD adopted Cisco’s open‑source Contiv (VLAN mode) built on OVS to provide underlay networking with performance close to physical networks.
Storage
Kubernetes itself provides no storage; JD selected GlusterFS for file‑level distributed storage, valuing elasticity, linear scaling, and high reliability. GlusterFS uses elastic hashing for file placement, improving parallel access.
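The idea behind hash‑based placement is that a client computes a file's location from its name instead of asking a metadata server. The sketch below is a deliberately simplified stand‑in: GlusterFS assigns hash ranges to bricks rather than using a modulo, and the brick names and the choice of MD5 here are assumptions for illustration only.

```python
import hashlib
from collections import Counter

BRICKS = ["brick-0", "brick-1", "brick-2", "brick-3"]  # hypothetical storage bricks


def place(filename: str, bricks=BRICKS) -> str:
    """Pick a brick from a hash of the file name; no metadata lookup is needed."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    return bricks[int.from_bytes(digest[:8], "big") % len(bricks)]


# Any client computes the same location for the same name, and many files
# spread across bricks, which is what allows parallel access.
placement = Counter(place(f"img_{i}.jpg") for i in range(10_000))
print(placement)
```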
Both static and dynamic provisioning are supported; dynamic provisioning uses a StorageClass combined with Heketi, which offers a REST API to create and destroy GlusterFS volumes.
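Dynamic provisioning can be sketched as two manifests: a StorageClass that points at Heketi and a claim that references the class. The Heketi URL, class name, and claim name below are placeholders. The `kubernetes.io/glusterfs` provisioner and its `resturl` parameter come from the in‑tree GlusterFS plugin, which has since been removed from recent Kubernetes releases, so treat this as a period‑appropriate illustration.

```python
import json

HEKETI_URL = "http://heketi.example.internal:8080"  # placeholder endpoint

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "glusterfs-dynamic"},
    "provisioner": "kubernetes.io/glusterfs",
    # The provisioner calls Heketi's REST API at this URL to create volumes.
    "parameters": {"resturl": HEKETI_URL},
}


def volume_claim(name: str, size_gi: int) -> dict:
    """Build a PersistentVolumeClaim that triggers dynamic provisioning."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class["metadata"]["name"],
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


claim = volume_claim("training-data", 50)
print(json.dumps([storage_class, claim], indent=2))
```

With static provisioning, an administrator would instead create the GlusterFS volume and the matching PersistentVolume ahead of time.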
For small‑file workloads (image recognition results), SeaweedFS is used, inspired by Facebook’s Haystack, offering rack‑aware and datacenter‑aware redundancy; it now stores millions of images daily, growing at ~30 GB per day.
HDFS remains essential for large‑scale batch processing. Alluxio is added as a caching layer in front of it and speeds up access by tens of times. Kerberos and Ranger provide authentication and authorization for multi‑tenant HDFS access.
GPU Resource Management
The platform ran on Kubernetes 1.4, which predates multi‑GPU support, so JD built its own GPU management: device detection, driver mapping, health checks, and GPU‑aware scheduling that considers GPU model, memory, and availability to maximize utilization.
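A GPU‑aware placement decision based on model, memory, and availability might look like the sketch below. The `Gpu` record, the best‑fit rule, and the sample inventory are assumptions; the article does not say how JD's scheduler ranks candidate devices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gpu:
    node: str
    model: str
    free_memory_mib: int
    healthy: bool
    in_use: bool


def pick_gpu(gpus: List[Gpu], model: str, memory_mib: int) -> Optional[Gpu]:
    """Filter by health, availability, model, and memory, then choose the best fit."""
    candidates = [
        g for g in gpus
        if g.healthy and not g.in_use
        and g.model == model and g.free_memory_mib >= memory_mib
    ]
    # Best fit: the smallest device that is still large enough, so bigger
    # GPUs stay free for jobs that actually need them.
    return min(candidates, key=lambda g: g.free_memory_mib, default=None)


inventory = [
    Gpu("node-1", "P40", 24000, healthy=True, in_use=False),
    Gpu("node-2", "P40", 12000, healthy=True, in_use=False),
    Gpu("node-3", "P40", 12000, healthy=False, in_use=False),  # failed health check
    Gpu("node-4", "K80", 12000, healthy=True, in_use=False),
]
chosen = pick_gpu(inventory, model="P40", memory_mib=10000)
print(chosen)
```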
Load Balancing
Inference services expose RPC and HTTP interfaces. RPC uses a service registry and client for load balancing; HTTP relies on the Kubernetes Ingress controller (Nginx) to map host/path rules to services.
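The host/path mapping performed for HTTP traffic can be approximated with a small routing function. The hostnames and service names are invented, and the longest‑prefix rule is a simplification of how an Nginx‑based Ingress controller actually matches paths.

```python
from typing import List, Optional, Tuple

# (host, path prefix, backend service); all values are hypothetical.
RULES: List[Tuple[str, str, str]] = [
    ("vision.example.com", "/", "vision-frontend"),
    ("vision.example.com", "/v1/classify", "image-classifier"),
    ("nlp.example.com", "/", "nlp-gateway"),
]


def route(host: str, path: str, rules=RULES) -> Optional[str]:
    """Return the service whose host matches and whose prefix is the longest match."""
    matches = [
        (prefix, service)
        for rule_host, prefix, service in rules
        if rule_host == host and path.startswith(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


print(route("vision.example.com", "/v1/classify/cat"))
print(route("vision.example.com", "/health"))
```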
CI/CD
The pipeline uses GitLab, Jenkins, and Harbor. Code commits trigger Jenkins builds, producing Docker images pushed to Harbor. Images are then pulled by Kubernetes workers for deployment, with mirrored repositories across data centers for faster pulls.
Logging
JD adopts the EFK stack. Containers log to stdout and Docker writes those logs to files on the host. Fluentd collects the files and forwards them to Kafka, which buffers them and feeds downstream consumers; from there the logs go to Elasticsearch and are visualized in Kibana.
Monitoring
Monitoring combines Heapster, InfluxDB, and Grafana. Heapster gathers metrics from the Kubelet on each node, aggregates them, and stores them in InfluxDB. JD extended Heapster to include service‑level metrics, so Grafana can display metrics at the node, pod, namespace, and service levels.
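Rolling pod metrics up to the namespace and service levels amounts to grouping samples by a label and summing. The sketch below uses made‑up samples and field names; it is not Heapster's data model or JD's extension.

```python
from collections import defaultdict

# Hypothetical per-pod samples of the kind collected from each node.
pod_metrics = [
    {"pod": "classifier-1", "namespace": "tenant-a", "service": "image-classifier", "cpu_millicores": 400},
    {"pod": "classifier-2", "namespace": "tenant-a", "service": "image-classifier", "cpu_millicores": 350},
    {"pod": "ocr-1", "namespace": "tenant-a", "service": "ocr", "cpu_millicores": 200},
    {"pod": "nlp-1", "namespace": "tenant-b", "service": "nlp-gateway", "cpu_millicores": 150},
]


def aggregate(samples, key: str) -> dict:
    """Sum CPU usage over all pods that share the same value for `key`."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[key]] += sample["cpu_millicores"]
    return dict(totals)


by_service = aggregate(pod_metrics, "service")
by_namespace = aggregate(pod_metrics, "namespace")
print(by_service)
print(by_namespace)
```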
Spark on Kubernetes
Google’s open‑source Spark‑on‑K8s project enables native scheduling of Spark drivers and executors as containers, avoiding the inefficiencies of running Spark Standalone inside Docker. Benefits include clear architecture, Docker‑level isolation, namespace‑based multi‑tenant isolation, and support for multiple Spark versions.
JD ran extensive benchmarks, added multi‑tenant and Python job support, and maintains its own customized version of Spark‑on‑K8s.
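To show what native submission looks like, the sketch below assembles a `spark-submit` command that targets a Kubernetes API server and a tenant Namespace. It uses the configuration keys from upstream Apache Spark's Kubernetes support, which may differ from those in the earlier Spark‑on‑K8s project or in JD's customized version; the API server address, image, and application path are placeholders.

```python
import shlex
from typing import List


def spark_submit_command(api_server: str, namespace: str, image: str,
                         app_resource: str, executors: int = 4) -> List[str]:
    """Assemble a cluster-mode spark-submit invocation for Kubernetes."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        # One Namespace per tenant keeps drivers and executors isolated.
        "--conf", f"spark.kubernetes.namespace={namespace}",
        # A different image per job is how several Spark versions can coexist.
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",
        app_resource,
    ]


cmd = spark_submit_command(
    api_server="https://k8s.example.internal:6443",
    namespace="tenant-a",
    image="registry.example.internal/spark:latest",
    app_resource="local:///opt/spark/app/job.py",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```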
Compute‑Data Separation
Modern architectures separate storage from compute, relying on high‑bandwidth networking and fast I/O technologies (25 GbE, RDMA, SPDK). This separation improves multi‑tenant security, allows flexible data‑center placement, and enables integration with user networks without altering data structures.
For TensorFlow/Caffe/MXNet, GlusterFS suffices; for Spark, a decoupled HDFS‑Spark setup incurs only ~3 % performance loss on 10 GbE networks for common MLlib algorithms.
Conclusion
Kubernetes provides a cloud‑native foundation for large‑scale AI platforms, fostering a vibrant ecosystem and enabling enterprises to build efficient, multi‑tenant, high‑performance compute infrastructures.
