JD's Large‑Scale Hadoop Cluster Resource Management and Scheduling Architecture
This article describes how JD built a multi‑regional, ten‑thousand‑node Hadoop ecosystem, unified resource management with YARN, introduced a three‑level Router scheduling layer, optimized performance, and integrated deep‑learning frameworks to achieve high availability, cost efficiency, and scalable big‑data processing.
As JD's e‑commerce business rapidly expanded, the original Hadoop clusters could no longer meet the soaring storage and compute demands, prompting the need for a unified, massive‑scale Hadoop platform that can serve multiple parallel frameworks.
Hadoop, a decade‑old big‑data platform, uses inexpensive commodity machines to form a distributed storage (HDFS) and compute (MapReduce) system; since Hadoop 2.0 the compute layer has been replaced by YARN, which provides resource management and job scheduling.
The core YARN components are ResourceManager, NodeManager, ApplicationMaster, Container, and Client, each responsible for cluster‑wide resource allocation, node‑level execution, application coordination, resource encapsulation, and job submission.
JD introduced a distributed resource‑management and scheduling system that aggregates scattered clusters into a single super‑cluster, allowing all parallel frameworks (Presto, Alluxio, TensorFlow, Caffe, etc.) to share resources via YARN, thereby improving utilization and reducing waste.
By running Docker containers on YARN, JD isolates execution environments, supports GPU and other hardware scheduling, and provides unified logging tools, enabling each user to customize their own runtime without affecting others.
The platform also migrated many legacy clusters to YARN, built deep‑learning on‑YARN solutions, and achieved cross‑region data and job migration through a multi‑datacenter HDFS and scheduling extension.
A new Router component adds a third‑level scheduling abstraction, turning the original two‑level YARN scheduling into a three‑level hierarchy (Router → Sub‑cluster → YARN), allowing dynamic cross‑sub‑cluster resource borrowing and logical queue management.
Router works with a State&Policy Store that persistently records sub‑cluster status, active ResourceManager addresses, and scheduling policies; high‑availability is ensured by deploying multiple Router instances with automatic failover.
The job submission flow is: Client → Router (selects logical queue) → Router forwards to the appropriate ResourceManager → ApplicationMaster requests containers via AMRMProxy → containers are launched on the target sub‑cluster.
To support ten‑thousand‑node clusters, JD developed a queue‑mirroring multi‑path allocation strategy, added multi‑dimensional scheduling rules (memory, load, utilization), and performed RM code optimizations such as lock reduction and simplified resource calculations; MapReduce shuffle services were also tuned.
Benchmark and analysis tools used include HiBench, Hadoop's built‑in benchmarks, GCeasy, FastThread, Linux perf, NMON, and Google performance tools.
Future directions focus on cost reduction by integrating JD's private cloud (Archimedes) for elastic scaling during peak events, and productizing the platform's middleware and services for external customers.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
