How JD.com Scales Storage, Caching, and Messaging with Its Own Cloud Platform
This article explains how JD.com’s three‑layer cloud architecture—storage, middleware, and elastic compute—addresses challenges of massive unstructured data, ever‑growing caches, high‑volume messaging, and rapid service deployment using home‑grown systems like JFS, Jimdb, JMQ, and JSF.
JD.com’s basic cloud services underpin much of its business development. They are organized into three layers: underlying storage services, core middleware, and an elastic compute cloud on top, all exposed to other business units through APIs.
Storage System Challenges and Solutions
Storage is the most fundamental component for internet companies and consumes the most development effort.
Challenge 1. Unstructured Storage
JD.com receives tens of millions of merchant images daily, generates large amounts of order‑related text, and produces numerous small unstructured records from warehouse operations, electronic signatures, and bank receipts. These data are massive in volume but small in size per item.
To handle this, JD.com built JFS (Jingdong Filesystem), a large‑scale distributed storage system supporting BLOBs, files, and blocks. JFS 3.0 provides unified management of small objects, large files, and persistent block devices.
Technically, JFS ensures strong consistency with Paxos replication, uses a unified storage engine for various data models, applies Reed‑Solomon coding for petabyte‑scale cost reduction, and integrates metadata management with Hadoop.
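To make the cost argument for Reed‑Solomon coding concrete, the following minimal sketch compares the raw storage needed by an erasure‑coded layout with plain replication. The article does not state which code parameters JFS uses, so the 10 data + 4 parity layout and the 3‑copy baseline are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def erasure_coding_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per byte of user data for a k-data + m-parity code."""
    if data_shards <= 0 or parity_shards < 0:
        raise ValueError("need at least one data shard and no negative parity")
    return (data_shards + parity_shards) / data_shards


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies <= 0:
        raise ValueError("need at least one copy")
    return float(copies)


def raw_capacity_pb(user_data_pb: float, overhead: float) -> float:
    """Raw capacity (in PB) needed to hold user_data_pb of user data."""
    return user_data_pb * overhead


if __name__ == "__main__":
    # Assumed parameters, not taken from the article.
    rs = erasure_coding_overhead(data_shards=10, parity_shards=4)
    rep = replication_overhead(copies=3)
    for label, overhead in (("RS(10+4)", rs), ("3x replication", rep)):
        print(label, overhead, raw_capacity_pb(1.0, overhead), "PB raw per PB stored")
```

Under these assumed parameters the erasure‑coded layout survives the loss of any 4 shards while storing less than half the raw bytes of 3‑way replication, which survives the loss of 2 copies. The trade‑off is extra CPU and network work when reconstructing lost shards.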
JFS currently powers several JD.com services:
Image service; order fulfillment; logistics data exchange; electronic signing; internal cloud storage; VM and container volume storage
Challenge 2. Growing Caches
To guarantee fast responses, many data items (e.g., product prices, recommendation results) are cached in memory. As cache size and the number of high‑memory machines grow, managing them becomes a major challenge. Early solutions built on Memcached and Redis had to evolve as workloads scaled to massive volumes.
Jimdb: Distributed Cache and High‑Speed NoSQL
Jimdb is JD.com’s enterprise‑grade NoSQL service that provides distributed caching and high‑performance key‑value storage while being fully compatible with the Redis protocol. Compared with Redis, Jimdb offers:
Accurate fault detection and automatic failover; RAM/SSD hybrid storage; online horizontal scaling; asynchronous, synchronous, and partial replication; fully automated onboarding and management
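Because Jimdb is described as compatible with the Redis protocol, a client would presumably frame its requests the same way it does for Redis. The sketch below shows how a command is encoded as a RESP array of bulk strings, which is the request format Redis itself uses. The key name `price:1001` is a hypothetical example, and nothing here reflects Jimdb internals.

```python
def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings.

    Each request is '*<argc>\\r\\n' followed, for every argument, by
    '$<byte length>\\r\\n<bytes>\\r\\n'.
    """
    if not args:
        raise ValueError("a command needs at least one argument")
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Length is counted in bytes, not characters.
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    # Hypothetical key holding a product price.
    print(encode_resp_command("SET", "price:1001", "42"))
    print(encode_resp_command("GET", "price:1001"))
```

Protocol compatibility of this kind means existing Redis client libraries can be reused unchanged, with features such as failover and scaling handled on the server side.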
Automation of onboarding and management has been a primary focus in the past six months to reduce maintenance costs.
Jimdb runs on over 3,000 high‑memory + SSD machines, supporting JD.com’s product detail pages, search, recommendation, and ad click services.
Next‑Generation Storage Platform
Recent work aims to enable multi‑data‑center replication for higher reliability and to create a unified storage service—“One Jingdong One Storage”—that abstracts files, objects, tables, and even caches, providing consistent replication across distributed data centers, primary‑IDC caching, and optional full‑memory acceleration. The platform also integrates with online services, Hadoop, and private‑cloud container volume management.
Middleware: Message Queues and SOA
Above the storage layer lie various middleware components.
Challenge: Massive Message Traffic
JD.com operates dozens of servers in each regional warehouse, effectively forming small data centers. Message queues must connect core data centers with these warehouses, driving order processing pipelines. Daily message volume exceeds hundreds of billions.
JMQ: Jingdong Message Queues
JMQ, the third‑generation JD.com message‑queue system, was launched before last year’s Double‑11 shopping festival. Its key features include:
No message loss on data‑center power failure; group commit for higher disk‑write performance; transparent compression; flexible replication
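Group commit is named above without implementation detail, so the following is only a minimal sketch of the general idea: several messages are buffered and made durable with one `fsync` call instead of one call per message. The class name `GroupCommitLog`, the `max_batch` parameter, and the length‑prefixed record format are all hypothetical and are not taken from JMQ.

```python
import os


class GroupCommitLog:
    """Append-only log that syncs to disk once per batch of messages."""

    def __init__(self, path: str, max_batch: int = 64) -> None:
        self._file = open(path, "ab")
        self._pending: list[bytes] = []
        self._max_batch = max_batch
        self.fsync_count = 0  # exposed so the effect of batching is visible

    def append(self, message: bytes) -> None:
        """Buffer a message; flush once the batch is full."""
        self._pending.append(message)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Write all buffered messages, then force them to disk in one fsync."""
        if not self._pending:
            return
        for message in self._pending:
            # 4-byte big-endian length prefix, then the payload.
            self._file.write(len(message).to_bytes(4, "big") + message)
        self._file.flush()
        os.fsync(self._file.fileno())
        self.fsync_count += 1
        self._pending.clear()

    def close(self) -> None:
        self.flush()
        self._file.close()


def write_messages(path: str, count: int, max_batch: int) -> int:
    """Append `count` identical messages and return how many fsyncs were used."""
    log = GroupCommitLog(path, max_batch=max_batch)
    for _ in range(count):
        log.append(b"order-event")
    log.close()
    return log.fsync_count
```

In a broker built this way, a producer would only be acknowledged after the flush that covers its message, which is how batching can coexist with a no‑loss guarantee. A production design would also flush on a timer so that a partially filled batch does not wait indefinitely.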
Challenge: Expanding Online Services
JD.com’s e‑commerce system hosts many services that call each other internally and expose APIs to merchants and partners.
JSF: Jingdong Service Framework
JSF provides runtime service‑quality analysis and comprehensive service‑governance capabilities. It has been adopted by tens of thousands of servers, enabling internal SOA and external service exposure.
Elastic Compute Cloud
The elastic compute cloud project aims to bridge IDC resources and business systems, fully decoupling workloads from machines, achieving automated maintenance, shortening development‑to‑deployment cycles, and allowing engineers to focus on product design rather than resource provisioning.
Challenge: Explosive Growth of Machines
Rapid business expansion leads to exponential growth in the number of machines across multiple data centers. Managing these resources efficiently is essential for higher utilization and service quality.
Elastic Compute Cloud Architecture
The architecture consists of two layers: a foundational layer implementing software‑defined data centers via OpenStack and Docker (containerization) with JFS for reliable storage, and a platform layer that orchestrates unified resource allocation, automatic scaling based on workload, and hides infrastructure details from applications.
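The article does not describe the scaling policy the platform layer uses. As one plausible illustration of automatic scaling based on workload, the sketch below applies a simple proportional rule: scale the instance count by the ratio of observed to target utilization, then clamp it to configured bounds. The function name, the percentage inputs, and the bounds are assumptions made for this example.

```python
import math


def desired_instances(
    current: int,
    observed_util_pct: int,
    target_util_pct: int,
    min_instances: int = 1,
    max_instances: int = 100,
) -> int:
    """Return the instance count that would bring utilization near the target.

    Uses desired = ceil(current * observed / target), clamped to
    [min_instances, max_instances].
    """
    if current <= 0:
        raise ValueError("current instance count must be positive")
    if target_util_pct <= 0 or observed_util_pct < 0:
        raise ValueError("utilization values must be non-negative, target positive")
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    raw = math.ceil(current * observed_util_pct / target_util_pct)
    return max(min_instances, min(max_instances, raw))


if __name__ == "__main__":
    # Traffic spike: 10 instances running at 90% against a 60% target.
    print(desired_instances(10, 90, 60))
    # Quiet period: the same fleet at 30% utilization.
    print(desired_instances(10, 30, 60))
```

A real scheduler would typically add a cooldown period or a tolerance band around the target so that small fluctuations do not cause constant scaling up and down.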
The system is already in production for services such as product detail pages and image processing, automatically scaling resources during traffic spikes.
Business growth drives infrastructure evolution; technology success depends on the team.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.