How JD.com Engineered Its Own Distributed Storage System for Billions of Files

This article chronicles JD.com's journey from recognizing massive storage demands to designing, building, and evolving a self‑developed distributed storage platform—JFS—that handles small and large files, powers a custom image system, object storage, and future container‑native workloads.

ITPUB
ITPUB
ITPUB
How JD.com Engineered Its Own Distributed Storage System for Billions of Files

Background and Requirements

JD.com needed a storage system that could reliably keep massive volumes of images, videos, and text while providing low‑latency read/write operations. The system must guarantee zero data loss during power failures or hardware faults, remain available during network or datacenter outages, and scale to handle traffic spikes in major sales events.

Technology Evaluation and Decision

Open‑source candidates such as HDFS, FastDFS and HBase were examined. HDFS is optimized for offline large‑file workloads and cannot efficiently serve JD’s online small‑file traffic. HBase introduces an extra network hop (client → RegionServer → HDFS) and suffers brief service interruptions during region splits. After weighing customization effort against long‑term operational benefits, JD chose to develop a proprietary distributed storage system, JFS (JD File System).

JFS Small‑File Storage Architecture

JFS uses a three‑replica strong‑consistency protocol (1 Primary + 2 Followers). Write requests are sent to the Primary, which forwards the data to both Followers; the write is acknowledged only after all three replicas have persisted the data. Reads are directed preferentially to Followers to increase concurrency.

To avoid the overhead of storing millions of tiny files, JFS pre‑creates large container files (e.g., several gigabytes) and appends new data using offset‑length pointers. Multiple container files are used in parallel to reduce lock contention.

Small‑file replication architecture
Small‑file replication architecture
Append‑only large file design
Append‑only large file design

Migration to JFS for Image Service

In 2014 JD migrated ~2 billion historical images from a legacy service to JFS. The migration, including data double‑write, verification and cut‑over, was completed in about one month. After migration, image access latency dropped dramatically and the system could serve the traffic spikes of major sales events.

Only the original high‑resolution image is stored; on‑the‑fly compressed variants are generated by the CDN. Adoption of the WebP format (in collaboration with Intel) reduced image size by ~50 %, cutting CDN bandwidth by ~30 %.

Image processing pipeline
Image processing pipeline

JFS Large‑File Storage

For large files the 1 Primary + 2 Followers model creates a bandwidth bottleneck at the Primary. JFS therefore switches to chain replication, where each node forwards data to the next node in the chain, fully utilizing aggregate network bandwidth.

Large files are split into fixed‑size blocks (e.g., 64 MiB) and stored across multiple nodes. This chunking balances I/O load, avoids hot‑spot disks, and enables fine‑grained resource scheduling.

Chain replication diagram
Chain replication diagram
Chunked large‑file storage
Chunked large‑file storage

Object Storage Layer (S3‑compatible)

On top of JFS JD built an object storage service exposing an S3‑compatible HTTP API. It supports objects from 1 byte up to 1 TB and provides hierarchical listing via prefix and delimiter parameters.

Metadata is persisted in MySQL while a high‑performance KV cache (Jimdb) serves hot entries. An automatic sharding component (ET) monitors table sizes, splits tables when thresholds are reached, and migrates data online without service interruption, allowing metadata management at the hundred‑billion‑object scale.

Metadata architecture
Metadata architecture

At peak traffic (e.g., Double‑11 shopping festival) the service handles up to 25 000 concurrent reads/writes per second and stores over 10 PB of data across more than 1 200 JD business lines.

Electronic Receipt Backend

The object storage layer powers an electronic receipt system that replaces millions of paper receipts. The workflow generates signed receipt images, encrypts them before storage, and decrypts them on demand. This satisfies regulatory retention periods (one year) while providing high security, durability, and cost savings.

Electronic receipt workflow
Electronic receipt workflow

Unified Storage Vision

JFS now unifies small‑file and large‑file engines under a plug‑in architecture. Both engines share the same three‑replica protocol, with optional chain replication for large objects. Different storage engines are selected per file size, allowing the system to handle diverse workloads while preserving high performance and strong consistency.

Unified storage architecture
Unified storage architecture

Future Directions

Plans include exposing JFS as a shared storage mount that can be attached directly to containers. This will allow containerized applications to read/write unstructured data without intermediate local disks, improving fault tolerance, scheduling flexibility, and overall resource efficiency.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Backend Engineeringlarge-scale systemsdistributed storagemetadata managementobject storageJFS
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.