JD Retail HDFS Unified Storage: Cross‑Region and Tiered Storage Practices
This article presents JD Retail's large‑scale HDFS deployment, detailing its unified storage architecture, cross‑region data replication challenges and solutions, tiered storage strategies for hot, warm and cold data, and the operational modules that together improve performance, reliability and cost efficiency in a big‑data environment.
01 Overview
With the arrival of the big‑data era, massive data storage and processing have become critical challenges for enterprises. JD Retail relies on Hadoop Distributed File System (HDFS) as a highly reliable and scalable distributed file system that underpins data analysis tools, downstream services, and massive offline jobs. The platform operates tens of thousands of servers, stores data at the exabyte level, and handles daily growth of tens of petabytes, supported by visual management tools that simplify monitoring and operations.
02 Cross‑Region Storage
1. Existing Problems
Single‑datacenter deployments can no longer meet JD's multi‑datacenter expansion needs, leading to issues such as insufficient disaster‑recovery capability, inconsistent metadata across sites, redundant data storage, and uncontrolled inter‑datacenter links.
2. Storage Architecture
JD adopted a unified‑storage, full‑network‑topology strategy: all DataNodes (DNs) in a region report to a common NameNode, giving unified metadata management, eliminating cross‑site metadata inconsistency, and reducing migration costs. The new architecture improved migration efficiency by 350% and, with read‑only nodes and write‑read separation, increased read performance by over 70%.
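The write‑read separation mentioned above can be sketched as a simple router: mutating operations go to the active NameNode, while reads fan out across read‑only nodes. This is a minimal illustration, not JD's implementation; the node addresses and operation names are hypothetical.

```python
import random

# Hypothetical endpoints for illustration only.
ACTIVE_NN = "nn-active:8020"
READONLY_NNS = ["nn-ro-1:8020", "nn-ro-2:8020"]

def route(op: str) -> str:
    """Send mutating operations to the active NameNode;
    spread read traffic across read-only nodes."""
    if op in ("create", "append", "delete", "rename", "mkdir"):
        return ACTIVE_NN
    return random.choice(READONLY_NNS)
```

Offloading metadata reads this way is what lets read throughput scale with the number of read‑only nodes.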
3. Challenges
Rapid cluster expansion, cross‑region heartbeat stability, and traffic control between datacenters required new management mechanisms and dynamic throttling to avoid queue backlogs and ensure reliable data synchronization.
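The dynamic throttling described here can be approximated with a token bucket whose rate is adjustable at runtime. The sketch below is illustrative; the rate values and the feedback signal (queue depth) are assumptions, not JD's actual controller.

```python
import time

class TokenBucket:
    """Token-bucket limiter for inter-datacenter replication traffic.
    Rate and burst values here are illustrative."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_send(self, nbytes: float) -> bool:
        # Refill tokens based on elapsed time, capped at burst capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should back off, avoiding queue backlog

    def set_rate(self, rate_bytes_per_s: float) -> None:
        # Dynamic adjustment: a feedback loop watching sync-queue depth
        # (or an operator) can raise or lower the rate at runtime.
        self.rate = rate_bytes_per_s
```

A sender that checks `try_send` before shipping each block naturally smooths cross‑region traffic instead of saturating the link.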
03 Tiered Storage
1. Storage Comparison
Hot data is stored on high‑performance SSDs, warm data on standard HDDs, and cold data on high‑density HDDs. This hierarchy addresses the waste caused by treating hot and cold data alike and leverages hardware differences for optimal performance.
2. JD Tiered Storage Strategy
Three‑level tiering (hot‑SSD, warm‑HDD, cold‑high‑density HDD) is enforced by labeling directories with XATTR tags. Automatic conversion modules in the NameNode move data between tiers based on access patterns, TTL, and erasure coding for cold data, while read/write weight, storage usage, and node health guide block placement.
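The TTL‑driven part of the conversion policy can be sketched as a pure decision function. The tier names match the article; the specific thresholds below are hypothetical, and in practice the logic runs in NameNode‑side modules driven by the directory XATTR tags.

```python
import time

# Illustrative TTLs; real thresholds would come from per-directory policy.
SSD_TTL = 7 * 86400    # demote hot -> warm after 7 days unread
HDD_TTL = 30 * 86400   # demote warm -> cold after 30 days unread

def next_tier(current_tier: str, last_access_ts: float, now: float = None):
    """Return the tier a file should move to, or None to stay put.
    Cold data is additionally converted to erasure coding."""
    now = time.time() if now is None else now
    idle = now - last_access_ts
    if current_tier == "hot" and idle > SSD_TTL:
        return "warm"
    if current_tier == "warm" and idle > HDD_TTL:
        return "cold"
    return None
```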
3. Core Modules
Data Access Monitor – uses LRU to identify hot files and provides APIs for policy changes.
Tier Management Module – scans tagged directories, creates conversion tasks, and submits them to the Task Management Module.
Task Management Module – a distributed scheduler that dispatches block delete, copy, or recovery tasks to DataNodes, extending community task types.
These modules together improved overall performance by 10 %, increased erasure‑coded data coverage to 30 %, and reduced cold‑data storage cost by 90 %.
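The LRU mechanism behind the Data Access Monitor can be sketched with an ordered map: each access moves a file to the "hot" end, and the least recently used entry is evicted when capacity is exceeded. The capacity and method names below are illustrative, not the production API.

```python
from collections import OrderedDict

class AccessMonitor:
    """LRU-based hot-file tracker; a minimal sketch of the
    Data Access Monitor. Capacity is illustrative."""

    def __init__(self, capacity: int = 100_000):
        self.capacity = capacity
        self.files = OrderedDict()  # path -> access count

    def record_access(self, path: str) -> None:
        self.files[path] = self.files.get(path, 0) + 1
        self.files.move_to_end(path)          # most recently used at the end
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)    # evict least recently used

    def hottest(self, n: int = 10) -> list:
        # Most recently accessed files, newest first.
        return list(reversed(self.files))[:n]
```

Files that surface in `hottest` are candidates for promotion to the SSD tier; files that age out are candidates for demotion.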
04 Practical Integration
1. Cross‑Region Lifecycle Management
Data that remains unread for long periods is migrated from multi‑datacenter storage to a single‑datacenter tier and then converted to erasure‑coded cold storage, dramatically lowering redundancy and cost.
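The two‑step lifecycle above (collapse cross‑region replicas first, then convert to erasure‑coded cold storage) can be expressed as a small policy function. The day thresholds are hypothetical placeholders for whatever the lifecycle configuration specifies.

```python
def lifecycle_action(days_since_read: int, replicated_regions: int) -> str:
    """Sketch of cross-region lifecycle management; thresholds illustrative.
    Step 1: long-unread multi-region data collapses to one datacenter.
    Step 2: still-unread single-region data becomes erasure-coded cold data."""
    if days_since_read > 90 and replicated_regions > 1:
        return "migrate_to_single_dc"
    if days_since_read > 180 and replicated_regions == 1:
        return "convert_to_erasure_coding"
    return "keep"
```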
2. Data Scheduling
By monitoring access, hot data is “re‑heated” and distributed across regions for faster reads, while task‑drift mechanisms allocate jobs to clusters with available resources, improving execution timeliness.
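The task‑drift idea reduces to routing a job toward spare capacity. A minimal sketch, assuming each cluster exposes a free‑slot count (the field names and cluster stats are illustrative):

```python
def pick_cluster(clusters: list) -> str:
    """Task drift sketch: run the job on the cluster with the most
    free compute slots."""
    return max(clusters, key=lambda c: c["free_slots"])["name"]
```

Combined with re‑heating (replicating newly hot data into the target region ahead of the job), this keeps both compute and reads local.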
Conclusion
JD Retail's unified HDFS storage solution, combining cross‑region replication and tiered storage, achieves both performance gains and significant cost reductions, offering a reference architecture for large‑scale distributed storage systems.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.