
JD Unified Storage Practice: Cross‑Region and Tiered Storage on HDFS

This article details JD's large‑scale HDFS unified storage implementation, covering cross‑region storage challenges, topology design, asynchronous block replication, flow‑control mechanisms, tiered storage strategies, automatic hot‑cold data migration, and the resulting performance and cost improvements for big‑data workloads.

JD Retail Technology

Overview

With the rise of big‑data workloads, JD Retail built a robust HDFS‑based offline storage platform that supports petabyte‑scale data, millions of daily jobs, and visual management tools for efficient operation.

Cross‑Region Storage

Traditional single‑datacenter deployments cannot keep pace with JD's multi‑datacenter growth, leading to issues such as limited disaster recovery, inconsistent metadata, redundant data copies, and uncontrolled inter‑datacenter traffic.

JD adopted a full‑copy plus full‑mesh topology in which every DataNode reports to a common NameNode, enabling unified metadata management, consistent data placement, and cost‑effective migration, with migration efficiency improved by 350%.

Challenges include rapid cluster scaling, heartbeat stability across long distances, and complex topology control, all of which require sophisticated monitoring and traffic‑shaping.

Topology and Data Storage

Cross‑domain tags are stored as extended attributes (XATTRs) in the EditLog and fsimage, allowing directory‑level hot‑cold labeling; a new file inherits the tag of its nearest tagged ancestor directory, ensuring efficient data placement.
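The nearest‑ancestor lookup described above can be sketched as a walk up the directory tree. This is a minimal illustration, not JD's implementation; the tag values and the in‑memory `xattrs` mapping are hypothetical stand‑ins for XATTRs held in NameNode metadata.

```python
# Hypothetical sketch: a new file resolves its cross-domain tag by walking up
# the path to the nearest ancestor directory that carries a tag. The mapping
# and tag names ("dc-east", "dc-west") are illustrative, not JD's values.
import posixpath

def nearest_tag(path, xattrs):
    """Return the cross-domain tag of the closest tagged ancestor of `path`,
    or None if no ancestor (including the root) is tagged."""
    current = path
    while True:
        tag = xattrs.get(current)
        if tag is not None:
            return tag
        if current == "/":
            return None  # reached the root without finding a tag
        current = posixpath.dirname(current)

xattrs = {"/warehouse/orders": "dc-east", "/warehouse": "dc-west"}
nearest_tag("/warehouse/orders/2024/part-0", xattrs)  # -> "dc-east"
```

Because lookup falls through to the parent, tagging a single top‑level directory is enough to place everything beneath it, while a deeper tag overrides the default for one subtree.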

Cross‑Domain Data Flow

When a client writes data, the block’s cross‑domain tag determines the target datacenter; a CR‑check module then issues asynchronous block‑copy tasks to achieve consistency and redundancy.

Cross‑Region Block Supplement

The CR‑check module replaces distcp for block replication, offering higher concurrency and better node selection, while existing data is handled by asynchronous updaters that process high‑priority tables first.

Asynchronous Updater

This component processes bulk data updates with a priority queue, polling tasks across directories to avoid blocking large tables and ensuring responsive migration.
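A priority queue combined with per‑directory rotation can be sketched as follows. This is an assumed reconstruction of the polling behavior described above, not JD's code; the class and method names are hypothetical.

```python
# Hypothetical sketch of the asynchronous updater: block-copy tasks are
# grouped per directory and drained in priority order, with round-robin
# rotation among directories at the same priority so one very large table
# cannot monopolize the queue. All names here are illustrative.
import heapq
import itertools
from collections import deque

class AsyncUpdater:
    def __init__(self):
        self._seq = itertools.count()   # tie-breaker that enforces rotation
        self._heap = []                 # entries: (priority, seq, directory)
        self._tasks = {}                # directory -> deque of pending tasks

    def submit(self, directory, task, priority=10):
        """Queue a task; lower priority numbers are served first."""
        if directory not in self._tasks:
            self._tasks[directory] = deque()
            heapq.heappush(self._heap, (priority, next(self._seq), directory))
        self._tasks[directory].append(task)

    def poll(self):
        """Return one task from the highest-priority directory, or None."""
        while self._heap:
            priority, _, directory = heapq.heappop(self._heap)
            queue = self._tasks.get(directory)
            if not queue:
                continue
            task = queue.popleft()
            if queue:
                # Directory still has work: requeue it behind same-priority
                # peers, so sibling directories are polled in turn.
                heapq.heappush(self._heap, (priority, next(self._seq), directory))
            else:
                del self._tasks[directory]
            return task
        return None
```

With two tasks queued under one directory and one under another at equal priority, `poll` alternates between the directories rather than draining the first completely, which is the non‑blocking behavior the text describes.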

Cross‑Domain Flow Control

Separate queues and rate limiters per datacenter prevent bottlenecks on narrow inter‑datacenter links, and RPC requests carry datacenter metadata to direct reads/writes appropriately.
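Per‑link rate limiting of this kind is commonly built on token buckets; the sketch below illustrates one plausible shape under that assumption. The rates, datacenter names, and `admit` helper are hypothetical, not JD's actual configuration.

```python
# Hypothetical sketch of per-datacenter flow control: each inter-datacenter
# link gets its own token bucket, so saturating one narrow link only delays
# traffic bound for that datacenter. Rates and names are illustrative.
import time

class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s     # sustained refill rate
        self.capacity = burst_bytes      # maximum burst size
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_acquire(self, nbytes):
        """Consume tokens for an nbytes transfer; False means back off."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

# One bucket per destination datacenter (hypothetical link capacities).
limiters = {"dc-east": TokenBucket(100e6, 10e6),
            "dc-west": TokenBucket(20e6, 2e6)}

def admit(target_dc, block_size):
    """Gate a block transfer on the target datacenter's own limiter."""
    return limiters[target_dc].try_acquire(block_size)
```

Keeping one bucket per destination matches the per‑datacenter queues described above: a rejected transfer to a congested link is requeued without holding up traffic to other datacenters.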

Tiered Storage

To address hot, warm, and cold data, JD classifies storage machines into SSD, HDD, and high‑density HDD, assigning data based on access patterns and using XATTRs for labeling.

An automatic conversion module in the NameNode migrates data between tiers, employing TTL and erasure coding for cold data to improve durability and reduce cost.
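The TTL‑driven conversion policy can be sketched as a simple rule table. The thresholds, tier names, and the RS‑6‑3 codec choice below are assumptions for illustration; the article does not state JD's actual values.

```python
# Hypothetical sketch of the automatic tier-conversion policy: files aging
# past a TTL move to a colder tier, and cold data switches from replication
# to erasure coding. Thresholds and codec are illustrative assumptions.
from dataclasses import dataclass

HOT_TTL_DAYS = 7      # assumed threshold: SSD -> HDD
COLD_TTL_DAYS = 90    # assumed threshold: HDD -> high-density HDD + EC

@dataclass
class FileStatus:
    path: str
    age_days: int
    tier: str          # "ssd" | "hdd" | "hdd-dense"

def plan_conversion(f):
    """Return a migration task for a file, or None if it stays put."""
    if f.tier == "ssd" and f.age_days > HOT_TTL_DAYS:
        return {"path": f.path, "target": "hdd", "codec": "replication"}
    if f.tier == "hdd" and f.age_days > COLD_TTL_DAYS:
        # Erasure coding (e.g. Reed-Solomon 6+3) stores ~1.5x the data
        # instead of 3x replication, which is where the cold-data cost
        # reduction comes from.
        return {"path": f.path, "target": "hdd-dense", "codec": "rs-6-3"}
    return None
```

In a real NameNode module these decisions would be batched into conversion tasks and handed to a scheduler, as the next paragraph describes.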

Key modules include a data‑access monitor (LRU‑based), a tier‑management module that creates conversion tasks, and a distributed task scheduler that dispatches operations to DataNodes.
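An LRU‑based access monitor like the one mentioned can be sketched with an ordered map: recently accessed paths stay in the hot set, and paths that fall off the cold end become demotion candidates. The class, capacity, and field names are hypothetical.

```python
# Hypothetical sketch of the LRU-based data-access monitor: a bounded
# OrderedDict tracks recently read paths; entries evicted from the cold
# end are surfaced to the tier-management module as demotion candidates.
from collections import OrderedDict

class AccessMonitor:
    def __init__(self, capacity):
        self.capacity = capacity
        self._lru = OrderedDict()    # path -> access count, most recent last
        self.cold_candidates = []    # paths evicted from the hot set

    def record_access(self, path):
        """Note a read: move the path to the hot end, evict the coldest."""
        count = self._lru.pop(path, 0)
        self._lru[path] = count + 1
        if len(self._lru) > self.capacity:
            cold_path, _ = self._lru.popitem(last=False)
            self.cold_candidates.append(cold_path)

    def is_hot(self, path):
        return path in self._lru
```

The tier‑management module would then periodically drain `cold_candidates` into conversion tasks for the distributed scheduler, keeping monitoring and migration decoupled.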

These optimizations yielded a 10% overall performance gain, a 30% increase in EC coverage, and a 90% reduction in cold‑data storage cost.

Practical Applications

Two use cases illustrate the benefits: cross‑region lifecycle management reduces redundancy by moving stale data to single‑datacenter, high‑density storage with EC; and data‑scheduling leverages hot‑data detection to rebalance workloads across clusters, improving task latency.

Conclusion

JD's unified storage solution combines cross‑region replication and tiered storage to achieve both performance enhancements and significant cost savings for large‑scale distributed storage systems.

Tags: big data, data management, distributed file system, HDFS, tiered storage, cross‑region storage
Written by JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.