How eBay Scaled HDFS to 800 PB Using Federation and Router‑Based Architecture
This article details eBay's evolution of its massive HDFS storage—from a single‑cluster design to ViewFS Federation, then to Router‑Based Federation—highlighting the performance bottlenecks, optimization techniques, FastCopy integration, and future plans for further scaling and automation.
eBay HDFS Scale and Challenges
eBay operates more than 10 HDFS clusters with over 20,000 nodes. The largest cluster contains >5,000 machines, stores >800 PB of data and processes >100 k MapReduce/YARN jobs per day. Rapid data growth created three major pressure points:
Continuous increase in both file data and metadata storage.
Single‑point performance bottleneck at the NameNode.
Operational complexity of managing many independent clusters.
Early Performance Optimizations (single‑cluster mode)
Before federation, eBay tuned the original NameNode to improve throughput:
Heavy API call reduction : avoided costly Balancer block queries to standby NameNode, limited batch‑size deletes, used ListStatus without block locations, and split snapshot directories.
Asynchronous RPC response handling (JIRA HDFS‑15486): moved encryption of RPC responses to a separate thread, freeing the NameNode handler thread pool.
Namespace lock refinements (JIRA HDFS‑15553): removed redundant directory locks, converted SetTimes write locks to read locks, and introduced a read‑write call‑queue.
ViewFS‑Based Federation
To break the NameNode bottleneck, eBay introduced a ViewFS federation in 2019. Metadata was split into multiple namespaces, each served by its own NameNode. This horizontal scaling increased overall metadata throughput but introduced two management drawbacks:
High maintenance cost – every new namespace required client‑side configuration updates.
Lack of client transparency – data migration between namespaces forced application changes.
FastCopy Data‑Migration Process
FastCopy was built to move large volumes of data between federated namespaces without copying block bytes.
Revoke directory permissions.
Close any open file handles.
Execute FastCopy to create hard‑links for blocks.
Retry on failure.
Restore permissions.
Update the ViewFS mount‑table.
FastCopy workflow:
Client queries the source NameNode for block locations.
Target NameNode creates the destination file and block entries.
A copyBlock request is sent to the owning DataNode.
The DataNode creates a hard link to the existing block file.
The DataNode reports the new block to the target NameNode, completing the migration.
Integration with DistCp
FastCopy logic was embedded into the Hadoop DistCp tool, adding:
Uniform handling of large and small files to avoid long‑tail tasks.
ACL preservation before copy.
Support for whitelist / exclude lists.
Job‑parameter tuning for optimal parallelism.
These changes yielded up to a 7× speedup for bulk data moves.
Router‑Based Federation (RBF)
In 2021 eBay migrated to the community Router‑Based Federation architecture. A stateless Router service sits between clients and the set of federated NameNodes, providing:
Transparent namespace routing – clients see a single logical namespace.
Cloud‑native deployment and horizontal scalability.
Foundation for data‑split strategies built on the Router layer.
RBF Request Flow
Clients send all HDFS RPCs to the Router. The Router forwards metadata‑related calls to the appropriate NameNode(s), aggregates responses, and returns a unified result to the client.
Secure Token Workflow (YARN RM security)
Client obtains a delegation token from the source NameNode.
Client submits a YARN job to the ResourceManager (RM) carrying the token.
RM requests a Router‑level token and distributes it to all worker nodes.
Worker nodes use the Router token for HDFS authentication.
After job completion, RM revokes the token.
RBF Performance Enhancements
Increased Router RPC throughput; removed SASL encryption between Router and NameNode to reduce latency.
Preserved client IP and clientId in RPCs, maintaining data‑locality awareness for YARN tasks.
Fixed moveToTrash semantics in multi‑mount scenarios.
Future Work
Planned improvements for the RBF layer include:
Asynchronous RPC handling inside the Router to further boost throughput.
Automated data‑split generation based on namespace usage patterns.
Tiered storage integration to move cold data to cheaper media.
Isolation of inter‑namespace RPC traffic for better security and performance.
Key Diagrams
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
