Kuaishou's HDFS Architecture, Scale, Challenges, and Practices
This article presents an in‑depth technical overview of Kuaishou's massive HDFS deployment, detailing its architecture, petabyte‑scale data and thousands‑of‑node clusters, the key scalability challenges faced, and the custom solutions—including FixedOrder, RBF balancer, observer read, slow‑node mitigation, and tiered protection—implemented to keep the system performant and reliable.
Kuaishou operates the largest internal distributed file system based on Hadoop Distributed File System (HDFS), which has grown alongside its rapidly expanding business. The system now spans tens of thousands of servers, stores several exabytes of data, and handles hundreds of petabytes of daily throughput.
HDFS Architecture Overview
HDFS follows a classic master‑slave design with a NameNode (metadata service) and multiple DataNodes (block storage). The NameNode stores directory trees, file block mappings, and block locations, while DataNodes hold the actual data blocks.
Scale at Kuaishou
Thousands of servers (tens of thousands in total)
Multiple exabytes of stored data and hundreds of petabytes of daily throughput
Hundreds of NameService groups and millions of QPS
Key Challenges and Practices
Master‑node scalability : Kuaishou introduced a FixedOrder mount point and an RBF‑Balancer mechanism to quickly add new NameService instances and distribute load without data migration.
Single‑NameService performance bottleneck : A custom observer‑read architecture built on the latest RBF framework provides transparent read‑write separation, dynamic routing, and supports standby/active/observer nodes for read requests.
Slow‑node issues : Implemented pre‑emptive avoidance (ranking DataNodes by load) and mid‑operation circuit‑breakers, along with DN‑side optimizations such as directory hierarchy flattening, finer‑grained locking, DU operation improvements, and load‑aware disk selection.
Tiered protection : Introduced priority queues in NameNode RPC handling and per‑disk throttlers in DataNodes, allowing high‑priority tasks to pre‑empt resources when the cluster is saturated.
Final Architecture
The production architecture consists of three layers: a stateless routing layer (Routers sharing mount‑point info via ZooKeeper), a metadata layer (multiple independent NameService groups with Active, Standby, and Observer NameNodes), and a data layer (large pools of DataNodes reporting heartbeats and block reports).
Conclusion
Over three years, Kuaishou's HDFS evolved from a few hundred nodes handling petabyte‑scale data to a massive cluster of tens of thousands of nodes supporting exabyte‑scale storage. The team continuously refines the system to meet explosive data growth, sharing many bug fixes and performance improvements with the open‑source community.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
