
ByteDance’s HDFS Architecture and Evolution: Design, Challenges, and Optimizations

This article presents an in‑depth overview of ByteDance’s large‑scale HDFS deployment, describing its unique access layer, metadata and data layers, the evolution through multiple growth stages, and the key architectural improvements such as NNProxy, DanceNN, lock redesign, startup acceleration, and slow‑node mitigation techniques.

DataFunTalk

This article, part of the "ByteDance Infrastructure Practice" series, shares the practical experience and lessons ByteDance gained while scaling its Hadoop Distributed File System (HDFS) to support the company’s massive data and compute workloads.

HDFS, an open‑source distributed file system originally inspired by Google’s GFS, provides a familiar hierarchical namespace, append‑only writes, sequential and random reads, massive scalability, and high fault tolerance. ByteDance has extended the vanilla design with a dedicated Access Layer that aggregates many NameNode instances (via federation) and offers unified request routing, quota enforcement, tracing, and traffic‑shaping capabilities.

The Metadata Layer consists of the NameNodes, ZKFC (the ZooKeeper Failover Controller), and BookKeeper, which ByteDance uses in place of the Quorum Journal Manager (QJM) to store the shared edit log because it provides more stable multi‑node synchronization. To overcome the scalability limit of a single NameNode, ByteDance adopted federation: multiple independent NameNode groups share the same DataNodes, while the Access Layer presents their namespaces as one unified view.
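The source mentions mount‑table routing but does not say how paths are matched to NameNode groups. A minimal sketch, assuming a longest‑prefix match over whole path components (a common approach for federated mount tables); the class names and nameservice IDs such as `ns1` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    nameservice: str   # hypothetical NameNode group ID, e.g. "ns1"
    backend_path: str  # path as seen by that NameNode group


class MountTable:
    """Hypothetical mount table: client-visible prefix -> (nameservice, target)."""

    def __init__(self) -> None:
        self._mounts: dict[str, tuple[str, str]] = {}

    def add(self, mount_point: str, nameservice: str, target: str) -> None:
        self._mounts[mount_point.rstrip("/") or "/"] = (nameservice, target.rstrip("/") or "/")

    def resolve(self, path: str) -> Route:
        path = path.rstrip("/") or "/"
        candidate = path
        # Walk up one component at a time so the longest mount point wins
        # and "/userx" never matches a "/user" mount.
        while True:
            if candidate in self._mounts:
                ns, target = self._mounts[candidate]
                rest = path[len(candidate):].lstrip("/")
                backend = target if not rest else f"{target.rstrip('/')}/{rest}"
                return Route(ns, backend)
            if candidate == "/":
                raise KeyError(f"no mount point covers {path}")
            candidate = candidate.rsplit("/", 1)[0] or "/"
```

Matching on components rather than raw string prefixes keeps sibling directories with similar names from being routed to the wrong NameNode group.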

The Data Layer consists of DataNodes, which store file blocks and replicate them for durability. Each DataNode sends periodic heartbeats to the NameNodes, reports the blocks it holds, and receives commands in return, such as replicating or deleting blocks.
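The heartbeat exchange can be sketched from the NameNode side: each heartbeat refreshes a DataNode’s liveness timestamp, and any queued commands ride back in the reply. This is a simplified illustration, not ByteDance’s code; the command strings and class name are hypothetical. For reference, stock HDFS by default declares a DataNode dead after roughly 630 s (10.5 minutes) without heartbeats.

```python
from collections import defaultdict, deque


class HeartbeatManager:
    """Hypothetical NameNode-side bookkeeping for DataNode heartbeats."""

    def __init__(self, dead_after_s: float) -> None:
        self.dead_after_s = dead_after_s
        self.last_seen: dict[str, float] = {}
        self.pending: dict[str, deque] = defaultdict(deque)

    def queue_command(self, datanode: str, command: str) -> None:
        # Command format is illustrative, e.g. "DELETE blk_1".
        self.pending[datanode].append(command)

    def on_heartbeat(self, datanode: str, now: float) -> list[str]:
        # Record liveness, then hand back queued commands in the reply.
        self.last_seen[datanode] = now
        commands = list(self.pending[datanode])
        self.pending[datanode].clear()
        return commands

    def dead_nodes(self, now: float) -> list[str]:
        return sorted(dn for dn, t in self.last_seen.items() if now - t > self.dead_after_s)
```

Piggybacking commands on the heartbeat reply means the NameNode never has to open connections to DataNodes; it only answers calls they already make.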

ByteDance’s HDFS serves core business services including Hive, HBase, log services, Kafka storage, YARN, Flink, Spark, and MapReduce, handling petabyte‑ to exabyte‑scale data across tens of thousands of servers.

Development progressed through four stages: (1) rapid cluster growth exposed NameNode bottlenecks, which led to the introduction of federation; (2) further scaling ran into GC pressure and lock contention, addressed by DanceNN, a C++ NameNode implementation; (3) expansion to EB scale prompted multi‑tenant support, a DataNode redesign, and slow‑node mitigation; (4) optimization continues today.

Key improvements include:

NNProxy (NameNode Proxy): Provides a unified metadata view in a federated environment, manages mount‑table routing, enforces per‑path and per‑user quotas, integrates ByteTrace for request tracing, and implements traffic‑shaping to protect backend NameNodes.
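The source does not say which traffic‑shaping algorithm NNProxy uses. A token bucket per (user, nameservice) pair is one common choice and is sketched here under that assumption; `ProxyThrottle` and its parameters are hypothetical.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/s, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are admitted
        self.updated = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class ProxyThrottle:
    """Hypothetical per-(user, nameservice) admission control in a proxy layer."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def admit(self, user: str, nameservice: str, now: float) -> bool:
        key = (user, nameservice)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.burst, now)
        # False means "do not forward to the NameNode"; how the real proxy
        # rejects requests is an assumption not covered by the source.
        return self.buckets[key].try_acquire(now)
```

Keying buckets by both user and nameservice lets one noisy tenant be throttled on the NameNode group it is overloading without affecting other users or groups.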

DanceNN: A C++ rewrite of the NameNode that eliminates Java GC issues, reduces the memory footprint, and introduces a hierarchical lock system. Together these changes let it reach roughly 80k read QPS and 20k write QPS, more than ten times the performance of the original Java implementation.

Lock Redesign: Replaces a single global lock with a tree‑structured lock that allows concurrent operations on different sub‑trees while preserving ACID properties.
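One way to realize a tree‑structured lock, sketched here as an assumption rather than DanceNN’s actual design, is per‑path reader/writer locks: an operation read‑locks every ancestor top‑down and then read‑ or write‑locks its target. Operations on disjoint sub‑trees share only read locks on common ancestors, so they proceed concurrently, while a writer on a directory excludes everything beneath it. All class names are hypothetical; operations touching two paths (such as rename) need a more careful ordering and are not covered.

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Minimal reader/writer lock (no fairness guarantees)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class PathLockManager:
    """Hypothetical hierarchical lock manager keyed by absolute path."""

    def __init__(self) -> None:
        self._locks: dict[str, RWLock] = {}  # entries are never evicted in this sketch
        self._guard = threading.Lock()

    def _lock_for(self, path: str) -> RWLock:
        with self._guard:
            return self._locks.setdefault(path, RWLock())

    @staticmethod
    def _chain(path: str) -> tuple[list[str], str]:
        """Split a path into (ancestors top-down, normalized target)."""
        parts = [p for p in path.split("/") if p]
        prefixes = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
        return prefixes[:-1], prefixes[-1]

    @contextmanager
    def locked(self, path: str, write: bool = False):
        ancestors, target = self._chain(path)
        held = []
        try:
            # Always lock top-down: every waiter holds only shallower locks
            # than the one it waits for, so single-path ops cannot deadlock.
            for p in ancestors:
                lock = self._lock_for(p)
                lock.acquire_read()
                held.append(lock.release_read)
            lock = self._lock_for(target)
            if write:
                lock.acquire_write()
                held.append(lock.release_write)
            else:
                lock.acquire_read()
                held.append(lock.release_read)
            yield
        finally:
            for release in reversed(held):
                release()
```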

Startup Optimizations: Multi‑threaded scanning of the static namespace to build the BlockMap, buffered processing, and parallel handling of DataNode block reports, all of which shorten the time the NameNode needs to exit safe mode.
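The parallel startup path can be sketched by sharding the BlockMap by block ID, so each worker owns one shard and never contends for a shared lock. This is a minimal illustration under that assumption: the shard count, the record layout, and the reading of "buffered processing" as bucketing records per shard before filling are all hypothetical. Python threads will not give real CPU parallelism here; the partitioning is the point.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(block_id: int) -> int:
    return block_id % NUM_SHARDS


def build_block_map(inodes, workers: int = NUM_SHARDS) -> list[dict]:
    """inodes: iterable of (inode_id, [block_id, ...]) from the loaded namespace."""
    # Buffer records per shard first, so each worker fills one shard exclusively.
    buffers = [[] for _ in range(NUM_SHARDS)]
    for inode_id, blocks in inodes:
        for b in blocks:
            buffers[shard_of(b)].append((b, inode_id))

    def fill(records):
        return {b: {"inode": inode_id, "locations": set()} for b, inode_id in records}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fill, buffers))


def apply_block_report(shards, datanode: str, reported_ids, workers: int = NUM_SHARDS) -> list[int]:
    """Record `datanode` as a location for each reported block; return unknown IDs."""
    per_shard = [[] for _ in shards]
    for b in reported_ids:
        per_shard[shard_of(b)].append(b)

    def apply(i: int) -> list[int]:
        unknown = []
        for b in per_shard[i]:
            entry = shards[i].get(b)
            if entry is None:
                unknown.append(b)  # not in the namespace; treated as invalid here
            else:
                entry["locations"].add(datanode)
        return unknown

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(b for found in pool.map(apply, range(len(shards))) for b in found)
```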

Slow‑Node Optimizations: For reads, a fast‑switch mechanism based on throughput thresholds replaces packet‑level timeouts; for writes, Fast Failover and Fast Failover+ detect slow DataNodes, truncate affected pipelines, and recover by reallocating blocks, achieving sub‑200 ms recovery latency.
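The read‑side fast switch can be sketched as a windowed throughput check. A packet‑level timeout only fires when no data arrives at all, so a DataNode that is alive but slow never trips it; a throughput floor catches that case. The threshold, window length, and function names below are hypothetical, and the write‑side Fast Failover logic is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ThroughputMonitor:
    """Flags a replica as slow when windowed throughput drops below a floor."""

    min_bytes_per_s: float   # hypothetical throughput floor
    window_s: float          # how long to observe before judging (must be > 0)
    window_start: float = 0.0
    window_bytes: int = 0

    def record(self, now: float, nbytes: int) -> bool:
        """Account for `nbytes` received at `now`; True means 'switch replica'."""
        self.window_bytes += nbytes
        elapsed = now - self.window_start
        if elapsed < self.window_s:
            return False  # too little evidence yet
        rate = self.window_bytes / elapsed
        self.window_start, self.window_bytes = now, 0  # start a fresh window
        return rate < self.min_bytes_per_s


def pick_serving_replica(replicas, samples, min_bytes_per_s, window_s):
    """Try replicas in order; return the first that never trips the monitor.

    `samples[dn]` is a list of (seconds_since_read_start, bytes_received)
    pairs standing in for packets observed while reading from `dn`.
    """
    for dn in replicas:
        monitor = ThroughputMonitor(min_bytes_per_s, window_s)
        if not any(monitor.record(t, n) for t, n in samples.get(dn, [])):
            return dn
    return None
```

For example, with a 1 MB/s floor and a 1 s window, a replica delivering 200 KB in its first second is abandoned, while one delivering 1.6 MB keeps serving.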

These enhancements have enabled ByteDance’s HDFS to grow from a few hundred nodes handling petabytes to a multi‑ten‑thousand‑node platform supporting exabytes of data, while maintaining high availability and performance.

In conclusion, the evolution of ByteDance’s HDFS demonstrates a continuous cycle of problem identification, architectural innovation, and performance tuning that keeps pace with the company’s rapid business growth.
