Tagged articles
IT Services Circle
Feb 9, 2025 · Big Data

Understanding HDFS: Architecture, Data Blocks, Fault Tolerance, and High Availability

This article explains how HDFS, the Hadoop Distributed File System, splits large files into blocks, replicates them for fault tolerance, and organizes the cluster into NameNode and DataNode components. It also covers high‑availability and scalability mechanisms, such as a standby NameNode and federation, that enable reliable big‑data storage and access.

Big Data · DataNode · Distributed File System
11 min read
Bilibili Tech
Apr 26, 2024 · Big Data

Fine-Grained Lock Optimization for HDFS NameNode to Improve Metadata Read/Write Performance

To overcome the NameNode write bottleneck caused by a single global read/write lock in Bilibili’s massive HDFS deployment, the team introduced hierarchical fine‑grained locking—splitting the lock into Namespace, BlockPool, and per‑INode levels—which yielded up to three‑fold write throughput gains, a 90 % drop in RPC queue time, and shifted performance limits from lock contention to log synchronization.

Big Data · HDFS · NameNode
15 min read
Programmer DD
Apr 14, 2021 · Big Data

Understanding HDFS Architecture: Key Components, Protocols, and Limitations

This article explains HDFS’s master‑slave architecture, detailing the roles of NameNode and DataNode, namespace management, communication protocols, client functions, common configuration parameters, maintenance commands, and the inherent limitations of a single‑NameNode design.

Big Data · Configuration · DataNode
5 min read
Big Data Technology & Architecture
Dec 27, 2020 · Big Data

Understanding and Solving the Small File Problem in Big Data Systems

This article examines the pervasive small‑file issue in big‑data environments, explains its impact on storage and processing performance, and presents a comprehensive set of solutions—including file merging, Hadoop archives, SequenceFiles, HBase, CombineFileInputFormat, and Spark/Flink strategies—to mitigate metadata overhead and improve I/O efficiency.

Flink · Hadoop · NameNode
41 min read
Sohu Tech Products
Mar 4, 2020 · Big Data

Introduction to HDFS: Architecture, Components, and Operations

This article provides a comprehensive overview of HDFS, covering its role as a distributed file system, the concepts of blocks, NameNode and DataNode responsibilities, replication, edit logs, snapshots, high‑availability mechanisms, and practical considerations for managing large‑scale data storage.

DataNode · Distributed File System · HDFS
11 min read
DataFunTalk
Jan 2, 2020 · Big Data

ByteDance’s HDFS Architecture and Evolution: Design, Challenges, and Optimizations

This article presents an in‑depth overview of ByteDance’s large‑scale HDFS deployment, describing its unique access layer, metadata and data layers, the evolution through multiple growth stages, and the key architectural improvements such as NNProxy, DanceNN, lock redesign, startup acceleration, and slow‑node mitigation techniques.

Big Data · ByteDance · Federation
18 min read
dbaplus Community
Oct 28, 2019 · Big Data

Quickly Analyze Hadoop NameNode RPC with ELK and Grafana

This guide shows how to reduce excessive NameNode RPC calls caused by frequent HDFS directory listings and demonstrates a complete ELK pipeline—Filebeat, Kafka/Logstash, Elasticsearch, and Kibana—plus Grafana dashboards for real‑time monitoring of Hadoop RPC operations.

ELK · Grafana · Hadoop
9 min read
Beike Product & Technology
Jun 28, 2019 · Big Data

Hadoop NameNode Performance Bottlenecks and Solutions: Federation, ViewFS, FastCopy, Balance & Mover

This article analyzes the performance and stability bottlenecks of a Hadoop 2.7.3 NameNode caused by memory limits, RPC QPS, and long restart times, and presents a comprehensive solution stack—including HDFS federation, ViewFS, FastCopy, and tuned Balance/Mover tools—to improve scalability and reduce downtime.

Balance · FastCopy · Federation
11 min read
Meituan Technology Team
Mar 17, 2017 · Big Data

Optimizing Hadoop NameNode Restart in HA with QJM

By applying a series of JIRA patches and configuration tweaks (shrinking the fsLock scope, increasing checkpoint transaction thresholds, off‑loading quota calculations, simplifying BlockReport handling, and processing mis‑replicated blocks asynchronously), the team cut the Hadoop HA NameNode restart time in a 540 MB metadata cluster from roughly 4000 seconds to about 2000 seconds, reducing total downtime to around 35 minutes and greatly improving cluster availability.

HA · HDFS · Hadoop
18 min read
Meituan Technology Team
Dec 9, 2016 · Big Data

Memory Usage Analysis of HDFS NameNode Core Data Structures

The article quantitatively breaks down HDFS NameNode memory consumption, showing that the Namespace tree and BlocksMap together dominate heap usage (≈53 GB in large clusters). It provides detailed per‑object size estimates for NetworkTopology, INode, and block structures, proposes a simple formula for predicting total heap requirements, and offers tuning recommendations.

Big Data · HDFS · Memory Management
13 min read
Meituan Technology Team
Aug 26, 2016 · Big Data

Memory Architecture and Analysis of Hadoop HDFS NameNode

The article dissects Hadoop 2.4.1’s HDFS NameNode memory architecture, detailing how the Namespace, BlockManager, NetworkTopology, and LeaseManager consume the heap, exposing scaling problems when metadata reaches hundreds of millions of inodes and blocks, and recommending file merging, block‑size tuning, federation, or external KV stores to mitigate heap pressure.

Big Data · HDFS · Memory Management
17 min read
Qunar Tech Salon
May 13, 2016 · Big Data

Overview and Architecture of Hadoop Distributed File System (HDFS)

This article provides a comprehensive overview of Hadoop Distributed File System (HDFS), detailing its design goals, architecture components such as NameNode, DataNode and SecondaryNameNode, data block handling, replication strategies, communication protocols, and the read, write, and delete processes.

Big Data · Distributed File System · HDFS
18 min read
ITPUB
Mar 19, 2016 · Big Data

Inside HDFS: How NameNode and DataNode Manage Big Data Writes and Reads

This article explains the fundamentals of distributed file systems, focusing on Hadoop’s HDFS architecture, the separation of metadata and data via NameNode and DataNode, and detailed step‑by‑step write and read processes, including replication, fault recovery, and block splitting across nodes.

Big Data · DataNode · Distributed File System
8 min read

Design and Implementation of Alibaba Cloud's Cross‑Data‑Center Hadoop Cluster

In 2013 Alibaba Cloud ran out of rack capacity in a single IDC, prompting the development of a multi‑NameNode, cross‑data‑center Hadoop solution that addresses NameNode scalability, inter‑site bandwidth limits, data placement, job scheduling, massive data migration, and user transparency.

Cross‑Data‑Center · Federation · Hadoop
14 min read

Design Principles and Architecture of HDFS (Hadoop Distributed File System)

This article explains HDFS's design goals, master/slave architecture, namespace management, block replication strategies, fault tolerance mechanisms, metadata persistence, communication protocols, robustness features, data organization, access methods, and space reclamation, providing a comprehensive overview of Hadoop's distributed storage system.

DataNode · HDFS · NameNode
20 min read