Tagged articles

HDFS

192 articles · Page 2 of 2

Jul 17, 2020 · Big Data

Qbus Service Overview: Architecture, Use Cases, and Implementation Details

This article introduces Qbus, a cloud‑based queue service built on Kafka, covering its architecture, core components such as log collection, SDKs, HDFS persistence, monitoring with Prometheus, business integration methods, use‑case scenarios, and future development directions.

Cloud Queue · HDFS · Kafka

0 likes · 6 min read

Tencent Cloud Developer

Jul 13, 2020 · Big Data

Building MVP: A Lightweight Big Data Analysis System for Product Growth

The article describes how a lightweight big‑data analysis platform called MVP was built from scratch—using a User‑Event‑Config model, HDFS + ClickHouse + Spark, and four modules for metric monitoring, root‑cause alerts, deep growth analysis, and A/B testing—enabling real‑time insights in seconds instead of days and dramatically accelerating product‑growth operations.

AARRR Model · ClickHouse · HDFS

0 likes · 9 min read

Big Data Technology & Architecture

Jul 13, 2020 · Big Data

Write Ahead Log (WAL) Mechanism and Its Application in Distributed Storage Systems

The article explains how Write Ahead Log (WAL) improves metadata persistence and disaster recovery in distributed storage systems such as HDFS by buffering changes, reducing synchronous database writes, and providing checkpoint and recovery mechanisms, while also discussing practical control options.

Distributed storage · HDFS · Metadata Persistence

0 likes · 5 min read

Big Data Technology & Architecture

May 13, 2020 · Big Data

Analysis of Hadoop HDFS Data Read and Write Process

This article explains the underlying principles of Hadoop HDFS read and write operations, detailing how the client interacts with NameNode and DataNodes, the role of FsDataInputStream and FsDataOutputStream, block location retrieval, pipeline replication, and file closure steps.

Big Data · Data Read · Data Write

0 likes · 8 min read

Big Data Technology Architecture

May 6, 2020 · Big Data

Ozone vs HDFS: Why Ozone Cannot Replace Hadoop’s Core Storage

In this article, senior Alibaba engineer Zheng Kai analyzes Ozone’s role in the Hadoop ecosystem, arguing that despite its usefulness, Ozone cannot solve Hadoop’s core challenges of complexity, cost, and performance, and that Hadoop must focus on storage innovation, compute‑storage separation, and cloud integration to stay relevant.

Cloud · HDFS · Hadoop

0 likes · 14 min read

Big Data Technology & Architecture

May 6, 2020 · Big Data

Step-by-Step Guide to Installing and Configuring a Hadoop Cluster on Three Virtual Machines

This article provides a comprehensive, hands‑on tutorial for preparing three VMs, installing JDK and Hadoop, configuring core‑site.xml, hdfs‑site.xml, mapred‑site.xml, yarn‑site.xml, setting environment variables, distributing the package, starting HDFS and YARN, and verifying the cluster via web UI and jps commands.

Big Data · Cluster Setup · HDFS

0 likes · 14 min read

Big Data Technology Architecture

Apr 20, 2020 · Big Data

Introduction to HDFS: Architecture, Features, Replication, Rack Awareness, and Metadata Management

This article provides a comprehensive overview of Hadoop Distributed File System (HDFS), covering its streaming data access model, key characteristics, master‑slave architecture, block storage and replication mechanisms, rack‑aware placement strategy, and how the NameNode manages metadata and checkpoints.

Distributed File System · HDFS · Hadoop

0 likes · 7 min read

dbaplus Community

Apr 15, 2020 · Big Data

How Ctrip Scaled Hadoop Across Data Centers: Architecture and Lessons

This article details Ctrip's Hadoop evolution, the challenges of expanding across multiple data centers, the evaluation of multi‑cluster versus single‑cluster designs, and the concrete architectural changes, migration tools, bandwidth monitoring, and future plans that enabled a stable cross‑datacenter big‑data platform.

Big Data · Cross-DataCenter · Distributed storage

0 likes · 19 min read

Big Data Technology & Architecture

Apr 15, 2020 · Big Data

Understanding HDFS SecondaryNameNode and the Checkpoint Process

This article explains the role of HDFS SecondaryNameNode, the structure of fsimage and edits files, how checkpointing works—including configuration parameters and steps—and how the process changes when NameNode high availability is enabled.

Big Data · Checkpoint · Filesystem

0 likes · 6 min read

Youzan Coder

Apr 1, 2020 · Big Data

Presto Implementation and Practice at YouZan: A Big Data Query Engine Journey

The article outlines Presto’s high‑performance, coordinator‑worker architecture and query flow, describes YouZan’s migration from mixed Hadoop deployment to dedicated low‑latency clusters, details challenges such as small‑file handling and regex backtracking with their fixes, and previews future enhancements like Alluxio integration, session property managers, and Ranger‑based multi‑tenant isolation.

Distributed Computing · Facebook · HDFS

0 likes · 14 min read

Open Source Linux

Mar 12, 2020 · Big Data

Step-by-Step Guide to Build a Hadoop 2.9.2 Cluster on CentOS 7.5

This tutorial walks you through setting up a three‑node Hadoop 2.9.2 cluster on CentOS 7.5, covering environment preparation, password‑less SSH, user creation, JDK installation, Hadoop extraction, configuration file edits, directory setup, ownership changes, service startup, and verification via web UIs.

Big Data · CentOS · Cluster Setup

0 likes · 13 min read

Sohu Tech Products

Mar 4, 2020 · Big Data

Introduction to HDFS: Architecture, Components, and Operations

This article provides a comprehensive overview of HDFS, covering its role as a distributed file system, the concepts of blocks, NameNode and DataNode responsibilities, replication, edit logs, snapshots, high‑availability mechanisms, and practical considerations for managing large‑scale data storage.

DataNode · Distributed File System · HDFS

0 likes · 11 min read

Ctrip Technology

Feb 27, 2020 · Big Data

Ctrip's Cross‑Datacenter Hadoop Architecture: Design, Implementation, and Lessons Learned

This article details Ctrip's cross‑datacenter Hadoop architecture, covering the evolution of its Hadoop platform, the challenges of multi‑site bandwidth and latency, design choices between multi‑cluster and single‑cluster solutions, and the concrete HDFS, YARN, balancer, migration, monitoring, and throttling implementations that enable transparent, consistent, and efficient multi‑datacenter operations.

Cross-DataCenter · Data Migration · HDFS

0 likes · 15 min read

dbaplus Community

Feb 25, 2020 · Backend Development

How to Merge Small Files in Flink Checkpoints to Reduce HDFS Load

This article explains a small‑file‑merging technique for Apache Flink checkpoints that reuses FSDataOutputStreams to combine multiple state files into a single HDFS file, detailing design considerations such as concurrent checkpoint support, reference‑counted deletion, space amplification reduction, fault handling, compatibility, and observed production performance gains.

Apache Flink · Checkpoint · HDFS

0 likes · 13 min read

Big Data Technology & Architecture

Feb 16, 2020 · Big Data

Implementing MySQL Binlog Synchronization to HDFS Using Canal

This article gives a step‑by‑step guide to deploying Canal to capture MySQL binlog events, configuring HA with ZooKeeper, designing a client that parses binlog into JSON and asynchronously acknowledges messages, archiving data to local files for batch upload to HDFS, and monitoring latency for alerts.

Big Data · Binlog · Canal

0 likes · 10 min read

Big Data Technology & Architecture

Jan 7, 2020 · Big Data

Why Small Files Are a Problem in Big Data and How Delta Lake Compaction Solves It

This article examines the root causes and performance impact of massive small-file proliferation in traditional data warehouses, explains why HDFS metadata limits scalability, and details how Delta Lake’s custom compaction process can safely merge these files for append-only tables without disrupting reads or writes.

Compaction · Delta Lake · HDFS

0 likes · 5 min read

Didi Tech

Jan 5, 2020 · Big Data

Rolling Upgrade of HDFS from 2.7 to 3.2: Experience, Issues and Solutions

The team performed a rolling upgrade of HDFS from 2.7 to 3.2 on large clusters. They resolved EditLog, Fsimage, StringTable, and authentication incompatibilities by omitting EC data, using fallback images, rolling back commits, and first upgrading to the latest 2.x release. The upgrade followed a staged JournalNode‑NameNode‑DataNode procedure, was validated with rehearsals and a custom trash‑management tool, and achieved uninterrupted service along with improved stability, performance, and cost efficiency.

Big Data · Cluster Migration · HDFS

0 likes · 11 min read

DataFunTalk

Jan 2, 2020 · Big Data

ByteDance’s HDFS Architecture and Evolution: Design, Challenges, and Optimizations

This article presents an in‑depth overview of ByteDance’s large‑scale HDFS deployment, describing its unique access layer, metadata and data layers, the evolution through multiple growth stages, and the key architectural improvements such as NNProxy, DanceNN, lock redesign, startup acceleration, and slow‑node mitigation techniques.

Big Data · ByteDance · Federation

0 likes · 18 min read

Big Data Technology & Architecture

Dec 18, 2019 · Big Data

Data Consistency Strategies for Big Data Applications: Simple Replication, HDFS Pipeline, and Elasticsearch

The article explains three approaches to ensuring data consistency in big‑data systems—basic multi‑node replication, HDFS pipeline replication, and Elasticsearch primary‑replica replication—detailing their workflows, advantages, and drawbacks.

Data Consistency · Elasticsearch · HDFS

0 likes · 4 min read

Big Data Technology & Architecture

Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-once

0 likes · 11 min read

Architects' Tech Alliance

Dec 2, 2019 · Cloud Computing

An Overview of EMC Elastic Cloud Storage (ECS): Architecture, Features, and Performance

This article provides a detailed technical overview of EMC's Elastic Cloud Storage (ECS), covering its historical evolution, layered architecture, supported protocols, data protection mechanisms, performance characteristics, limitations, and future roadmap within the context of cloud object storage.

Distributed storage · ECS · Elastic Cloud Storage

0 likes · 10 min read

ITPUB

Dec 2, 2019 · Backend Development

How Xiaomi Built Talos: A Scalable, Stateless Message Queue for Billions of Events

This article details Xiaomi's journey from Kafka 0.8 to the home‑grown Talos system, covering business motivations, storage‑compute separation architecture, key challenges such as tail‑read and consistency, and extensive performance, resource, and platform optimizations that enable a high‑throughput, multi‑tenant messaging service.

Consistent Hashing · Distributed Messaging · HDFS

0 likes · 16 min read

Big Data Technology & Architecture

Nov 4, 2019 · Big Data

Understanding Spark Checkpoint: Purpose, Mechanism, and Best Practices

This article explains why Spark checkpoints are needed for large or complex RDD pipelines, how they work by persisting data to reliable storage such as HDFS, and outlines practical steps and best‑practice recommendations for using checkpoints effectively in production environments.

Big Data · Checkpoint · HDFS

0 likes · 6 min read

360 Tech Engineering

Sep 19, 2019 · Big Data

Understanding HDFS: Architecture, Read/Write Operations, Component Roles, Commands, and Pros & Cons

This article provides a comprehensive overview of HDFS, covering its purpose, architecture, read/write mechanisms, replication strategies, component responsibilities, common command‑line tools, and the advantages and disadvantages of using Hadoop Distributed File System for large‑scale data storage.

Distributed File System · HDFS · Hadoop

0 likes · 10 min read

360 Tech Engineering

Aug 22, 2019 · Big Data

Design and Implementation of XStore: A Hadoop‑Based Sample Storage System

This article details the design, architecture, and operational experience of XStore, a Hadoop‑backed sample storage system that handles billions of APK and other binary samples, addressing functional and non‑functional requirements such as real‑time upload, large‑scale storage, high‑performance reads, and disaster recovery.

HBase · HDFS · Hadoop

0 likes · 11 min read

360 Zhihui Cloud Developer

Aug 22, 2019 · Big Data

Mastering HDFS: Architecture, Read/Write, and Best Practices Explained

This article provides a comprehensive overview of HDFS, covering its purpose, architecture, read/write processes, component roles, command-line tools, replica placement strategies, and the advantages and disadvantages of using Hadoop's distributed file system for large-scale data storage.

Data Replication · Distributed File System · HDFS

0 likes · 11 min read

21CTO

Jun 28, 2019 · Big Data

Master Hadoop High Availability: A Complete Step‑by‑Step HA HDFS & YARN Guide

This article provides a comprehensive, language‑agnostic tutorial on building a highly available Hadoop cluster, covering HDFS and YARN HA architectures, QJM shared storage, required components, configuration files, installation commands, startup procedures, verification steps, and troubleshooting references.

Cluster Setup · HDFS · Hadoop

0 likes · 20 min read

Architecture Digest

Jun 26, 2019 · Big Data

Guide to Setting Up Hadoop High Availability (HA) Cluster with HDFS and YARN

This article provides a step‑by‑step tutorial on configuring Hadoop high availability, covering HDFS HA architecture, Quorum Journal Manager synchronization, NameNode failover, YARN HA, required pre‑conditions, cluster planning, configuration files, service startup, and verification procedures.

Big Data · Cluster Setup · HDFS

0 likes · 16 min read

Big Data Technology Architecture

May 21, 2019 · Databases

Postmortem Analysis of a 10‑Node HBase Cluster Outage and Mitigation Measures

This article presents a detailed post‑mortem of a 10‑node HBase cluster failure caused by excessive region count and memstore pressure, analyzes HDFS and datanode log errors, and outlines configuration adjustments and operational recommendations that restored the service and prevented future outages.

Cluster Outage · Compaction · HBase

0 likes · 16 min read

Big Data Technology Architecture

May 18, 2019 · Big Data

Key Concepts of Kafka, Hadoop Shuffle, Spark Cluster Modes, HDFS I/O, and Spark RDD Operations

This article explains Kafka message structure and offset retrieval, details Hadoop's map and reduce shuffle processes, outlines Spark's deployment modes, describes HDFS read/write mechanisms, compares reduceByKey and groupByKey performance, and discusses Spark streaming integration with Kafka and data loss prevention.

HDFS · Hadoop · Kafka

0 likes · 10 min read

Qunar Tech Salon

May 16, 2019 · Big Data

Optimizing HDFS Federation Data Migration with FastCopy and qFastCopy at Qunar

This article describes the challenges of scaling Qunar's Hadoop NameNode, introduces HDFS Federation and the FastCopy tool, presents performance tests comparing FastCopy with DistCp, and details the development and evaluation of an optimized qFastCopy solution that reduces multi‑petabyte migration time from hours to a few.

Big Data · Data Migration · FastCopy

0 likes · 8 min read

Big Data Technology Architecture

May 15, 2019 · Big Data

Understanding HDFS: Blocks, Packets, Chunks, and Read/Write Processes

This article explains the core concepts of HDFS—including its block, packet, and chunk structures, their roles in data streaming, the detailed write and read workflows, and how checksums ensure data integrity—providing a comprehensive overview for anyone working with Hadoop distributed storage.

Distributed File System · HDFS · block storage

0 likes · 7 min read

dbaplus Community

May 13, 2019 · Big Data

Tackling HDFS Performance Bottlenecks: Real‑World Optimizations from VIP.com

This article examines the performance challenges encountered after upgrading a large‑scale HDFS cluster at VIP.com, explains the root causes of NameNode RPC latency, and presents concrete solutions—including delayed block reports, configurable block deletion, federation redesign, client monitoring, temp‑directory sharding, and small‑file handling—along with configuration snippets and real‑world results.

Big Data · Federation · HDFS

0 likes · 13 min read

dbaplus Community

Apr 25, 2019 · Big Data

Cutting Hadoop Storage Costs: Replication, Compression, Tiering & Erasure Coding

This article shares practical strategies used in a multi‑petabyte Hadoop environment to slash storage expenses, covering reduced replication, selective compression formats, tiered storage policies, and erasure coding, while weighing trade‑offs in reliability, performance, and operational complexity.

HDFS · Hadoop · Storage Optimization

0 likes · 10 min read

Big Data Technology & Architecture

Apr 12, 2019 · Big Data

Weekly Knowledge Summary: Yarn Resource Scheduler, Hadoop Rack Awareness, HDFS Data Flow, and Small File Solutions

This weekly note shares personal updates and a concise technical overview covering Yarn's resource scheduling, Hadoop's rack‑aware architecture, HDFS data flow, and practical solutions to the HDFS small‑file problem, along with links to further reading and upcoming work plans.

Big Data · HDFS · Hadoop

0 likes · 5 min read

Big Data Technology & Architecture

Apr 8, 2019 · Big Data

Understanding HDFS Data Blocks, Rack Awareness, and Dynamic Node Addition

This article explains how HDFS stores files in replicated data blocks, implements rack awareness to improve reliability and performance, shows the necessary configuration in core-site.xml, provides sample scripts, and demonstrates how to add new DataNode machines without restarting the NameNode.

Big Data · Data Block · Dynamic Node Addition

0 likes · 10 min read

Big Data Technology & Architecture

Apr 4, 2019 · Big Data

Weekly Knowledge Points: Interview Reflections, Hadoop Introduction, MapReduce and HDFS Overview

This weekly briefing shares five curated resources covering interview reflections, a concise Hadoop introduction, the principles of MapReduce, an overview of HDFS, and upcoming plans to study Hive and HBase, emphasizing the distributed nature of big‑data processing.

Big Data · HDFS · Hadoop

0 likes · 3 min read

Big Data Technology & Architecture

Apr 3, 2019 · Big Data

Understanding RAID and Its Role in HDFS Architecture

This article explains the storage challenges of big data, introduces RAID technologies and their variants, and shows how the principles of RAID are applied in the Hadoop Distributed File System (HDFS) to achieve scalable, reliable, and high‑performance data storage and processing.

Big Data · Data Replication · HDFS

0 likes · 10 min read

Big Data Technology & Architecture

Apr 1, 2019 · Big Data

Comprehensive Overview of Hadoop: Core Modules, HDFS Architecture, MapReduce, YARN, and a Scala WordCount Example

This article provides a detailed introduction to Hadoop's ecosystem—including its core modules (Common, HDFS, YARN, MapReduce), the design of a high‑availability HDFS cluster, the principles of distributed file systems, and a complete Scala WordCount MapReduce program—offering a solid foundation for big‑data practitioners.

Big Data · HDFS · Hadoop

0 likes · 15 min read

Architects' Tech Alliance

Mar 18, 2019 · Big Data

Understanding HDFS Architecture, NameNode HA, and Read/Write Processes

This article explains the concepts and architecture of HDFS, the high‑availability mechanisms of NameNode including quorum‑based shared storage, the detailed read and write workflows of the distributed file system, and discusses its typical use cases and limitations.

Big Data · HA · HDFS

0 likes · 16 min read

Youzan Coder

Mar 1, 2019 · Big Data

Flume Practice at YouZan: Data Collection and Pipeline Construction in Big Data Scenarios

YouZan’s experience with Flume shows how the at‑least‑once delivery model, combined with FileChannel storage and custom extensions such as an NsqSource, hourly‑based HdfsEventSink, metric reporting server, and timestamp interceptor, can reliably move MySQL binlog data to HDFS, while tuning transaction batch size and channel capacity boosts throughput and stability, paving the way for a unified management platform.

At-Least-Once · Flume · HDFS

0 likes · 11 min read

JD Tech

Mar 1, 2019 · Big Data

JD's JDK Customization and Optimization for HDFS: Experience, Challenges, and Future Directions

This article outlines JD's attempts and explorations in customizing the JDK for HDFS, describing the background, limitations of Oracle JDK 1.8, the adoption of OpenJDK 11 with G1GC, a series of JVM and GC optimizations, performance results, and future development plans.

G1GC · GC optimization · HDFS

0 likes · 16 min read

dbaplus Community

Feb 19, 2019 · Big Data

Mastering HDFS Monitoring on JD Cloud: Key Metrics, Tools, and Best Practices

This article presents a comprehensive guide to monitoring Hadoop Distributed File System (HDFS) on JD Cloud, covering challenges, recommended toolchains, essential metrics, configuration tips, and real‑world case studies to help engineers ensure reliability and performance of large‑scale data clusters.

Big Data · ELK · HDFS

0 likes · 14 min read

Didi Tech

Jan 31, 2019 · Big Data

Router-Based Federation in Hadoop: Architecture, Components, and Didi’s Deployment

Router‑Based Federation replaces Hadoop’s single‑point HDFS bottleneck with a server‑side global namespace managed by Routers and a State Store, enabling scalable, highly available sub‑clusters; Didi back‑ported the feature, deployed five Routers, fixed numerous bugs, and contributed patches to improve stability and functionality.

Big Data · HDFS · Hadoop

0 likes · 11 min read

Programmer DD

Nov 18, 2018 · Databases

How We Optimized HBase for 80 Billion Daily Logs: Real‑World Tuning Strategies

This article details the practical performance‑tuning steps applied to a large‑scale HBase deployment handling 80 billion daily log entries, covering rowkey redesign, region redistribution, HDFS write‑timeout fixes, network‑topology adjustments, and JVM parameter tweaks that together stabilized the system and dramatically improved throughput.

HBase · HDFS · Performance Tuning

0 likes · 14 min read

JD Tech

Sep 20, 2018 · Big Data

Optimizing Local Storage Systems for Large‑Scale Hadoop HDFS Clusters

This article explains the architecture of Hadoop HDFS, identifies performance bottlenecks in page cache and metadata handling on DataNodes, and presents four practical optimization techniques—including cache‑buffer separation, barrier disabling, directory restructuring, and real‑time monitoring—demonstrating significant throughput and latency improvements in large‑scale clusters.

HDFS · Hadoop · Linux kernel

0 likes · 14 min read

Tongcheng Travel Technology Center

Aug 14, 2018 · Big Data

Understanding HDFS Read and Write Mechanisms

This article explains how HDFS handles file reading and writing, detailing the roles of DFSClient, block selection, hedged reads, packet construction, checksum handling, and the interaction with NameNode and DataNode pipelines to ensure reliability and performance.

DFSClient · Distributed File System · HDFS

0 likes · 7 min read

dbaplus Community

Aug 6, 2018 · Big Data

Understanding RAID, HDFS, and MapReduce: From Storage to Distributed Computing

This article explains the storage challenges of big data, introduces RAID levels and their trade‑offs, describes the HDFS architecture with NameNode and DataNode replication, details the MapReduce programming model and execution flow, and shows how Hive translates SQL queries into MapReduce jobs.

Big Data · Distributed Computing · HDFS

0 likes · 23 min read

Big Data and Microservices

Jul 24, 2018 · Big Data

Why Hadoop Still Leads Big Data Processing: Core Advantages Explained

This article introduces Hadoop’s open‑source big‑data framework, explains its core components HDFS and MapReduce, and outlines four key advantages—ease of deployment, robustness, scalability, and simplicity—while also covering HBase as the Hadoop‑based column‑oriented database.

Big Data · Distributed Computing · HBase

0 likes · 4 min read

JD Tech

Jul 10, 2018 · Big Data

Deploying Hadoop KMS for Transparent HDFS Encryption: A Step‑by‑Step Guide

This article details a complete, hands‑on deployment of Hadoop KMS on a CentOS‑based Hadoop 2.6.1 cluster, covering environment setup, configuration file changes, key generation, service startup, encryption‑zone creation, user permission tuning, verification procedures, and common troubleshooting tips.

HDFS · Hadoop · KMS

0 likes · 19 min read

dbaplus Community

Jun 7, 2018 · Operations

Why Ceph’s Unlimited Scalability Isn’t As Simple As It Looks

The article examines Ceph’s claimed infinite scalability, cost advantages, and operational stability from an SRE perspective, comparing it with centralized systems like HDFS, and reveals practical challenges such as expansion granularity, crushmap rebalancing, utilization limits, and maintenance overhead.

Ceph · Distributed storage · HDFS

0 likes · 15 min read

UCloud Tech

May 22, 2018 · Big Data

Can Data Lakes Combine Compute and Storage? Exploring HDFS, S3A, and UMStor Hadapter

This article examines the evolution of data lake architectures, comparing the compute‑storage fusion model of HDFS, the compute‑storage separation approach of S3A on Ceph, and a new UMStor Hadapter plugin that aims to unite their strengths while addressing performance bottlenecks.

Ceph · Data Lake · HDFS

0 likes · 14 min read

Suning Technology

May 11, 2018 · Big Data

How Suning Scaled HDFS with Alluxio: Multi‑Cluster Architecture and Performance Gains

This article details Suning's approach to overcoming HDFS Namenode performance bottlenecks by partitioning into multiple clusters, leveraging Alluxio's unified namespace, and presenting design decisions, implementation challenges, and performance test results that show significant throughput and latency improvements.

Alluxio · Distributed storage · HDFS

0 likes · 12 min read

Qunar Tech Salon

Apr 17, 2018 · Big Data

HDFS DataNode Volume Choosing Policies: Round‑Robin and Available‑Space Strategies

This article explains how HDFS DataNode stores data blocks on local disks, detailing the configuration of storage directories, the two volume‑choosing policies (round‑robin and available‑space), their implementation via the VolumeChoosingPolicy interface, and the logic used to balance disk usage.

AvailableSpace · DiskBalancing · HDFS

0 likes · 10 min read

dbaplus Community

Mar 7, 2018 · Big Data

Taming Massive HDFS Data Growth: Monitoring, Capacity Planning & Hive Optimization

The article outlines a systematic approach for large‑scale Hadoop clusters to monitor daily data growth, identify abnormal paths, manage rapid expansion, clean unused cold data, and implement capacity forecasts, while providing concrete daily and quarterly actions, Hive‑specific strategies, and practical examples to keep storage under control.

Big Data · Data Growth · HDFS

0 likes · 17 min read

Ctrip Technology

Feb 28, 2018 · Big Data

Using Alluxio to Mitigate HDFS Maintenance Impact on Real-Time Jobs in Ctrip's Big Data Platform

The article explains how Ctrip's big‑data platform introduced Alluxio to isolate real‑time Spark Streaming jobs from HDFS NameNode maintenance, reduce NameNode pressure, improve Spark SQL performance, and provide a unified storage layer across multiple HDFS clusters.

Alluxio · Big Data · Data Lake

0 likes · 9 min read

dbaplus Community

Dec 14, 2017 · Big Data

Scaling Vipshop’s Big Data Platform: Monitoring, Multi‑HDFS, Yarn Optimization & Capping

In 2017 Vipshop’s senior big‑data architect shares how the company grew its Hadoop‑based platform from zero to a thousand‑node cluster, detailing cluster health monitoring, multi‑HDFS deployment via Hive, Yarn container allocation improvements, and a hook‑driven Capping resource‑control system to boost stability and efficiency.

Big Data · HDFS · Monitoring

0 likes · 15 min read

Full-Stack DevOps & Kubernetes

Oct 21, 2017 · Big Data

Deploy Hadoop CDH5.4 on CentOS 6: Install HDFS, YARN, and WebHDFS

This guide walks through preparing three CentOS 6.9 nodes, configuring hostnames, time sync, password‑less SSH, disabling IPv6, installing JDK, downloading CDH 5.4, setting up core‑site and hdfs‑site XML files, formatting the NameNode, starting HDFS services, configuring YARN and MapReduce, and verifying the installations via the Web UI.

Big Data · CDH · CentOS

0 likes · 18 min read

Architecture Digest

Jul 21, 2017 · Big Data

Step-by-Step Guide to Building a High-Availability Hadoop HDFS and YARN Cluster

This article provides a comprehensive, step-by-step tutorial for setting up a high‑availability Hadoop cluster, covering user creation, JDK installation, host configuration, SSH setup, firewall and SELinux adjustments, Zookeeper deployment, HDFS and YARN HA configuration, essential XML files, and failover testing.

Cluster Setup · HDFS · Hadoop

0 likes · 20 min read

Architects' Tech Alliance

Jul 11, 2017 · Big Data

Understanding HDFS Architecture and Its Integration with NFS and Various Storage Solutions

This article reviews the fundamental concepts of HDFS, explains its master‑slave architecture with NameNode and DataNode, describes block replication, and discusses various implementations—including native HDFS, NetApp/Lustre, GPFS/Ceph, and Isilon—as well as HDFS‑to‑NFS gateway integration.

Big Data · Distributed File System · HDFS

0 likes · 7 min read

StarRing Big Data Open Lab

Jun 9, 2017 · Big Data

Secure HDFS with Guardian 5.0: Complete Permission and Quota Guide

This article explains why Hadoop security is critical, introduces Guardian 5.0’s unified authentication and authorization framework, and provides step‑by‑step instructions for configuring HDFS permissions and quotas through its web UI, helping administrators protect massive data assets efficiently.

Guardian5.0 · HDFS · Hadoop

0 likes · 9 min read

ITFLY8 Architecture Home

May 10, 2017 · Big Data

How Hadoop Implements Distributed File Systems: From GFS Theory to Practice

This article explains the fundamentals of distributed file systems by linking Google’s GFS, MapReduce, and BigTable concepts to Hadoop’s open‑source implementation, covering terminology, architecture, server roles, data distribution, RPC protocols, file operations, fault recovery, consistency, load balancing, and garbage collection.

GFS · HDFS · Hadoop

0 likes · 34 min read

ITFLY8 Architecture Home

May 9, 2017 · Fundamentals

Exploring Popular Distributed File Systems: From GFS to FastDFS

This article surveys common distributed file systems such as GFS, HDFS, Lustre, Ceph, GridFS, MogileFS, TFS, and FastDFS, explaining their origins, key characteristics, typical use cases, and practical considerations for large‑scale storage.

Ceph · GFS · HDFS

0 likes · 7 min read

MaGe Linux Operations

May 3, 2017 · Big Data

From Storage to Real‑Time: The Evolution of Big Data Technologies

This article outlines the three historical stages of big data technology—from early storage and batch processing, through market‑driven integration with Hive, to today’s focus on speed with Spark, Impala and streaming—while detailing the Hadoop ecosystem components such as HDFS, MapReduce, KV stores and emerging solutions like YDB.

HDFS · Hadoop · Hive

0 likes · 13 min read

360 Quality & Efficiency

Apr 24, 2017 · Big Data

Introduction to Hadoop: Architecture, HDFS, MapReduce, and Common Commands

This article introduces Hadoop as a widely used big‑data framework, explains its core components HDFS and MapReduce, describes the cluster node roles, presents typical command‑line usage and a sample MapReduce workflow, and offers guidance for further learning.

Distributed Computing · HDFS · Hadoop

0 likes · 5 min read

Qunar Tech Salon

Apr 21, 2017 · Big Data

Ensuring Exact‑Once Semantics in Spark Streaming with Kafka: Offline Repair and Data Deduplication Strategies

This article explains why Spark Streaming combined with Kafka can only guarantee at‑least‑once delivery, outlines the challenges of delayed and out‑of‑order events, and presents practical offline‑repair, deduplication, and output‑format techniques, with code examples, for achieving exactly‑once semantics in big‑data pipelines.

Exact-Once · HBase · HDFS

0 likes · 11 min read

Meituan Technology Team

Apr 14, 2017 · Big Data

Practical Experience of HDFS Federation at Meituan: Challenges, Improvements, and Automation

Meituan‑Dianping migrated its 2,000‑node HDFS cluster to Federation by fixing ViewFs compatibility, simplifying mount points, leveraging FastCopy for massive data moves, improving token handling, and automating split‑workflow steps, thereby overcoming single‑NameNode bottlenecks and providing a practical blueprint for large‑scale Hadoop deployments.

Big Data · FastCopy · Federation

0 likes · 22 min read

Java High-Performance Architecture

Mar 23, 2017 · Big Data

Master HDFS: From Basics to Hands‑On Java API and Shell Operations

This tutorial guides you through HDFS fundamentals, explaining its purpose and mechanisms, demonstrates command‑line and Java API operations, and walks you through the complete read/write workflow, while providing a ready‑to‑use practice environment for hands‑on learning.

Distributed File System · HDFS · Hadoop

0 likes · 2 min read

Meituan Technology Team

Mar 17, 2017 · Big Data

Optimizing Hadoop NameNode Restart in HA with QJM

By applying a series of JIRA patches and configuration tweaks—such as shrinking the fsLock scope, increasing checkpoint transaction thresholds, off‑loading quota calculations, simplifying BlockReport handling, and async processing of mis‑replicated blocks—the Hadoop HA NameNode restart time in a 540 MB metadata cluster drops from roughly 4000 seconds to about 2000 seconds, cutting total downtime to around 35 minutes and greatly improving cluster availability.

HA · HDFS · Hadoop

0 likes · 18 min read

Efficient Ops

Feb 9, 2017 · Big Data

Mastering HDFS Disk Balancer: Optimize DataNode Storage in Hadoop 3

This article explains the new HDFS disk balancer feature introduced in Hadoop 3, covering its purpose, supported volume‑selection policies, step‑by‑step usage, planning and execution commands, and how it helps maintain balanced storage across DataNode disks.

Disk Balancer · HDFS · Hadoop

0 likes · 8 min read

Huawei Cloud Developer Alliance

Jan 24, 2017 · Big Data

Why Hadoop Remains the Backbone of Big Data: Core Modules, Tools, and Trends

This article provides a comprehensive overview of Hadoop as the leading open‑source platform for big‑data processing, detailing its core components HDFS and MapReduce, the evolution to Hadoop 2.0/YARN, and the extensive ecosystem of tools and commercial solutions that enable scalable storage, analysis, and machine‑learning on massive data sets.

Big Data · Distributed Computing · HDFS

0 likes · 18 min read

Art of Distributed System Architecture Design

Dec 31, 2016 · Big Data

Understanding Hadoop: Architecture, HDFS, and MapReduce

This article explains Hadoop as an Apache‑managed open‑source platform for storing massive data on distributed clusters and running robust, efficient analytics via its two core components—HDFS for storage and the Java‑based MapReduce framework for processing—highlighting modularity, high availability, and common tooling.

Distributed Computing · HDFS · Hadoop

0 likes · 6 min read

Meituan Technology Team

Dec 9, 2016 · Big Data

Memory Usage Analysis of HDFS NameNode Core Data Structures

The article quantitatively breaks down HDFS NameNode memory consumption, showing that the Namespace tree and BlocksMap together dominate heap usage (≈53 GB in large clusters), provides detailed per‑object size estimates for NetworkTopology, INode and block structures, and proposes a simple formula to predict total heap requirements and tuning recommendations.

Big Data · HDFS · Memory management

0 likes · 13 min read

dbaplus Community

Nov 20, 2016 · Databases

How to Slash HBase Read Latency: Proven Client, Server, and HDFS Tweaks

This article examines the common causes of high read latency in HBase—such as full GC, region‑server imbalance, low write throughput, and inefficient client settings—and provides concrete optimization steps for the client, server, column‑family design, and HDFS layers to dramatically improve performance.

Client Tuning · Database · HBase

0 likes · 16 min read

ITFLY8 Architecture Home

Nov 18, 2016 · Big Data

Understanding HDFS: Design Goals, Architecture, and Data Replication

This article explains HDFS’s core design principles, including fault tolerance, high‑throughput data access, master‑slave architecture with Namenode and Datanodes, namespace management, block replication strategies, safe mode, metadata persistence, communication protocols, robustness mechanisms, and file operations such as creation, deletion, and space reclamation.

Distributed storage · HDFS

0 likes · 16 min read

MaGe Linux Operations

Nov 7, 2016 · Big Data

How HDFS Achieves Low Cost, High Reliability, and Fault Tolerance

This article explains how HDFS, inspired by Google’s GFS, provides a low‑cost, highly reliable, fault‑tolerant, and high‑performance distributed file system for big‑data workloads by using replication, standby NameNodes, block storage, rack awareness, and compute‑close‑to‑data strategies.

Big Data · Data Replication · Distributed File System

0 likes · 7 min read

High Availability Architecture

Oct 20, 2016 · Big Data

Understanding HDFS EditLog Format and Quorum Journal Manager Recovery Process

This article explains the HDFS EditLog file structure, the design of the Quorum Journal Manager for high‑availability, the write‑path optimizations such as batch flushing and double‑buffering, and the detailed Multi‑Paxos based recovery algorithm including isolation, segment selection, prepare and accept phases, and handling journal node failures.

EditLog · HDFS · Paxos

0 likes · 12 min read

Java High-Performance Architecture

Sep 24, 2016 · Big Data

Step-by-Step Guide to Building a Hadoop 2.7.3 Cluster on Three Servers

This tutorial walks you through preparing three Linux servers, configuring password‑less SSH, installing Hadoop 2.7.3, editing core XML files, distributing the installation, starting the services, and verifying HDFS and MapReduce functionality with practical commands and screenshots.

Big Data · Cluster Setup · HDFS

0 likes · 10 min read

Meituan Technology Team

Aug 26, 2016 · Big Data

Memory Architecture and Analysis of Hadoop HDFS NameNode

The article dissects Hadoop 2.4.1’s HDFS NameNode memory architecture, detailing how the Namespace, BlockManager, NetworkTopology, and LeaseManager consume the heap, exposing scaling problems when metadata reaches hundreds of millions of inodes and blocks, and recommending file merging, block‑size tuning, federation, or external KV stores to mitigate heap pressure.

Big Data · HDFS · Memory management

0 likes · 17 min read

MaGe Linux Operations

Aug 23, 2016 · Big Data

Step-by-Step Guide to Building a Hadoop Cluster on CentOS 6.5

This article provides a comprehensive, hands‑on tutorial for setting up a Hadoop 2.6.4 cluster on a CentOS 6.5 development server, covering SSH password‑less login, user/group creation, DNS configuration, JDK installation, environment variables, Hadoop installation, HDFS and YARN configuration, and troubleshooting native library warnings.

Big Data · CentOS · Cluster Setup

0 likes · 12 min read

Architecture Digest

Jul 5, 2016 · Big Data

Why Map‑Reduce Is Not the Solution to Your Big Data Problem – A Critical Look at Hadoop

The article reviews Hadoop’s origins from Google’s pioneering papers, explains its architecture and ecosystem, evaluates its strengths such as scalability and benchmarks, discusses current limitations like single‑point failures and complex programming, and outlines upcoming improvements including HDFS Federation and next‑generation MapReduce.

Big Data · Distributed Computing · Future

0 likes · 14 min read

ITPUB

Jun 15, 2016 · Databases

Understanding HBase’s Physical Architecture: Regions, Stores, and WAL

This article explains HBase’s internal architecture, covering the roles of HRegionServer, Client, Zookeeper, Master, RegionServer, the physical storage layout, StoreFile and HFile structures, and the Write-Ahead Log mechanism that ensures data durability and fault tolerance.

HBase · HDFS · NoSQL

0 likes · 13 min read

Hulu Beijing

May 31, 2016 · Big Data

What’s New in Hadoop 3.0? Key Features and Improvements Explained

Hadoop 3.0, built on JDK 1.8, adds erasure‑coded HDFS, multi‑NameNode support, native MapReduce task optimizations, cgroup‑based YARN memory and disk isolation, and container resizing, with an alpha slated for summer and a GA release expected in November or December.

Big Data · HDFS · Hadoop

0 likes · 5 min read

Qunar Tech Salon

May 13, 2016 · Big Data

Overview and Architecture of Hadoop Distributed File System (HDFS)

This article provides a comprehensive overview of Hadoop Distributed File System (HDFS), detailing its design goals, architecture components such as NameNode, DataNode and SecondaryNameNode, data block handling, replication strategies, communication protocols, and the read, write, and delete processes.

Big Data · Data Replication · Distributed File System

0 likes · 18 min read

Architect

Apr 28, 2016 · Big Data

Design and Architecture of Youzan Unified Log Platform

The article describes the design, components, and implementation details of Youzan's unified log platform, covering log ingestion via rsyslog, Logstash, and Flume, centralized processing with Kafka, real‑time analysis using Storm/Spark, and storage in HDFS, Elasticsearch, and Hawk, while also discussing challenges and future improvements.

Elasticsearch · HDFS · Kafka

0 likes · 10 min read

ITPUB

Mar 19, 2016 · Big Data

Inside HDFS: How NameNode and DataNode Manage Big Data Writes and Reads

This article explains the fundamentals of distributed file systems, focusing on Hadoop’s HDFS architecture, the separation of metadata and data via NameNode and DataNode, and detailed step‑by‑step write and read processes, including replication, fault recovery, and block splitting across nodes.

Big Data · DataNode · Distributed File System

0 likes · 8 min read

Java High-Performance Architecture

Jan 11, 2016 · Big Data

How HDFS Powers Scalable, Reliable Storage in Big Data Environments

This article explains how HDFS abstracts multiple servers into a single file system, splits files into replicated blocks, manages metadata via NameNode and DataNode, and provides linear capacity scaling and high reliability for big data workloads.

Big Data · Data Replication · Distributed File System

0 likes · 5 min read

ITPUB

Jan 8, 2016 · Databases

How Facebook Scales MySQL Backups: Strategies, Storage, and Incremental Techniques

This article explains Facebook's large‑scale MySQL backup architecture, covering the Python‑based automation framework, master‑slave deployment, logical mysqldump backups, warm and cold storage locations, source selection heuristics, full and incremental backup pipelines, verification processes, and future RBR‑based improvements.

Facebook · HDFS · Incremental Backup

0 likes · 15 min read

dbaplus Community

Dec 30, 2015 · Databases

How Facebook Scales MySQL Backups: Strategies, Storage, and Validation

This article details Facebook's MySQL backup architecture, covering preparation, logical backup format, storage locations, source selection, full and incremental backup pipelines, verification mechanisms, and future directions such as RBR‑based logical incremental backups.

Facebook · HDFS · Incremental Backup

0 likes · 14 min read

Art of Distributed System Architecture Design

Apr 24, 2015 · Big Data

Design Principles and Architecture of HDFS (Hadoop Distributed File System)

This article explains HDFS's design goals, master/slave architecture, namespace management, block replication strategies, fault tolerance mechanisms, metadata persistence, communication protocols, robustness features, data organization, access methods, and space reclamation, providing a comprehensive overview of Hadoop's distributed storage system.

DataNode · HDFS · NameNode

0 likes · 20 min read

MaGe Linux Operations

Apr 7, 2015 · Big Data

How Hadoop’s Tiered Storage Optimizes Data Based on Temperature

This article explains Hadoop’s tiered storage concept, describing how data is classified by temperature—hot, warm, cold, frozen—and automatically moved across disk and archive layers to optimize cost and performance, with examples from Hadoop versions and eBay’s large‑scale deployment.

Big Data · Data Temperature · HDFS

0 likes · 9 min read

MaGe Linux Operations

Nov 5, 2014 · Big Data

Quickly Get Hadoop 2.0 Up and Running: A Minimal Configuration Guide

This article walks through the essential steps to install and configure Hadoop 2.0 on a two‑node Linux cluster, covering version selection, directory setup, core XML files, YARN settings, service startup, verification commands, and basic troubleshooting tips.

Big Data · Cluster Setup · HDFS

0 likes · 9 min read
