Tagged articles
8 articles
Radish, Keep Going!
Jan 30, 2026 · Big Data

How Uber Scaled Data Replication to Petabytes Daily with Distcp Optimizations

Uber faced the challenge of replicating more than 350 PB of data between on‑premises and cloud data lakes. It redesigned Hadoop DistCp by moving resource‑intensive tasks to the Application Master, parallelising the copy‑listing and commit phases, and using Uber‑mapper jobs, which cut latency and improved resource efficiency.

Big Data · Distcp · Hadoop
0 likes · 17 min read
Huolala Tech
May 25, 2023 · Big Data

How Huolala Solved HBase Bulkload Challenges: A Practical Guide

This article details Huolala’s experience building a unified Hive‑to‑HBase pipeline, addressing low development efficiency, lack of monitoring, and HBase instability by evaluating two architectures, implementing a generic Transform tool, optimizing compaction and DistCp, and establishing stability and data‑validation mechanisms.

Distcp · HBase · bulkload
0 likes · 12 min read
dbaplus Community
Dec 15, 2021 · Big Data

How We Migrated Hundreds of Petabytes of Hadoop Data Without Downtime

This article details the background, challenges, and step‑by‑step solutions for migrating over a hundred petabytes of Hadoop HDFS data across data centers within a month, covering strategy selection, code modifications, balance optimization, and tool enhancements.

Balance Optimization · Big Data Operations · Data Migration
0 likes · 14 min read
dbaplus Community
Dec 22, 2020 · Big Data

How eBay Migrated 10 PB of HDFS Data Across Namespaces in Just 2 Hours

This article details how eBay's ADI Hadoop team tackled a 10 PB, 10‑million‑file migration by optimizing DistCp with Fastcopy, load balancing, ACL handling, and failure recovery, completing the transfer within a two‑hour window while preserving cluster stability and performance.

Big Data · Distcp · HDFS
0 likes · 16 min read
Meituan Technology Team
Aug 25, 2017 · Big Data

Data Platform Integration and Multi‑Data‑Center Architecture at Meituan‑Dianping

After Meituan merged with Dianping, engineers unified two massive Hadoop ecosystems across Beijing and Shanghai by breaking the project into four phases—unify, copy, switch, fuse—standardizing versions, implementing zone‑aware transfers, cross‑realm Kerberos, and federated metadata to achieve a single, reliable multi‑data‑center platform.

Big Data · Cluster Fusion · Data Platform
0 likes · 32 min read