Radish, Keep Going!
Jan 30, 2026 · Big Data
How Uber Scaled Data Replication to Petabytes Daily with Distcp Optimizations
Uber tackled the challenge of replicating over 350 PB of data across its on-premises and cloud data lakes by redesigning Hadoop Distcp: moving intensive tasks to the Application Master, parallelising the copy-listing and commit phases, and leveraging Uber-mapper jobs. Together, these changes dramatically cut latency and improved resource efficiency.
Big Data · Distcp · Hadoop
