How We Cut PBs of Waste and Optimized HDFS with Tiered Storage and Cloud Migration
This article details a three‑part technical sharing that covers cost governance for offline Hadoop clusters, a large‑scale data‑center migration with architecture upgrades, and a tiered storage strategy using EC and COS to reduce storage costs and improve performance in a cloud‑native big‑data environment.
Background
The presentation, originally delivered by Liu Xiaolong of Zhihu's data architecture platform, is divided into three parts: cost governance, tiered storage, and cloud‑based extensions. It aims to address rising storage costs, rapid data growth, and the need for scalable, efficient data management.
1. Data Governance
Early offline clusters lacked unified standards, making cost control difficult. A data‑norm specification was created, prompting months of governance across business lines. Results included clearer, safer HDFS data, removal of several petabytes of obsolete data, and heightened cost awareness among teams.
2. Data‑Center Migration
Accelerated data growth and insufficient rack resources in the old data center forced a large‑scale migration. The architecture was upgraded during the move, driven by two main considerations:
Version upgrade : CDH was outdated; Hadoop 3.x introduced erasure coding (EC) support.
Hardware selection : Chose low‑cost, short‑lead‑time CVM standard large‑storage machines.
The migration involved:
Creating a diff table from fsimage of old and new clusters to identify data gaps (paths not migrated, metadata changes, redundant data).
Addressing performance issues such as DistCp overhead by using the -direct flag and improving HADOOP‑16872.
Managing bandwidth contention on dedicated lines by limiting DistCp mappers and bandwidth, and restricting YARN queue resources.
Handling slow incremental syncs by using local mode for non‑partitioned and incremental data.
These steps yielded measurable performance improvements and clarified migration benefits.
3. Tiered Storage
Post‑migration hardware upgrades provided a cost advantage, prompting a tiered storage design for HDFS data:
Four storage tiers were defined, each with a different storage method.
Hotness analysis leveraged fsimage and Hive Metastore partition tables to build a hive_partition_detail table for partition‑level heat metrics.
Cold data was stored using erasure coding (EC), which reduces space by ~50% compared to triple replication while preserving path, account, and permission semantics. Directories larger than 2 MB were eligible for EC.
An automated EC conversion service was built, following these steps:
Select partition paths and record them.
Schedule DistCp to copy data from source directory A to a temporary EC directory B.
Validate data consistency between A and B.
Delete source A, then rename B to A.
Key considerations for EC included component compatibility, performance differences, and suitable file size thresholds.
For ultra‑cold data (~3% of total), Tencent Cloud COS was used. A COS conversion pipeline mirrored the EC process, adding steps to upload partitioned data via DistCp, verify integrity, and update Hive partition locations. Issues encountered and mitigations:
COS checksum mismatch required adjusting bucket policies to expose SHA values.
DistCp -update caused overwrites; resolved via HADOOP‑17256 and HADOOP‑8143 patches.
Excessive delete operations were disabled, retaining only PUTObject permission.
Small‑file upload to COS was slower; referenced community optimizations (HADOOP‑16829).
Hive queries on COS suffered due to reduced map tasks; tuning mapred.max.split.size improved performance.
Additional challenges such as AK/SK configuration visibility in Hive were solved by removing the hive.conf.hidden.list restriction.
4. Cloud Extensions
To integrate COS more tightly, a custom "username + password" mechanism was developed to automatically switch COS AK/SK based on group accounts, hiding credentials from business logic. Support was added for Hadoop, Hive, Presto, Spark, and Flink.
The COS Select feature adds push‑down capabilities for CSV and Parquet formats, enabling internal compute components to leverage these optimizations.
5. Future Plans
Upcoming work includes:
Accelerating hot data using NVMe‑based cache nodes (NVMe Cache).
Unified management of HDFS and COS data, covering lifecycle, permissions, audit logs, and metadata.
Overall, the series of governance, migration, and tiered storage initiatives demonstrate a systematic approach to reducing storage costs, improving data reliability, and extending big‑data workloads to cloud environments.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
