Didi Tech
Jan 25, 2021 · Big Data
Migrating Hive SQL to Spark SQL: Design, Implementation, and Performance Evaluation at DiDi
DiDi migrated more than 10,000 Hive SQL tasks to Spark SQL using a lightweight dual-run pipeline that extracts, rewrites, compares, and switches tasks. The migration fixed syntax and UDF differences and added features such as small-file merging and enhanced partition pruning. As a result, Spark handles 85% of workloads, with 40% faster execution, 21% lower CPU usage, and 49% lower memory usage.
Big Data · Data Migration · Hive
18 min read