How HuoLala Cut Costs by Switching Big Data Workloads to ARM CPUs
This article details HuoLala's exploration of replacing x86 compute nodes with ARM servers in its big‑data platform, covering performance benchmarks, component adaptations for YARN, Tez/MR, security tools, a critical JDK de‑optimization issue, and the resulting production outcomes and future roadmap.
Background
To meet cost‑reduction goals, HuoLala evaluated replacing x86 compute nodes with ARM servers, which promise about 15% lower hardware cost while offering comparable performance and growing support among cloud providers.
ARM Overview
ARM (Advanced RISC Machine) is a RISC‑based architecture whose instruction set differs from x86, requiring recompilation of native binaries.
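A quick way to see why x86 native binaries cannot simply be reused is to read the target architecture that every ELF shared object records in its header. The following is a minimal sketch, assuming Linux ELF files. The `e_machine` constants come from the ELF specification (62 for x86-64, 183 for AArch64), and the helper names are hypothetical.

```python
import struct

# e_machine values from the ELF specification (only the two relevant here).
EM_X86_64 = 62
EM_AARCH64 = 183

MACHINE_NAMES = {EM_X86_64: "x86-64", EM_AARCH64: "AArch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in an ELF header.

    Only the first 20 bytes are needed: the 16-byte e_ident block,
    then e_type (2 bytes) and e_machine (2 bytes).
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return MACHINE_NAMES.get(machine, f"unknown ({machine})")


def elf_machine_of_file(path: str) -> str:
    """Read just enough of a .so file to report its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

For example, `elf_machine_of_file("/usr/hdp/3.1.4.0-315/hadoop/lib/native/libhadoop.so")` should report `x86-64` on a library copied over from an x86 node. That mismatch is what forces a rebuild before the library can load on ARM.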
ARM Machine Performance
Sysbench CPU tests were run on both x86 and ARM machines under identical conditions. In every configuration, the ARM CPUs completed roughly three times as many events as the x86 CPUs, so ARM shows no performance disadvantage on this CPU-bound benchmark.
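The "roughly three times" figure can be checked directly against the per-thread event averages in the table that follows. Here is a small sketch that copies those numbers and computes the ARM/x86 ratio for each thread count and `cpu-max-prime` setting. The dictionary layout and function name are illustrative only.

```python
# Per-thread average events from the sysbench table,
# keyed by (threads, cpu-max-prime).
x86_events = {
    (32, 10000): 619477.5625,
    (32, 40000): 90961.1875,
    (32, 80000): 36186.7812,
    (16, 10000): 643620.7500,
    (16, 40000): 96651.7500,
    (16, 80000): 37213.6250,
}
arm_events = {
    (32, 10000): 1812940.4062,
    (32, 40000): 290711.4375,
    (32, 80000): 114125.4062,
    (16, 10000): 1816920.3750,
    (16, 40000): 291193.7500,
    (16, 80000): 114328.8750,
}


def arm_to_x86_ratios(arm, x86):
    """ARM events divided by x86 events for each shared configuration."""
    return {key: arm[key] / x86[key] for key in sorted(x86)}


for (threads, prime), ratio in arm_to_x86_ratios(arm_events, x86_events).items():
    print(f"threads={threads:2d} cpu-max-prime={prime:5d}  ARM/x86 = {ratio:.2f}")
```

Across the six configurations, the ratio falls between about 2.8 and 3.2.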
| Machine | Threads | cpu-max-prime | Duration | CPU usage | Events per thread (avg / stddev) | Execution time per thread, s (avg / stddev) |
|---|---|---|---|---|---|---|
| x86 | 32 | 10000 | 600 s | 100% | 619477.5625 / 2692.78 | 599.6787 / 0.01 |
| x86 | 32 | 40000 | 600 s | 100% | 90961.1875 / 410.94 | 599.9231 / 0.01 |
| x86 | 32 | 80000 | 600 s | 100% | 36186.7812 / 111.11 | 599.9618 / 0.02 |
| ARM | 32 | 10000 | 600 s | 100% | 1812940.4062 / 1608.03 | 599.4050 / 0.01 |
| ARM | 32 | 40000 | 600 s | 100% | 290711.4375 / 246.13 | 599.8993 / 0.00 |
| ARM | 32 | 80000 | 600 s | 100% | 114125.4062 / 98.32 | 599.9588 / 0.00 |
| x86 | 16 | 10000 | 600 s | 50% | 643620.7500 / 410.99 | 599.7101 / 0.00 |
| x86 | 16 | 40000 | 600 s | 50% | 96651.7500 / 73.83 | 599.9378 / 0.00 |
| x86 | 16 | 80000 | 600 s | 50% | 37213.6250 / 22.42 | 599.9005 / 0.00 |
| ARM | 16 | 10000 | 600 s | 50% | 1816920.3750 / 624.19 | 599.4135 / 0.01 |
| ARM | 16 | 40000 | 600 s | 50% | 291193.7500 / 181.15 | 599.9005 / 0.00 |
| ARM | 16 | 80000 | 600 s | 50% | 114328.8750 / 45.24 | 599.9605 / 0.00 |

Practice
The migration began with offline workloads, requiring adaptations in YARN, Tez/MR engines, and security/operations components.
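Before adapting individual components, it can help to inventory which JARs bundle native code and whether any of it targets AArch64. The following is a minimal sketch, assuming the native libraries are stored as `.so` entries inside the JARs. It reads each entry's ELF `e_machine` field. The function names are hypothetical, and no Hadoop tool is involved.

```python
import zipfile

# ELF e_machine values: 62 = x86-64, 183 = AArch64.
EM_NAMES = {62: "x86-64", 183: "AArch64"}


def native_libs_in_jar(jar_path):
    """Map each .so entry inside a JAR to the architecture in its ELF header."""
    result = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".so"):
                continue
            with jar.open(name) as entry:
                header = entry.read(20)
            if len(header) < 20 or header[:4] != b"\x7fELF":
                result[name] = "not ELF"
                continue
            # e_ident[EI_DATA] (offset 5) gives the byte order of e_machine.
            byteorder = "little" if header[5] == 1 else "big"
            machine = int.from_bytes(header[18:20], byteorder)
            result[name] = EM_NAMES.get(machine, f"unknown ({machine})")
    return result


def jars_missing_arch(jar_paths, wanted="AArch64"):
    """Return the JARs that bundle native code but no library for `wanted`."""
    missing = []
    for path in jar_paths:
        archs = set(native_libs_in_jar(path).values())
        if archs and wanted not in archs:
            missing.append(path)
    return missing
```

For example, `jars_missing_arch(glob.glob("/usr/hdp/3.1.4.0-315/hadoop/lib/*.jar"))` would list JARs that need an ARM build. The path is only an example. Pure-Java JARs are skipped because they contain no `.so` entries.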
YARN Adaptation
YARN relies on several native libraries, some of them packaged inside JAR files. The following components were recompiled for ARM or replaced with ARM‑compatible versions:
| Component | Dependency (JAR / native library) | Purpose | Solution |
|---|---|---|---|
| YARN NodeManager | leveldbjni-all-1.8.jar | Stores task state in LevelDB | Recompile the .so for ARM or use an ARM-compatible JAR |
| | spark-2.3.2.3.1.4.0-315-yarn-shuffle.jar | Provides the Spark shuffle service | Recompile the .so for ARM or use an ARM-compatible JAR |
| | /usr/hdp/3.1.4.0-315/hadoop/lib/native | Hadoop native libraries | Rebuild with Snappy support: `mvn package -DskipTests -Pdist -Dnative -Dsnappy.lib=/usr/local/lib64 -Dbundle.snappy` |

Tez/MR Engine Adaptation
Tez and MapReduce engines also depend on native .so libraries inside JARs. The main issues and solutions are:
| JAR | Problem | Solution |
|---|---|---|
| snappy-java-1.0.5.jar | Native library fails to load ("Possible cause: can't load AMD 64-bit .so on AARCH64") | Recompile the .so for ARM, or obtain an ARM-compatible JAR from the community repository |
| lz4-java-1.4.0.jar | Same issue | Same solution |
| crffpp-java-1.0.2.jar | No ARM support | Download the source, cross-compile the .so for ARM, then repackage the JAR |

Adapting JARs with Bundled .so Libraries
When a Java program uses JNI, the native .so it loads must be built for the host CPU architecture. Newer releases of libraries such as snappy‑java already ship ARM builds, while others, such as crffpp‑java, must be compiled from source.
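The "cross-compile, then repackage" step for a library like crffpp-java amounts to placing the ARM `.so` inside the JAR at the path its loader expects. The following is a minimal sketch using Python's `zipfile`. The `native/<os>/<arch>/<lib>` layout, the architecture aliases, and the function names are hypothetical. Real JNI libraries each choose their own resource layout, so inspect the target JAR before patching it.

```python
import os
import platform
import shutil
import tempfile
import zipfile

# Map common machine names to one canonical directory name (assumed layout).
ARCH_ALIASES = {"amd64": "x86_64", "x86_64": "x86_64", "aarch64": "aarch64", "arm64": "aarch64"}


def normalized_arch(machine=None):
    """Canonical arch name for `machine` (defaults to the current host)."""
    machine = (machine or platform.machine()).lower()
    return ARCH_ALIASES.get(machine, machine)


def add_native_lib(jar_path, so_path, lib_name, arch="aarch64", os_name="Linux"):
    """Rewrite jar_path so it bundles so_path at native/<os>/<arch>/<lib_name>.

    An existing entry at that path is replaced; every other entry is copied
    unchanged. Returns the entry name that was written.
    """
    target = f"native/{os_name}/{normalized_arch(arch)}/{lib_name}"
    jar_dir = os.path.dirname(os.path.abspath(jar_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".jar", dir=jar_dir)
    os.close(fd)
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename != target:
                dst.writestr(info, src.read(info.filename))
        dst.write(so_path, arcname=target)
    shutil.move(tmp_path, jar_path)
    return target
```

Rewriting into a temporary file avoids duplicate entries, because `zipfile` cannot replace an entry in place. If the JAR is signed, the signature files under `META-INF` will no longer match after this change.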
Production Issues
During a canary (gray) release on ARM nodes running OpenJDK 1.8.0‑252, one SQL job took ten times longer than on x86 because the JVM spent most of its time in de‑optimization.
Investigation
jstack showed tasks stuck in window functions, and flame graphs showed that most CPU time went to de‑optimization. The root cause was identified as a known OpenJDK 8 bug (JDK‑8227523) that was fixed in JDK 11.
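Once stacks are sampled, a diagnosis like "most time spent in de-optimization" can be quantified from the folded (collapsed) stack format that `flamegraph.pl` consumes. Each line in that format lists frames joined by `;`, then a space and a sample count. The following is a minimal sketch, assuming that format and matching HotSpot's `Deoptimization::` frames. The sample stacks below are made up for illustration.

```python
def deopt_share(folded_lines, marker="Deoptimization::"):
    """Fraction of samples whose stack contains a frame matching `marker`."""
    total = 0
    deopt = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The count follows the last space; frames themselves may contain spaces.
        stack, _, count = line.rpartition(" ")
        n = int(count)
        total += n
        if marker in stack:
            deopt += n
    return deopt / total if total else 0.0


# Hypothetical folded stacks, not taken from the incident.
sample = [
    "java;Task.run;WindowFunction.process;Deoptimization::uncommon_trap 880",
    "java;Task.run;WindowFunction.process 100",
    "java;Task.run;GC 20",
]
print(f"{deopt_share(sample):.0%}")  # prints 88%
```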
Solution
Rather than upgrading the whole platform to JDK 11, HuoLala adopted a vendor‑provided, ARM‑optimized JDK 8 build (bisheng‑jdk1.8.0_362). Benchmarks showed execution times comparable to x86.
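Each JDK in the benchmark table below was timed over ten runs of the same job. The following is a small sketch that computes the mean, the spread, and the speedup over the stock OpenJDK build from those ten samples, using Python's `statistics` module. The variable names are illustrative.

```python
from statistics import mean, stdev

# Run times in seconds, copied from the benchmark table.
runs = {
    "java-1.8.0-openjdk-1.8.0.252": [1445.354, 1450.944, 1435.423, 1449.719, 1455.509,
                                     1451.354, 1453.670, 1451.442, 1457.354, 1455.354],
    "bisheng-jdk1.8.0_352": [293.282, 286.442, 291.414, 291.559, 286.617,
                             296.944, 295.816, 290.509, 300.975, 292.509],
    "bisheng-jdk1.8.0_362": [298.670, 295.357, 295.018, 294.666, 296.995,
                             291.423, 305.044, 305.886, 291.616, 303.755],
}

baseline = mean(runs["java-1.8.0-openjdk-1.8.0.252"])
for jdk, times in runs.items():
    avg = mean(times)
    print(f"{jdk:30s} mean={avg:8.3f}s  stdev={stdev(times):6.3f}s  "
          f"speedup={baseline / avg:4.2f}x")
```

With these samples, both BiSheng builds come out roughly 4.9 to 5.0 times faster than OpenJDK 1.8.0-252 on this job.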
| JDK version | Run 1 (s) | Run 2 (s) | Run 3 (s) | Run 4 (s) | Run 5 (s) | Run 6 (s) | Run 7 (s) | Run 8 (s) | Run 9 (s) | Run 10 (s) | Average (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| java-1.8.0-openjdk-1.8.0.252 | 1445.354 | 1450.944 | 1435.423 | 1449.719 | 1455.509 | 1451.354 | 1453.670 | 1451.442 | 1457.354 | 1455.354 | 1450.612 |
| bisheng-jdk1.8.0_352 | 293.282 | 286.442 | 291.414 | 291.559 | 286.617 | 296.944 | 295.816 | 290.509 | 300.975 | 292.509 | 292.606 |
| bisheng-jdk1.8.0_362 | 298.670 | 295.357 | 295.018 | 294.666 | 296.995 | 291.423 | 305.044 | 305.886 | 291.616 | 303.755 | 297.843 |

Results
After 50% of the offline YARN cluster's compute nodes were replaced with ARM nodes, the system ran stably for two months with no major incidents apart from the JVM bug, which was resolved by switching to the BiSheng JDK.
Future Plan
Phase 1 aims to replace 100% of offline YARN compute nodes with ARM servers. Phase 2 will extend ARM adaptation to HiveServer2, Presto, and Doris services.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
