
How HuoLala Cut Costs by Switching Big Data Workloads to ARM CPUs

This article details HuoLala's exploration of replacing x86 compute nodes with ARM servers in its big‑data platform, covering performance benchmarks, component adaptations for YARN, Tez/MR, security tools, a critical JDK de‑optimization issue, and the resulting production outcomes and future roadmap.

Huolala Tech

Background

To meet cost‑reduction goals, HuoLala evaluated replacing x86 compute nodes with ARM servers, which promise about 15% lower hardware cost while offering comparable performance and growing cloud support.

ARM Overview

ARM (Advanced RISC Machine) is a family of RISC architectures. Its instruction set differs from x86, so native binaries compiled for x86 must be recompiled before they can run on ARM.

ARM Machine Performance

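Sysbench CPU tests were run on both x86 and ARM machines under identical conditions. At every thread count and cpu-max-prime setting tested, the ARM machine completed roughly three times as many events as the x86 machine in the same 600 seconds, so ARM shows no CPU performance disadvantage in this benchmark.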

Machine  Threads  cpu-max-prime  Time  CPU Usage  Events (avg/stddev)   Execution Time in s (avg/stddev)
x86      32       10000          600s  100%       619477.5625/2692.78   599.6787/0.01
x86      32       40000          600s  100%       90961.1875/410.94     599.9231/0.01
x86      32       80000          600s  100%       36186.7812/111.11     599.9618/0.02
ARM      32       10000          600s  100%       1812940.4062/1608.03  599.4050/0.01
ARM      32       40000          600s  100%       290711.4375/246.13    599.8993/0.00
ARM      32       80000          600s  100%       114125.4062/98.32     599.9588/0.00
x86      16       10000          600s  50%        643620.7500/410.99    599.7101/0.00
x86      16       40000          600s  50%        96651.7500/73.83      599.9378/0.00
x86      16       80000          600s  50%        37213.6250/22.42      599.9005/0.00
ARM      16       10000          600s  50%        1816920.3750/624.19   599.4135/0.01
ARM      16       40000          600s  50%        291193.7500/181.15    599.9005/0.00
ARM      16       80000          600s  50%        114328.8750/45.24     599.9605/0.00
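
The "roughly three times" figure can be recomputed from the table. The following is a minimal Python sketch with the average event counts copied from the rows above; the dictionary names and layout are illustrative and not part of HuoLala's tooling.

```python
# Average event counts from the Sysbench table, keyed by (threads, cpu-max-prime).
X86_EVENTS = {
    (32, 10000): 619477.5625,
    (32, 40000): 90961.1875,
    (32, 80000): 36186.7812,
    (16, 10000): 643620.7500,
    (16, 40000): 96651.7500,
    (16, 80000): 37213.6250,
}
ARM_EVENTS = {
    (32, 10000): 1812940.4062,
    (32, 40000): 290711.4375,
    (32, 80000): 114125.4062,
    (16, 10000): 1816920.3750,
    (16, 40000): 291193.7500,
    (16, 80000): 114328.8750,
}

# ARM-to-x86 event ratio for each benchmark configuration.
RATIOS = {key: ARM_EVENTS[key] / X86_EVENTS[key] for key in X86_EVENTS}

for (threads, max_prime), ratio in sorted(RATIOS.items()):
    print(f"threads={threads:2d} cpu-max-prime={max_prime:5d} ARM/x86={ratio:.2f}")
```

Every configuration lands between about 2.8 and 3.2. Sysbench's CPU test only measures prime-number computation, so the ratio describes raw CPU throughput rather than end-to-end job performance.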

Practice

The migration began with offline workloads, requiring adaptations in YARN, Tez/MR engines, and security/operations components.

YARN Adaptation

YARN relies on native libraries packaged in JAR files. The following components were recompiled for ARM or replaced with ARM‑compatible versions:

Component         Dependency (JAR or native directory)      Purpose                         Solution
YARN NodeManager  leveldbjni-all-1.8.jar                    Stores task state in LevelDB    Recompile .so for ARM or use ARM-compatible JAR
YARN NodeManager  spark-2.3.2.3.1.4.0-315-yarn-shuffle.jar  Provides Spark shuffle service  Recompile .so for ARM or use ARM-compatible JAR
YARN NodeManager  /usr/hdp/3.1.4.0-315/hadoop/lib/native    Hadoop native libraries         Build with Snappy support via mvn package -DskipTests -Pdist -Dnative -Dsnappy.lib=/usr/local/lib64 -Dbundle.snappy

Tez/MR Engine Adaptation

Tez and MapReduce engines also depend on native .so libraries inside JARs. The main issues and solutions are:

Jar                    Problem                                                                  Solution
snappy-java-1.0.5.jar  .so load error: "Possible cause: can't load AMD 64-bit .so on AARCH64"   Recompile .so for ARM or obtain ARM-compatible JAR from community repo
lz4-java-1.4.0.jar     Same load error as snappy-java                                           Recompile .so for ARM or obtain ARM-compatible JAR from community repo
crffpp-java-1.0.2.jar  No ARM support                                                           Download source, cross-compile .so for ARM, then repackage
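
One way to find JARs that would hit this kind of load error is to inspect the ELF header of every bundled .so. The sketch below is a hedged illustration, not HuoLala's tooling: the function names are hypothetical, and it only recognizes the two machine types relevant here (ELF e_machine 62 for x86-64 and 183 for AArch64).

```python
import struct
import zipfile

# e_machine values defined by the ELF specification.
ELF_MACHINES = {62: "x86_64", 183: "aarch64"}


def elf_machine(header):
    """Return the architecture name for an ELF header, or None if it is not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) encodes endianness: 1 = little-endian, 2 = big-endian.
    fmt = "<H" if header[5] == 1 else ">H"
    # e_machine is the 16-bit field at offset 18.
    (machine,) = struct.unpack_from(fmt, header, 18)
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def native_archs(jar):
    """Map each .so entry in a JAR (path or file object) to its architecture."""
    result = {}
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if name.endswith(".so"):
                with zf.open(name) as fh:
                    result[name] = elf_machine(fh.read(20))
    return result
```

Calling `native_archs("snappy-java-1.0.5.jar")` on a local copy would list each bundled library with its architecture; a JAR that contains no `aarch64` entry needs recompiling or replacing before it can run on ARM nodes.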

Java JAR with .so Adaptation

When a Java program uses JNI, the native .so must match the platform. Libraries like snappy‑java already provide ARM builds; others, such as crffpp‑java, require source compilation.
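
After a .so has been compiled for ARM, it has to be put back into the JAR at the path the library's loader expects. The following Python sketch shows one way to do that repackaging step. The function name and arguments are hypothetical, and the entry path inside the JAR depends on the library, so it is left as a parameter.

```python
import zipfile


def repackage_jar(src, dst, entry_name, so_bytes):
    """Copy the JAR at src to dst, replacing (or adding) one native-library entry.

    src and dst may be file paths or file objects.
    Returns True if an existing entry was replaced, False if it was added.
    """
    replaced = False
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if item.filename == entry_name:
                # Swap in the ARM build, keeping the entry's original metadata.
                zout.writestr(item, so_bytes)
                replaced = True
            else:
                # Copy every other entry unchanged.
                zout.writestr(item, zin.read(item.filename))
        if not replaced:
            zout.writestr(entry_name, so_bytes)
    return replaced
```

Writing to a new file instead of modifying the JAR in place keeps the original available for rollback. Note that changing an entry in a signed JAR invalidates its signature.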

Figure: ARM .so library compatibility

Production Issues

During a gray‑release of OpenJDK 1.8.0‑252 on ARM, a SQL job took ten times longer than on x86 because of extensive de‑optimization in the JVM.

Investigation

jstack thread dumps showed the tasks stuck in window functions, and flame graphs showed that most of the time was spent in JVM de‑optimization. The root cause was identified as a known OpenJDK 8 bug (JDK‑8227523) that was fixed in JDK 11.
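
The article does not show how the thread dumps were analyzed. As an illustration only, the sketch below counts the topmost stack frame of every RUNNABLE thread in a jstack dump, which is one simple way to see where tasks are stuck. The function name is hypothetical, and the parsing assumes the usual jstack text layout (a quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames).

```python
import re
from collections import Counter

# Matches a stack frame line such as "\tat pkg.Class.method(File.java:10)".
FRAME = re.compile(r"^\s+at\s+([\w.$<>]+)\(")


def top_frames(dump_text):
    """Count the topmost frame of each RUNNABLE thread in a jstack dump."""
    counts = Counter()
    want_frame = False
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # A quoted name starts a new thread section.
            want_frame = False
        elif "java.lang.Thread.State:" in line:
            want_frame = "RUNNABLE" in line
        elif want_frame:
            match = FRAME.match(line)
            if match:
                counts[match.group(1)] += 1
                want_frame = False  # only the topmost frame is counted
    return counts
```

Running `top_frames` over several dumps taken a few seconds apart and calling `most_common()` on the combined counter highlights methods that stay on top of the stack, which can then be cross-checked against a flame graph.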

Solution

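Rather than upgrading to a newer JDK major version, HuoLala adopted a vendor‑provided ARM‑optimized JDK 8 build (bisheng‑jdk1.8.0_362). In the benchmark below, the same job ran in roughly 290 to 300 seconds on the bisheng builds, compared with about 1,450 seconds on OpenJDK 1.8.0‑252, which brought execution times in line with x86.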

JDK Version                   Sample 1   Sample 2   Sample 3   Sample 4   Sample 5   Sample 6   Sample 7   Sample 8   Sample 9   Sample 10  Average (s)
java-1.8.0-openjdk-1.8.0.252  1445.354   1450.944   1435.423   1449.719   1455.509   1451.354   1453.670   1451.442   1457.354   1455.354   1450.612
bisheng-jdk1.8.0_352          293.282    286.442    291.414    291.559    286.617    296.944    295.816    290.509    300.975    292.509    292.606
bisheng-jdk1.8.0_362          298.670    295.357    295.018    294.666    296.995    291.423    305.044    305.886    291.616    303.755    297.843
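
The size of the improvement can be checked by recomputing the per-JDK mean from the ten samples. This is a minimal Python sketch using the sample values from the table above; the variable names are illustrative.

```python
from statistics import mean

# Execution times in seconds, copied from the benchmark table.
SAMPLES = {
    "java-1.8.0-openjdk-1.8.0.252": [
        1445.354, 1450.944, 1435.423, 1449.719, 1455.509,
        1451.354, 1453.670, 1451.442, 1457.354, 1455.354,
    ],
    "bisheng-jdk1.8.0_352": [
        293.282, 286.442, 291.414, 291.559, 286.617,
        296.944, 295.816, 290.509, 300.975, 292.509,
    ],
    "bisheng-jdk1.8.0_362": [
        298.670, 295.357, 295.018, 294.666, 296.995,
        291.423, 305.044, 305.886, 291.616, 303.755,
    ],
}

# Mean execution time per JDK build.
MEANS = {jdk: mean(times) for jdk, times in SAMPLES.items()}

# How many times faster the adopted build is than the original OpenJDK build.
SPEEDUP = MEANS["java-1.8.0-openjdk-1.8.0.252"] / MEANS["bisheng-jdk1.8.0_362"]

for jdk, avg in MEANS.items():
    print(f"{jdk}: {avg:.3f} s")
print(f"speedup: {SPEEDUP:.2f}x")
```

On these samples the job runs roughly 4.9 times faster on bisheng-jdk1.8.0_362 than on OpenJDK 1.8.0-252.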

Results

After 50% of the offline YARN cluster was replaced with ARM nodes, the system ran stably for two months. The only major incident was the JVM de‑optimization bug, which was resolved by switching to the vendor‑provided JDK.

Figure: CPU usage comparison

Future Plan

Phase 1 aims to replace 100% of offline YARN compute nodes with ARM servers. Phase 2 will extend ARM adaptation to HiveServer2, Presto, and Doris services.
