Big Data 12 min read

How Huolala’s Big Data Team Cut Costs and Boosted Efficiency with an Elastic Architecture

Huolala’s three‑year‑old big data team shares how they tackled cost, operations, and analysis inefficiencies by building a layered, elastic infrastructure, adopting ARM servers, automating workflows, embracing cloud‑native practices, and implementing multi‑engine routing, achieving 20‑30% cost savings and higher performance.

Huolala Tech

Background and Challenges

Huolala is a multi‑service online logistics platform founded in 2013. It now operates in 11 global markets, with more than 10.5 million monthly active users and 900,000 monthly active drivers. The platform holds more than 20 PB of data on more than 1,000 machines, so it needs a robust big‑data infrastructure to run safely and stably.

Infrastructure Practice

High Cost Efficiency

The team built a four‑layer architecture: platform services, compute engines, cluster management and storage, and resource management. Cluster load has pronounced peaks and valleys, so capacity sized for the peak sat underused the rest of the time. To address this, the team replaced a portion of the reserved instances with elastic instances that scale up during peak hours and are released afterward, saving 20‑30% of cluster costs.
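To make the peak‑and‑valley argument concrete, here is a minimal sketch of the cost arithmetic. It is not Huolala's scheduler: the demand curve, the per‑node hourly prices, and the function names are all hypothetical, chosen only to show why a reserved baseline plus elastic nodes for the peak can cost less than reserving for the peak all day.

```python
from typing import Sequence


def reserved_only_cost(demand: Sequence[int], reserved_price: float) -> float:
    """Cost when enough reserved nodes are held every hour to cover the peak."""
    return max(demand) * reserved_price * len(demand)


def hybrid_cost(demand: Sequence[int], baseline: int,
                reserved_price: float, elastic_price: float) -> float:
    """Cost with a fixed reserved baseline plus elastic nodes above it."""
    reserved = baseline * reserved_price * len(demand)
    # Elastic nodes are paid for only in hours when demand exceeds the baseline.
    elastic = sum(max(0, d - baseline) for d in demand) * elastic_price
    return reserved + elastic


# Hypothetical 24-hour demand in nodes: a 6-hour batch peak, then a quiet period.
demand = [100] * 6 + [40] * 18

# Hypothetical prices: elastic capacity costs more per hour than reserved.
full = reserved_only_cost(demand, reserved_price=1.0)
mixed = hybrid_cost(demand, baseline=40, reserved_price=1.0, elastic_price=1.5)
savings = 1 - mixed / full

print(f"reserved only: {full}, hybrid: {mixed}, savings: {savings:.1%}")
```

The savings figure depends entirely on the made‑up inputs, so it should not be read against the 20‑30% reported above. The flatter the demand curve or the higher the elastic price premium, the smaller the gain.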

They also introduced ARM servers, which cost at least 15% less per core and consume less power than x86 servers. After adapting components (Tez, Spark, YARN) to ARM and running performance tests, the ARM nodes delivered performance comparable to x86 at lower hardware cost.

High Operations Efficiency

Automation rested on three steps: establishing asset management for clusters, converting SOPs and scripts into distributed workflow jobs, and automating high‑frequency scenarios first. This reduced manual effort and improved SRE productivity.
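As an illustration of turning an SOP into a workflow job, the sketch below models each SOP step as a function and runs the steps in dependency order. It is a single‑process toy rather than a distributed workflow engine, and the step names (`drain_node`, `stop_services`, `update_asset_record`) are hypothetical, not taken from Huolala's runbooks.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(steps: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run each step after its dependencies and return the execution order."""
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        # An exception here stops the run, as a failed SOP step would.
        steps[name]()
    return order


log: List[str] = []

# Hypothetical SOP for taking a worker node out of a cluster.
steps = {
    "drain_node": lambda: log.append("drained"),
    "stop_services": lambda: log.append("stopped"),
    "update_asset_record": lambda: log.append("recorded"),
}

# Each step maps to the set of steps that must finish before it.
depends_on = {
    "drain_node": set(),
    "stop_services": {"drain_node"},
    "update_asset_record": {"stop_services"},
}

order = run_workflow(steps, depends_on)
print(order)
```

Encoding the dependencies explicitly is what lets a real workflow system retry, parallelize, or audit individual steps instead of rerunning a whole script.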

The team also pursued a cloud‑native transformation: replacing YARN with Kubernetes for the offline resource pools, developing a Remote Shuffle framework (based on Tencent’s Uniffle) that decouples shuffle data from compute nodes, and migrating Flink from YARN to Kubernetes.

High Analysis Efficiency

A hybrid engine service routes each SQL query to the most suitable engine (Tez, Spark, Hive, or Presto) based on its execution plan, data scan size, and operator distribution, and automatically falls back to Hive if the other engines fail. Compatibility work brought support for Presto features to over 84%.
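The routing idea can be sketched as a ranking function followed by a try‑in‑order loop. This is a simplified illustration under stated assumptions, not the team's implementation: the `QueryProfile` fields, the thresholds, and the runner callables are hypothetical, and only three engines are modeled to keep it short.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class QueryProfile:
    scan_bytes: int          # estimated data scanned, taken from the plan
    join_count: int          # crude stand-in for "operator distribution"
    presto_compatible: bool  # whether the SQL uses only supported features


def rank_engines(p: QueryProfile) -> List[str]:
    """Return candidate engines in preference order (hypothetical thresholds)."""
    candidates: List[str] = []
    # Small, simple, compatible queries go to the interactive engine first.
    if p.presto_compatible and p.scan_bytes < 10 * 2**30 and p.join_count <= 3:
        candidates.append("presto")
    candidates.append("spark")
    candidates.append("hive")  # Hive is always last, as the fallback
    return candidates


def execute(sql: str, profile: QueryProfile,
            runners: Dict[str, Callable[[str], Any]]) -> Tuple[str, Any]:
    """Try each candidate in order and return (engine, result) on first success."""
    for engine in rank_engines(profile):
        try:
            return engine, runners[engine](sql)
        except Exception:
            continue  # fall through to the next engine
    raise RuntimeError("all engines failed")


def unavailable(sql: str) -> Any:
    raise RuntimeError("engine unavailable")


# Hypothetical runners; real ones would submit the SQL to each engine.
runners = {
    "presto": unavailable,
    "spark": lambda sql: "spark-result",
    "hive": lambda sql: "hive-result",
}

small = QueryProfile(scan_bytes=2**30, join_count=1, presto_compatible=True)
engine, result = execute("SELECT 1", small, runners)
print(engine, result)
```

In this run the small query is routed to Presto first, Presto fails, and the query lands on Spark. A production router would also distinguish retryable engine failures from genuine SQL errors, which this sketch does not.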

Data quality is safeguarded by comparing query results across engines before each gray release, with a 100% pass rate as the bar.
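A cross‑engine comparison of this kind can be sketched as a multiset comparison of rows, so that differences in row order between engines do not count as mismatches. The function names and the float‑rounding tolerance are assumptions made for illustration; the article does not describe how the team's comparison actually works.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def results_match(a: Iterable[Sequence], b: Iterable[Sequence],
                  float_digits: int = 6) -> bool:
    """Compare two result sets as multisets of rows, ignoring row order."""
    def normalize(row: Sequence) -> Tuple:
        # Round floats so tiny engine-specific differences are tolerated.
        return tuple(round(v, float_digits) if isinstance(v, float) else v
                     for v in row)
    return Counter(map(normalize, a)) == Counter(map(normalize, b))


def pass_rate(cases: Iterable[Tuple[Iterable[Sequence], Iterable[Sequence]]]) -> float:
    """Fraction of (baseline_rows, candidate_rows) pairs whose results match."""
    cases = list(cases)
    if not cases:
        raise ValueError("no comparison cases")
    passed = sum(results_match(base, cand) for base, cand in cases)
    return passed / len(cases)


# Hypothetical results for two queries run on a baseline and a candidate engine.
cases = [
    ([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]),  # same rows, different order
    ([(1, 0.1 + 0.2)], [(1, 0.3)]),                # float noise only
]
rate = pass_rate(cases)
print(rate)
```

A release gate would then require `rate == 1.0` before widening the rollout. Using a `Counter` rather than a `set` matters here, because a dropped or duplicated row must register as a mismatch.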

Summary and Reflections

Three goals have driven the evolution of Huolala’s big‑data infrastructure: high cost efficiency, high operations efficiency, and high analysis efficiency. Planning around measurable benefit keeps the team from adopting hyped technologies blindly; for example, lake‑house solutions have not been introduced yet because their ROI is unclear.

Future Outlook

Engine efficiency: migrate offline engine from Hive to Spark.

Extreme elasticity: more precise, timely scaling based on load pressure.

Intelligent operations: implement AIOps for self‑healing and full‑link diagnostics.

Offline mixed deployment: continue exploring comprehensive cloud‑native strategies.

Streaming‑batch integration & lake‑house: investigate business‑driven scenarios.

cloud native, cost optimization, elastic scaling