How Huolala’s Big Data Team Cut Costs and Boosted Efficiency with an Elastic Architecture

Huolala’s three‑year‑old big data team shares how they tackled cost, operations, and analysis inefficiencies by building a layered, elastic infrastructure, adopting ARM servers, automating workflows, embracing cloud‑native practices, and implementing multi‑engine routing, achieving 20‑30% cost savings and higher performance.

Huolala Tech

Aug 1, 2024

Background and Challenges

Huolala is a multi‑service online logistics platform founded in 2013. It now operates in 11 global markets, with over 10.5 million monthly active users and 900,000 monthly active drivers. The platform has accumulated more than 20 PB of data, stored on over 1,000 machines, so it needs a robust big‑data infrastructure to run safely and stably.

Infrastructure Practice

High Cost Efficiency

The team built a four‑layer architecture: platform services, compute engine, cluster management and storage, and resource management. Resource utilization was low because clusters sized for peak load sat partly idle during off‑peak hours. To reduce this waste, the team replaced a portion of reserved instances with elastic instances, scaling out during peak hours and reclaiming the resources afterward. This saved 20‑30% of cluster costs.
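The trade-off between reserved and elastic capacity can be sketched with a small cost model. This is an illustrative calculation only: the demand profile, the prices, and the function names are assumptions chosen for the example, not figures or code from Huolala.

```python
def all_reserved_cost(demand, reserved_price):
    """Cost when the cluster is sized for peak demand using reserved nodes only."""
    return max(demand) * reserved_price * len(demand)


def mixed_cost(demand, baseline, reserved_price, elastic_price):
    """Cost when `baseline` nodes are reserved and any excess demand is elastic."""
    reserved = baseline * reserved_price * len(demand)
    elastic_node_hours = sum(max(0, d - baseline) for d in demand)
    return reserved + elastic_node_hours * elastic_price


# Hypothetical 24-hour profile: nodes needed in each hour (8 peak hours, 16 off-peak).
demand = [100] * 8 + [50] * 16
reserved_price = 1.0  # assumed cost per node-hour for reserved capacity
elastic_price = 1.5   # assumed premium per node-hour for elastic capacity

peak_only = all_reserved_cost(demand, reserved_price)
mixed = mixed_cost(demand, baseline=50,
                   reserved_price=reserved_price,
                   elastic_price=elastic_price)
saving = 1 - mixed / peak_only

print(f"all reserved: {peak_only:.0f}, mixed: {mixed:.0f}, saving: {saving:.0%}")
```

Under these assumed numbers the mixed setup is cheaper even though each elastic node-hour costs more, because the elastic nodes are only paid for during the peak. The actual saving depends on how peaked the workload is and on the real price gap.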

They also introduced ARM servers, which cost at least 15% less per core and consume less power than x86. After component adaptation (Tez, Spark, YARN) and performance testing, ARM nodes achieved comparable performance while reducing hardware costs.

High Operations Efficiency

Automation was achieved by establishing asset management for clusters, converting SOPs and scripts into distributed workflow jobs, and prioritizing high‑frequency scenarios for automation. This reduced manual effort and improved SRE productivity.

The team also pursued a cloud‑native transformation: replacing YARN with Kubernetes for offline resource pools, developing a Remote Shuffle framework (based on Uniffle, originally open‑sourced by Tencent) to decouple shuffle data from compute nodes, and migrating Flink from YARN to Kubernetes.

High Analysis Efficiency

A hybrid engine service was built to route each SQL query to the most suitable engine (Tez, Spark, Hive, Presto) based on its execution plan, data scan size, and operator distribution. If all engines fail, the query automatically falls back to Hive. Compatibility work achieved over 84% support for Presto features.
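A minimal sketch of this kind of routing with fallback is shown below. The article does not describe the actual routing rules, so the `QueryProfile` fields, the thresholds, and the engine callables are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class QueryProfile:
    scan_bytes: int  # estimated bytes scanned, e.g. derived from the execution plan
    join_count: int  # number of join operators in the plan


def choose_engines(profile: QueryProfile) -> List[str]:
    """Return candidate engines in preference order (thresholds are made up)."""
    small_scan = profile.scan_bytes < 10 * 1024**3  # under 10 GiB
    if small_scan and profile.join_count <= 2:
        return ["presto", "spark"]
    return ["spark", "tez"]


def run_with_fallback(sql: str, profile: QueryProfile,
                      engines: Dict[str, Callable[[str], object]]) -> Tuple[str, object]:
    """Try each candidate in order, then Hive as the last resort."""
    for name in choose_engines(profile) + ["hive"]:
        try:
            return name, engines[name](sql)
        except Exception:
            continue  # this engine failed; try the next candidate
    raise RuntimeError("every engine failed, including the Hive fallback")


def failing_engine(sql: str) -> object:
    """Stub that simulates an engine that cannot run the query."""
    raise RuntimeError("engine unavailable")


# Stub engines for illustration; real ones would submit the SQL to a cluster.
engines = {
    "presto": failing_engine,
    "spark": lambda sql: "spark-result",
    "tez": failing_engine,
    "hive": lambda sql: "hive-result",
}

small = QueryProfile(scan_bytes=2 * 1024**3, join_count=1)
large = QueryProfile(scan_bytes=500 * 1024**3, join_count=5)

used_small, _ = run_with_fallback("SELECT 1", small, engines)
all_down = {**engines, "spark": failing_engine}
used_fallback, _ = run_with_fallback("SELECT 1", large, all_down)
print(used_small, used_fallback)
```

Keeping engine selection separate from execution means the routing rules can change without touching the fallback logic.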

To protect data quality, query results are compared across engines before gray release, with a 100% pass rate as the bar.
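One way to picture the cross‑engine comparison is the sketch below, which treats result sets as unordered multisets of rows and rounds floats before comparing. The comparison rules, function names, and sample rows are assumptions for illustration; the article does not say how the team's checker works.

```python
from collections import Counter


def normalize(row, float_digits=6):
    """Round floats so tiny engine-specific precision differences are ignored."""
    return tuple(round(v, float_digits) if isinstance(v, float) else v for v in row)


def results_match(rows_a, rows_b, float_digits=6):
    """Compare two result sets as multisets, so row order does not matter."""
    count_a = Counter(normalize(r, float_digits) for r in rows_a)
    count_b = Counter(normalize(r, float_digits) for r in rows_b)
    return count_a == count_b


def pass_rate(cases):
    """Fraction of (baseline_rows, candidate_rows) pairs that match."""
    if not cases:
        return 0.0
    return sum(results_match(a, b) for a, b in cases) / len(cases)


# Hypothetical results of the same query on two engines.
hive_rows = [("a", 1, 0.1 + 0.2), ("b", 2, None)]
presto_rows = [("b", 2, None), ("a", 1, 0.3)]  # same rows, different order
wrong_rows = [("a", 2, 0.3), ("b", 2, None)]   # one value differs

same = results_match(hive_rows, presto_rows)
different = results_match(hive_rows, wrong_rows)
rate = pass_rate([(hive_rows, presto_rows), (hive_rows, wrong_rows)])
ready_for_gray_release = rate == 1.0  # release only when every case passes
print(same, different, rate, ready_for_gray_release)
```

A multiset comparison also catches duplicated or missing rows, which a plain set comparison would hide.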

Summary and Reflections

The three goals of high cost efficiency, high operations efficiency, and high analysis efficiency have driven the evolution of Huolala’s big‑data infrastructure. Careful benefit‑driven planning prevents blind adoption of hype technologies; for example, lake‑house solutions have not been introduced yet due to unclear ROI.

Future Outlook

Engine efficiency: migrate offline engine from Hive to Spark.

Extreme elasticity: more precise, timely scaling based on load pressure.

Intelligent operations: implement AIOps for self‑healing and full‑link diagnostics.

Offline mixed deployment: continue exploring comprehensive cloud‑native strategies.

Streaming‑batch integration & lake‑house: investigate business‑driven scenarios.
