How Huolala Scaled to 40PB: Inside Their Evolving Big Data Storage Architecture

Huolala, founded in 2013, runs a cross-cloud hybrid big-data storage platform holding more than 40 PB on roughly 3,000 machines. This summary covers the four phases of its online storage, its high-availability design, performance and cost optimizations, AI vector storage, and a cost-governance system that cut storage costs by about 54%.

DataFunSummit

Background

Huolala is an internet logistics marketplace founded in 2013, serving over 1.2 million active drivers and 14 million active users each month. That business volume rests on a big-data storage foundation that is currently a cross-cloud hybrid architecture spanning Alibaba Cloud, Tencent Cloud, overseas clouds, and self-built data centers, with roughly 3,000 machines, more than 40 PB of storage, and more than 20,000 daily tasks.

Big‑Data Storage Architecture Evolution

The storage system follows a Lambda‑style architecture, separating real‑time and offline data flows. Online data is stored in HBase 2.0 across multi‑cloud clusters, while offline data resides in HDFS and object storage, and AI vectors are stored separately.

Online Storage Phases

Seed stage: from zero to the first cluster; teams built HBase independently.

Growth stage: compute‑storage separation, stability projects, production‑grade storage.

Mature stage: 99.99% SLA, performance aligned with business.

Refinement stage: high‑performance distributed KV and graph databases.

Current online storage uses HBase 2.0 with about 20 clusters, 200+ nodes, 2 PB+ data, 700+ tables, the largest table approaching 1 PB, serving over 100 business scenarios.

High Availability (HA)

HA is achieved with dual clusters and bidirectional replication (3 replicas for primary, 2 for backup), enabling seamless failover and cross‑region disaster recovery. A custom HASDK built on Apollo allows dynamic read/write switching and table‑level failover.
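The table-level switching described above can be sketched as a small router that maps each table to a cluster and is updated by a configuration callback. This is only an illustration under stated assumptions: the class name `HARouter`, the config keys `default_active` and `table_overrides`, and the table names are hypothetical, and the sketch does not use the real Apollo client or Huolala's HASDK API.

```python
from dataclasses import dataclass, field


@dataclass
class HARouter:
    """Chooses which cluster serves each table, driven by a dynamic config."""

    default_active: str = "primary"  # cluster used when no table override exists
    table_overrides: dict = field(default_factory=dict)  # table name -> cluster

    def on_config_change(self, new_config: dict) -> None:
        """Callback that a config center would invoke when routing changes."""
        self.default_active = new_config.get("default_active", self.default_active)
        self.table_overrides = dict(new_config.get("table_overrides", {}))

    def cluster_for(self, table: str) -> str:
        """A table-level override wins; otherwise use the default cluster."""
        return self.table_overrides.get(table, self.default_active)


router = HARouter()
# Fail one (hypothetical) table over to the backup cluster, leaving others alone.
router.on_config_change({"table_overrides": {"orders": "backup"}})
print(router.cluster_for("orders"))   # backup
print(router.cluster_for("drivers"))  # primary
```

Keeping the override map separate from the default lets an operator move a single misbehaving table without triggering a full cluster switch.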

Performance and Efficiency Challenges

Capacity-oriented clusters built on HDD perform poorly, while performance-oriented NVMe clusters run at low utilization. Long-tail latency stems from HBase's CP-leaning design: each region is served by a single RegionServer at a time, so a slow or paused server delays every request to its regions. To address these problems, Huolala introduced heterogeneous disks (NVMe for hot writes, HDD for cold replicas) and relied on Hadoop's WAL for three-copy writes. The reported result is up to 10× better write throughput and query speed, with costs reduced by 30-40%.
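A minimal sketch of the heterogeneous-disk idea follows. It assumes a layout of one NVMe replica plus two HDD replicas, which is an illustrative guess rather than a detail confirmed by the source; the function names and node names are hypothetical. HDFS storage policies such as `ONE_SSD` express a similar layout, though the source does not say which mechanism Huolala uses.

```python
def place_replicas(nvme_nodes, hdd_nodes, replicas=3):
    """Return target nodes: the first replica on NVMe, the rest on HDD.

    The 1 NVMe + (replicas - 1) HDD split is an assumption for illustration.
    """
    if not nvme_nodes or len(hdd_nodes) < replicas - 1:
        raise ValueError("not enough nodes for the requested layout")
    return [nvme_nodes[0]] + list(hdd_nodes[: replicas - 1])


def preferred_read_node(placement, nvme_nodes):
    """Serve reads from the NVMe replica when the placement contains one."""
    fast = set(nvme_nodes)
    for node in placement:
        if node in fast:
            return node
    return placement[0]  # no fast replica: fall back to the first one


nvme = ["nvme-1"]
hdd = ["hdd-1", "hdd-2", "hdd-3"]
placement = place_replicas(nvme, hdd)
print(placement)                             # ['nvme-1', 'hdd-1', 'hdd-2']
print(preferred_read_node(placement, nvme))  # nvme-1
```

The point of such a layout is that only one copy pays for fast media while durability still comes from three copies.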

Long‑Tail Optimizations

HBase tuning (GC parameters, custom compaction) was not enough to remove the long tail, so the team evaluated OceanBase and Lindorm. Lindorm (Alibaba's HBase-compatible service) offers 2-7× performance gains and better compression, while OceanBase provides multi-cloud support and stronger disaster-recovery capabilities.

Cost Governance

A cloud data-management platform implements a three-layer cost-governance model. The storage layer provides hot/cold tiering, the capability layer collects data-hotness metrics via Kafka and Elasticsearch (ES), and the management layer handles data lifecycle, archiving, and optimization. Together these measures saved roughly 54% of storage costs.
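The hot/cold tiering decision can be sketched as a rule over last-access times. This is a simplified illustration: the 30-day and 365-day thresholds, the tier names, and the paths are assumptions, not Huolala's actual policy, and real hotness metrics would arrive from the Kafka and ES pipeline rather than an in-memory dict.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=30)       # assumed: accessed within 30 days is hot
ARCHIVE_WINDOW = timedelta(days=365)  # assumed: idle over a year is archived


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Map the idle time of a dataset to a storage tier."""
    idle = now - last_access
    if idle <= HOT_WINDOW:
        return "hot"
    if idle <= ARCHIVE_WINDOW:
        return "cold"
    return "archive"


def plan_tiers(last_access_by_path: dict, now: datetime) -> dict:
    """Build a path -> tier plan that a lifecycle job could then apply."""
    return {path: choose_tier(ts, now) for path, ts in last_access_by_path.items()}


now = datetime(2024, 6, 1)
plan = plan_tiers(
    {
        "/warehouse/orders_daily": datetime(2024, 5, 25),  # hypothetical paths
        "/warehouse/orders_2023": datetime(2023, 12, 1),
        "/warehouse/logs_2021": datetime(2022, 1, 1),
    },
    now,
)
print(plan["/warehouse/orders_daily"])  # hot
print(plan["/warehouse/orders_2023"])   # cold
print(plan["/warehouse/logs_2021"])     # archive
```

Separating the planning step from the action that moves data makes the plan easy to review before anything is archived.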

AI Vector Storage

With AI models proliferating, Huolala built a vector storage pipeline. Documents and chat logs are tokenized, embedded, and stored in a vector database. At query time, Elasticsearch first performs keyword matching, vector similarity then re-ranks the matches, and finally a large language model generates the answer. This pipeline powers intelligent customer service, IT ops bots, and knowledge-base retrieval across dozens of business units.
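The two retrieval stages (keyword recall, then vector re-ranking) can be sketched in plain Python. This is a toy stand-in under stated assumptions: the keyword stage imitates what Elasticsearch would do with a simple term match, the vectors are hand-written rather than produced by an embedding model, and the final LLM generation step is omitted. All function names and sample documents are hypothetical.

```python
import math


def keyword_recall(query_terms, docs, limit):
    """Stage 1: keep documents that contain query terms, most matches first."""
    scored = []
    for doc in docs:
        words = set(doc["text"].lower().split())
        hits = sum(1 for term in query_terms if term.lower() in words)
        if hits:
            scored.append((hits, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, candidates, top_k):
    """Stage 2: order the recalled candidates by similarity to the query."""
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:top_k]


def retrieve(query_terms, query_vec, docs, recall_limit=10, top_k=3):
    """Keyword recall followed by vector re-ranking."""
    return rerank(query_vec, keyword_recall(query_terms, docs, recall_limit), top_k)


docs = [
    {"id": "a", "text": "refund policy for cancelled orders", "vector": [1.0, 0.0]},
    {"id": "b", "text": "refund timeline and fees", "vector": [0.6, 0.8]},
    {"id": "c", "text": "driver onboarding guide", "vector": [0.0, 1.0]},
]
results = retrieve(["refund"], [0.0, 1.0], docs, top_k=2)
print([d["id"] for d in results])  # ['b', 'a']
```

Document `c` is the closest vector to the query but never reaches the re-ranker because it fails the keyword stage, which shows how the first stage bounds what the second can return.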

Q&A Highlights

Key questions covered AI‑training data isolation (minimal extra cost), bandwidth reduction after data split (≈10% drop), IDC failure handling (cloud‑first strategy with quick recovery), and detailed cost‑saving methods (dedicated team, hot/cold tiering, lifecycle management).

Overall, Huolala’s storage strategy focuses on stability, cost efficiency, and high performance, evolving to support AI workloads while continuously optimizing resources.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
