From Druid to Apache Doris: Huolala’s OLAP Evolution and Performance Insights

Huolala’s data engineer Yang Qiuji shares how the company’s OLAP platform progressed from Druid (OLAP 1.0) to ClickHouse (OLAP 2.0) and finally to Apache Doris (OLAP 3.0), covering the business drivers, technical evaluations, POC results, stability measures, and the future roadmap.

Huolala Tech

Business Background

Huolala, founded in 2013, serves over 3.5 million drivers and 7.6 million users across 352 Chinese cities. Its big‑data system runs on three IDC clusters with more than 1,000 machines, 20 PB of storage and over 20,000 daily tasks.

Big‑Data Architecture

The platform is organized into five layers: foundation, access, platform, service and application. The foundation and access layers provide storage, compute and cluster management; the platform layer hosts data‑research and governance platforms; the service and application layers expose business‑oriented data.

Figure 1.1 Huolala big‑data system architecture

Data Processing Flow

Data flows through four stages: collection (real‑time and offline), storage, computation (including ETL and Flink processing), and data services (OLAP queries). Real‑time data is ingested via Flink and stored in HBase/OLAP tables; offline data is batch‑loaded from business databases.

Figure 1.2 Data processing pipeline

OLAP 1.0 – Druid (2021‑H1)

Druid was used to support the “Compass” product with fast single‑table aggregation. Its limitations were:

Storage bottleneck (dimension explosion)

High development effort for new dimensions

Missing support for certain aggregations

The POC focused on functional, performance and accuracy verification. It uncovered three issues: out‑of‑order real‑time data, unstable results from the StringLast function, and no efficient way to count distinct values.
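To illustrate why out‑of‑order data matters for a "last value" aggregation, here is a minimal Python sketch. It is not Druid's implementation and not Huolala's fix (the source does not describe one); the event layout `(key, event_time_ms, value)` and the function name are assumptions made for this example. The idea is to pick the latest value by event timestamp rather than by arrival order, so the result stays the same however the stream is shuffled.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event layout for this sketch: (key, event_time_ms, value).
Event = Tuple[str, int, str]


def latest_value_by_key(events: Iterable[Event]) -> Dict[str, str]:
    """Return, for each key, the value carrying the greatest event time.

    The comparison uses the event timestamp, so the result does not depend
    on the order in which events arrive.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for key, ts, value in events:
        current = best.get(key)
        # Ties on timestamp are broken by value so the result is deterministic.
        if current is None or (ts, value) > current:
            best[key] = (ts, value)
    return {key: value for key, (_, value) in best.items()}
```

An aggregation that simply keeps whichever row arrived last would return different answers for the same events in a different order, which is one plausible way a "last" function can look unstable on unordered input.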

OLAP 2.0 – ClickHouse (2021‑H2)

ClickHouse was adopted to support the “Smart‑Location” tool, which needed better compression and detail‑level queries. It was chosen over Druid because it handled complex data types and did not require per‑table detail storage.

OLAP 3.0 – Apache Doris (2022‑present)

Business needs grew to include multi‑source joins for A/B experiments and real‑time data warehousing. Doris was selected after evaluating Druid, ClickHouse, Kylin, Presto and others, for its strong SQL support, shuffle‑based joins, and a frontend implemented in Java.

Supports large‑table joins via shuffle

Rich SQL features and compatibility with existing MySQL‑style queries

Better integration with Huolala’s Java stack
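The shuffle‑based join in the list above can be sketched in a few lines of Python. This is a single‑process illustration of the general technique, not Doris code: both tables are partitioned by a hash of the join key, so matching rows land in the same partition and each partition can be joined independently (on separate nodes, in a real engine). The row layout `(join_key, payload)` and the function names are assumptions for this example.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

# Hypothetical row layout for this sketch: (join_key, payload).
Row = Tuple[int, str]


def shuffle(rows: Sequence[Row], num_partitions: int) -> List[List[Row]]:
    """Route every row to a partition chosen by hashing its join key."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[0]) % num_partitions].append(row)
    return partitions


def shuffle_join(left: Sequence[Row], right: Sequence[Row],
                 num_partitions: int = 4) -> List[Tuple[int, str, str]]:
    """Inner-join two tables by shuffling both sides on the join key."""
    result: List[Tuple[int, str, str]] = []
    left_parts = shuffle(left, num_partitions)
    right_parts = shuffle(right, num_partitions)
    for left_part, right_part in zip(left_parts, right_parts):
        # Build a hash table on the right side of this partition only.
        table = defaultdict(list)
        for key, payload in right_part:
            table[key].append(payload)
        # Probe with the left side of the same partition.
        for key, payload in left_part:
            for right_payload in table.get(key, []):
                result.append((key, payload, right_payload))
    return result
```

Because neither table has to fit on a single node, this pattern suits large‑table‑to‑large‑table joins, at the cost of moving both tables across the network.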

The POC used both real business data and the TPC‑DS benchmark; a join over one day of data (500 million rows) completed in ~9 seconds at TP75. Data quality was validated against Hive.
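For readers unfamiliar with the TP75 figure: it is the latency that 75% of queries stay at or below. The Python sketch below computes it with the nearest‑rank method, which is one common definition; the source does not say which definition the POC used, and the function name and the sample latencies in the usage note are invented for illustration.

```python
import math
from typing import Sequence


def top_percentile(latencies_s: Sequence[float], percent: float) -> float:
    """Return the nearest-rank percentile (e.g. percent=75 for TP75)."""
    if not latencies_s:
        raise ValueError("latencies_s must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(latencies_s)
    # Nearest rank: the smallest value with at least `percent` percent of
    # the samples at or below it.
    rank = math.ceil(percent / 100 * len(ordered))
    return ordered[rank - 1]
```

With the made‑up samples `[4.0, 12.0, 7.0, 9.0]`, `top_percentile(samples, 75)` picks the third‑smallest value, so a single slow query does not move the reported figure.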

Stability Guarantees

Stability is handled in three phases: pre‑deployment capacity testing, runtime monitoring (including Compaction metrics), and post‑deployment health checks. High query latency was resolved by reordering joins (large‑table JOIN small‑table) and enabling runtime Bloom filters. Unhealthy tablets were fixed by applying Doris 1.1.0 patches and tuning compaction parameters.
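The two latency fixes mentioned above work together, and a small Python sketch can show why. This is an illustration of the general technique, not Doris's implementation: the small table is used as the build side, a Bloom filter is built from its join keys at run time, and rows of the large table are checked against the filter before they reach the hash‑table probe. The class, the function name, and the filter sizes are assumptions for this example.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical layout: (join_key, payload)


class BloomFilter:
    """A tiny Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: int) -> List[int]:
        positions = []
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: int) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def join_with_runtime_filter(
    large: Sequence[Row], small: Sequence[Row]
) -> Tuple[List[Tuple[int, str, str]], int]:
    """Join large JOIN small; return the rows and how many probes ran."""
    build: Dict[int, List[str]] = {}
    bloom = BloomFilter()
    for key, payload in small:  # build side: the small table
        build.setdefault(key, []).append(payload)
        bloom.add(key)
    result: List[Tuple[int, str, str]] = []
    probed = 0
    for key, payload in large:  # probe side: the large table
        if not bloom.might_contain(key):
            continue  # dropped early, never reaches the hash table
        probed += 1
        for small_payload in build.get(key, []):
            result.append((key, payload, small_payload))
    return result, probed
```

In a distributed engine the benefit is larger than this single‑process sketch suggests, because rows rejected by the filter can be discarded at scan time before they are sent over the network.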

Future Plans

The plan is to move toward an OLAP platform that offers self‑service modeling and multi‑engine routing, gradually migrating Druid workloads to Doris while keeping ClickHouse for niche cases. Planned engine evolution focuses on performance, stability, and core‑kernel enhancements.

Q&A Highlights

Migration cost is comparable to previous rollouts.

SQL cache and partition cache provide high hit rates for offline queries.

Doris’s near‑real‑time ingestion latency is ~10 seconds per micro‑batch.

The current Doris cluster runs on about a dozen nodes.
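The high hit rate of a partition cache on offline queries can be illustrated with a short Python sketch. It shows the general idea only and makes no claim about how Doris implements its cache: results are cached per partition, so a repeated query over mostly unchanged partitions only scans the partitions it has not seen. The class name, the date‑string partition keys, and the `compute_partition` callback are assumptions for this example.

```python
from typing import Callable, Dict, List


class PartitionCache:
    """Cache one aggregate per partition and reuse it across queries."""

    def __init__(self, compute_partition: Callable[[str], int]) -> None:
        self._compute = compute_partition  # hypothetical per-partition scan
        self._cache: Dict[str, int] = {}
        self.hits = 0
        self.misses = 0

    def query(self, partitions: List[str]) -> int:
        """Sum the aggregate over the requested partitions."""
        total = 0
        for partition in partitions:
            if partition in self._cache:
                self.hits += 1
            else:
                self.misses += 1
                self._cache[partition] = self._compute(partition)
            total += self._cache[partition]
        return total
```

Offline data is typically loaded once per day and then left unchanged, so a dashboard query over "the last N days" finds almost every partition already cached. A real cache would also need to invalidate a partition when its data is reloaded, which this sketch leaves out.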
