Evolution and Construction of Huolala's OLAP System Based on Doris

This presentation details Huolala's journey from its initial OLAP architecture to a multi‑engine platform, describing background, data‑flow layers, technical research, engine selection (Druid, ClickHouse, Doris), POC validation, performance tuning, stability measures, production rollout, problem analysis, and future roadmap.

DataFunTalk
Background Introduction

Huolala, founded in 2013, operates a large‑scale logistics platform covering 352 Chinese cities, with over 58 000 active drivers and 760 000 active users. Its data platform spans three IDC sites and thousands of machines, stores more than 20 PB of data, and runs more than 20 000 tasks per day.

Big Data Architecture Overview

The system is organized into five layers: a foundation and ingestion layer that handles storage, computation, and cluster management; a platform layer (the data‑development and governance platforms); a data‑warehouse layer; a service layer; and an application layer. The service and application layers together provide business‑oriented analytics.

Data Flow

Data is collected via real‑time (event‑level) and batch pipelines, stored, computed, and served through both real‑time and offline services. Real‑time ingestion uses Flink to write to HBase/OLAP stores, while batch ingestion pulls from business databases on hourly/daily schedules.

OLAP Evolution – Phase 1.0 (Incubation)

Initially, analytics ran on MySQL over data pre‑aggregated by Flink. This led to storage bottlenecks, high development cost, and limited dimensional analysis. To address these issues, the team evaluated Druid, ClickHouse, Kylin, and Presto, and chose Druid for its Java‑centric stack and scalability.

Technical Research & POC

Function verification: rewrite business SQL to Druid’s Rollup semantics, handling UDFs and COUNT DISTINCT.
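
The rewrite is needed because rollup collapses raw rows at ingestion time, so a query‑time COUNT(*) has to become a sum over a stored count metric. The following is a minimal sketch of that idea in plain Python, not Druid's implementation; the column names (`ts`, `city`, `vehicle_type`, `amount`) and the hourly granularity are assumptions made for illustration.

```python
from collections import defaultdict


def rollup(rows, granularity_seconds=3600):
    """Pre-aggregate raw rows by (time bucket, city, vehicle_type).

    Column names are hypothetical; `ts` is an epoch timestamp in seconds.
    """
    buckets = defaultdict(lambda: {"order_count": 0, "amount_sum": 0.0})
    for row in rows:
        ts = int(row["ts"])
        bucket_start = ts - ts % granularity_seconds  # truncate to the bucket
        key = (bucket_start, row["city"], row["vehicle_type"])
        buckets[key]["order_count"] += 1
        buckets[key]["amount_sum"] += row["amount"]
    return dict(buckets)


def total_orders(rolled):
    """Query-time equivalent of COUNT(*): sum the stored count metric."""
    return sum(metrics["order_count"] for metrics in rolled.values())
```

After rollup, counting stored rows would undercount orders, which is why the business SQL must sum `order_count` instead. Exact COUNT DISTINCT cannot be recovered from rolled‑up rows at all unless a mergeable structure is stored alongside the metrics.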

Performance verification: measure P75/P90/P99 query latency on benchmark queries with the cache disabled, and use flame graphs to locate bottlenecks.
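
As an illustration of how such percentile figures can be derived from raw timings, here is a small sketch using the nearest‑rank method. The choice of nearest‑rank and the function names are assumptions; the talk does not say how the percentiles were computed.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # ceil(p * n / 100) using integer arithmetic to avoid float rounding
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Return the P75/P90/P99 latencies for one benchmark run."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (75, 90, 99)}
```

Reporting tail percentiles rather than the mean keeps a few slow queries from being hidden by many fast ones.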

Data accuracy verification: compare results between Hive and Druid on both business and TPC‑DS datasets.
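
A comparison of this kind can be sketched as a keyed diff between two result sets. The sketch below assumes both results have already been fetched into dictionaries keyed by the grouping dimensions; the function name and tolerance are hypothetical and not taken from the talk.

```python
import math


def diff_results(hive_rows, druid_rows, rel_tol=1e-9):
    """Compare two {dimension_key: metric_value} mappings.

    Returns a sorted list of (key, hive_value, druid_value) for every key
    that is missing on one side or whose values differ beyond rel_tol.
    """
    mismatches = []
    for key in sorted(set(hive_rows) | set(druid_rows)):
        hive_value = hive_rows.get(key)
        druid_value = druid_rows.get(key)
        if (
            hive_value is None
            or druid_value is None
            or not math.isclose(hive_value, druid_value, rel_tol=rel_tol)
        ):
            mismatches.append((key, hive_value, druid_value))
    return mismatches
```

An empty list means the two engines agree on every group. A relative tolerance is used because floating‑point sums may differ slightly between engines even when the data is identical.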

Stability Assurance

Stability is ensured through pre‑deployment checks (capacity planning, disaster‑recovery drills), real‑time monitoring of the full stack, and post‑incident reviews with corrective actions.

Production Rollout

The rollout is split into three stages: OLAP testing (Druid receives data, queries still go to MySQL), observation (queries gradually switch to Druid while MySQL remains as fallback), and final migration (MySQL retired).
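
The three stages can be pictured as a small query router. This is a sketch under stated assumptions, not the team's implementation: the class name, the stage labels, and the `druid_ratio` traffic knob are all hypothetical, and the two query callables stand in for real database clients.

```python
import random


class StagedRouter:
    """Routes queries according to the rollout stage (hypothetical names)."""

    def __init__(self, query_mysql, query_druid, stage="testing",
                 druid_ratio=0.0, rng=random.random):
        if stage not in ("testing", "observation", "migrated"):
            raise ValueError(f"unknown stage: {stage}")
        self.query_mysql = query_mysql
        self.query_druid = query_druid
        self.stage = stage
        self.druid_ratio = druid_ratio  # share of traffic sent to Druid
        self.rng = rng

    def query(self, sql):
        if self.stage == "testing":
            # Druid only receives data; all reads still go to MySQL.
            return self.query_mysql(sql)
        if self.stage == "migrated":
            # MySQL is retired; there is no fallback.
            return self.query_druid(sql)
        # Observation: send a share of reads to Druid, fall back on failure.
        if self.rng() < self.druid_ratio:
            try:
                return self.query_druid(sql)
            except Exception:
                return self.query_mysql(sql)
        return self.query_mysql(sql)
```

Raising `druid_ratio` step by step during observation limits the impact of any regression, since MySQL still holds the same data.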

Problem Summary & Solutions

Out‑of‑order real‑time data produced many small files – mitigated by filtering upstream in Flink.
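
The talk does not describe the filter itself. One common approach, shown here as an assumption, is to drop events that arrive too far behind the newest event time seen so far. The sketch is plain Python rather than Flink code, and the field name `event_time` and the lateness threshold are hypothetical.

```python
def filter_late_events(events, max_lateness_seconds):
    """Yield events that are at most max_lateness_seconds behind the
    highest event time seen so far; drop the rest."""
    newest_seen = None
    for event in events:
        event_time = event["event_time"]
        if newest_seen is None or event_time > newest_seen:
            newest_seen = event_time
        if newest_seen - event_time <= max_lateness_seconds:
            yield event
```

Dropping very late events trades a small loss of completeness for fewer writes into old time ranges, so the threshold would need to be agreed with the business owners of the data.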

Unstable StringLast function – introduced StringLastMax/Min.
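
The talk does not explain the instability. One plausible reading, assumed here, is that several rows can share the same latest timestamp, so "the last string" depends on processing order. The sketch below shows how a max or min tie‑break makes the result deterministic; it is an illustration, not the code of the StringLastMax/Min aggregators.

```python
def string_last(rows, tie_break=max):
    """Return the value with the latest timestamp from (timestamp, value)
    pairs, resolving timestamp ties with tie_break (max or min)."""
    rows = list(rows)
    if not rows:
        return None
    latest = max(timestamp for timestamp, _ in rows)
    return tie_break(value for timestamp, value in rows if timestamp == latest)
```

With a fixed tie‑break, the result no longer changes when the same rows are read in a different order.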

Lack of efficient distinct‑count – integrated community patch for bitmap support.
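
To show why bitmaps suit distinct counting, the sketch below uses Python integers as uncompressed bitmaps: each distinct ID sets one bit, partial results merge with a bitwise OR, and the count is the number of set bits. It assumes IDs are small non‑negative integers and is not the community patch itself; production engines typically use compressed bitmaps such as Roaring.

```python
def to_bitmap(ids):
    """Encode non-negative integer IDs as bits of a Python int."""
    bitmap = 0
    for identifier in ids:
        bitmap |= 1 << identifier
    return bitmap


def distinct_count(bitmaps):
    """Merge per-segment bitmaps with OR and count the set bits."""
    merged = 0
    for bitmap in bitmaps:
        merged |= bitmap
    return bin(merged).count("1")
```

Because OR is associative and idempotent, per‑segment bitmaps can be built independently and merged in any order while still giving an exact count.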

Phase 2.0 (Improvement)

To support driver‑detail queries, multi‑dimensional aggregation, and high‑throughput real‑time writes (up to 1 billion rows per day), the team revisited the technical stack, adding ClickHouse for complex data types while retaining Druid for dimensional analysis.

Phase 3.0 (Future)

Future plans focus on OLAP platformization, self‑service modeling, multi‑engine routing, and migration toward Doris as the primary engine with ClickHouse as a supplement, aiming for better scalability and operational simplicity.
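
Multi‑engine routing could take many forms. A minimal rule‑based sketch follows, assuming the only routing signal is whether a query touches complex data types; the `QueryProfile` fields and the engine labels are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical description of a query, filled in by a SQL analyzer."""
    uses_complex_types: bool


def choose_engine(profile, primary="doris", supplement="clickhouse"):
    """Send complex-type queries to the supplementary engine and
    everything else to the primary engine."""
    if profile.uses_complex_types:
        return supplement
    return primary
```

Keeping the rule in one function means applications call a single entry point and never need to know which engine serves them.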

Q&A Highlights

Migration cost from Druid to Doris is mainly SQL rewrite effort.

Doris offers simpler deployment (only two process types, FE and BE) and better horizontal scaling than Druid, ClickHouse, Kylin, and Presto.

The real‑time query latency target is ≤5 seconds; whether sub‑200 ms targets can be met requires further testing.

Overall, the talk demonstrates a systematic approach to evolving a large‑scale OLAP platform, balancing engine capabilities, performance, stability, and future extensibility.