How Qunar Built a Scalable BI Platform for Real‑Time Analytics and Self‑Service Reporting

This article details Qunar's multi-year journey of designing and evolving a full-stack BI platform that covers data ingestion, storage, query engines, self-service analytics, and real-time OLAP. It walks through three development phases, the selection of technologies such as Impala, Kudu, ClickHouse, and Apache Druid, and the performance, usability, and governance challenges addressed along the way to give business users fast, reliable data insights.

dbaplus Community

Jun 14, 2022

Background

Rapid growth of Qunar's business required a BI platform that supports drag-and-drop reporting, ad-hoc analysis, sub-second query response, and trustworthy metrics.

Evolution Stages

Original stage (pre‑2016) – a monolithic end‑to‑end reporting system built by data developers.

Development stage (2016‑2018) – configurable reporting (V2), self‑service analysis, and an OLAP layer.

Systematic stage (2019‑present) – on‑the‑fly queries, self‑service email reports, third‑generation data‑report module (V3), and comprehensive governance.

Stage 1: Original (pre-2016)

Data was extracted from logs using Hive, transformed via ETL, and loaded into MySQL. Backend services queried MySQL directly and custom front‑end pages rendered charts. This architecture suffered from low efficiency, inconsistent code quality, duplicated effort, and poor scalability.

Stage 2: Development (2016‑2018)

Key improvements:

Data developers exported ADS‑layer tables to PostgreSQL to leverage its rich analytical functions.

Self‑service analysis allowed product users to configure dimensions, metrics, and filters without writing SQL.

Real‑time pipelines used Kafka + Flink to write hot data to Kudu and cold data to HDFS (Parquet). Impala provided a unified query layer over both stores.

To support both offline and real‑time queries, a hybrid storage architecture was adopted: Impala+Kudu for hot data and Impala+Parquet for offline data.
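The hot/cold split can be pictured with a minimal sketch. It assumes a single time boundary separates rows kept in Kudu from rows kept as Parquet, and that a query spanning the boundary is answered by a UNION ALL over both tables. The table names, column names, and the boundary itself are hypothetical; the article does not describe how Qunar's Impala layer actually stitched the two stores together.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical table names; the article does not name the real tables.
HOT_TABLE = "events_kudu"      # recent rows written by the streaming job
COLD_TABLE = "events_parquet"  # older rows stored as Parquet on HDFS

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_time_range(
    start: datetime, end: datetime, boundary: datetime
) -> List[Tuple[str, datetime, datetime]]:
    """Split [start, end) at the boundary: earlier rows are cold, later rows are hot."""
    if start >= end:
        raise ValueError("start must be before end")
    parts = []
    if start < boundary:
        parts.append((COLD_TABLE, start, min(end, boundary)))
    if end > boundary:
        parts.append((HOT_TABLE, max(start, boundary), end))
    return parts


def build_union_sql(start: datetime, end: datetime, boundary: datetime) -> str:
    """Build one SELECT per store and join them with UNION ALL."""
    selects = []
    for table, lo, hi in split_time_range(start, end, boundary):
        selects.append(
            f"SELECT event_time, user_id FROM {table} "
            f"WHERE event_time >= '{lo.strftime(TIME_FORMAT)}' "
            f"AND event_time < '{hi.strftime(TIME_FORMAT)}'"
        )
    return "\nUNION ALL\n".join(selects)
```

A query that lies entirely on one side of the boundary touches only one table, which is the main benefit of splitting the range before generating SQL.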

Stage 3: Systematic (2019‑present)

Major components introduced:

On-the-fly query & email report module: Users submit SQL, which is syntax-checked, permission-validated, and executed via JDBC. Results can be previewed, downloaded, or emailed.

Data-report module (V3): Componentized chart library, low-code drag-and-drop configuration stored as JSON, and a unified permission model per business unit (BU).

Real-time OLAP: Supports hundreds of dimensions and metrics on billions of rows with sub-second latency. After evaluating Druid, Kylin, Presto, Elasticsearch, and Impala, ClickHouse was selected for its high-throughput query performance.
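The check-then-execute flow of the on-the-fly query module can be sketched roughly as follows. This is only an illustration under stated assumptions: the permission table, the function names, and the regex-based checks are all hypothetical, and a production system would use a real SQL parser rather than pattern matching (for example, this sketch rejects a semicolon even inside a string literal).

```python
import re
from typing import Dict, Set

# Hypothetical permission table: user -> tables the user may read.
PERMISSIONS: Dict[str, Set[str]] = {"alice": {"ads.orders", "ads.users"}}

# Rough pattern for table names that follow FROM or JOIN.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def check_statement(sql: str) -> str:
    """Crude stand-in for a syntax check: accept exactly one SELECT statement."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("only a single statement is allowed")
    if not re.match(r"(?i)^select\b", statement):
        raise ValueError("only SELECT queries are allowed")
    return statement


def referenced_tables(statement: str) -> Set[str]:
    """Collect lower-cased table names that appear after FROM or JOIN."""
    return {name.lower() for name in TABLE_PATTERN.findall(statement)}


def authorize(user: str, statement: str) -> None:
    """Raise PermissionError if the statement reads a table the user cannot access."""
    denied = referenced_tables(statement) - PERMISSIONS.get(user, set())
    if denied:
        raise PermissionError(f"{user} may not read: {sorted(denied)}")


def prepare_query(user: str, sql: str) -> str:
    """Run both checks and return the statement to hand to the executor."""
    statement = check_statement(sql)
    authorize(user, statement)
    return statement
```

Running the checks before the statement reaches the execution layer means a rejected query never consumes cluster resources.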

Data ingestion uses Waterdrop to load offline Hive data into ClickHouse and a ClickHouse Kafka Engine for real‑time streams. Query flow: user request → SQL parsing → ClickHouse execution → front‑end visualization.

Architecture Overview

Data source layer: MySQL, offline warehouses, metric system, real-time Kafka streams.

Data ingestion layer: Waterdrop and custom pipelines import data into PostgreSQL, ClickHouse, or Druid.

Storage/engine layer: PostgreSQL/GP for moderate data, ClickHouse for high-volume real-time analytics, Druid for pre-aggregated queries.

Data model layer: Defines dimensions and metrics based on business requirements.

Presentation layer: Visual charts, dashboards, and self-service drag-and-drop configuration.

System management: Unified permission system, task scheduling, performance monitoring, and usage tracking.

Key Features

Multi‑metric calculations (e.g., deriving per‑user page views).

Integrated monitoring & alerting with QTalk and WeChat.

Data lineage visibility for each chart and metric.
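As an illustration of the multi-metric calculation mentioned in the list above, per-user page views can be derived from two base metrics: total page views (PV) divided by distinct users (UV). The function below is a minimal sketch; the event shape of `(user_id, page)` tuples is an assumption, not the platform's actual data model.

```python
from typing import Iterable, Optional, Tuple


def page_views_per_user(events: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Return PV / UV for a stream of (user_id, page) events, or None if empty."""
    page_views = 0
    users = set()
    for user_id, _page in events:
        page_views += 1      # PV: every event counts once
        users.add(user_id)   # UV: each user counts once
    if not users:
        return None          # avoid dividing by zero when there are no events
    return page_views / len(users)


sample = [("u1", "/home"), ("u1", "/search"), ("u2", "/home")]
print(page_views_per_user(sample))  # 3 page views over 2 users
```

Returning `None` for an empty input keeps a chart from showing a misleading zero when no data exists for the selected filters.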

Performance Benchmark

Benchmark tests on identical datasets showed ClickHouse delivering the lowest query latency, comfortably meeting the 3‑second response requirement for OLAP queries involving hundreds of dimensions and billions of rows.

Future Plans

Mobile BI clients integrated with the company’s IM tool for subscription, interactive analysis, and alerting.

Further abstraction of platform layers to reduce maintenance overhead.

Expansion of analytical scenarios such as retention, attribution, distribution, and user‑path analysis.

