
Building a Real-Time OLAP Analytics Platform for QQ Music with ClickHouse and Tencent Cloud EMR

QQ Music’s data team moved its PB-scale analytics from an offline Hive warehouse to a ClickHouse-based OLAP platform integrated with Tencent Cloud EMR and Superset. The new platform provides low-latency, highly available data processing, self-service visualization, and read/write scaling for billions of daily events.

Big Data Technology Architecture

QQ Music, a leading music streaming service with over 800 million registered users and a catalog of more than 30 million songs, generates petabyte-scale data every day, and that data needs fast, flexible analysis.

Traditional offline warehouses built on Hive suffered from low timeliness, poor usability, and lengthy processing cycles, making it difficult to support real‑time dashboards, instant queries, and rapid decision‑making.

To address these issues, the QQ Music big‑data team partnered with Tencent Cloud EMR to design a high‑availability, low‑latency OLAP platform based on ClickHouse and Superset, leveraging EMR’s elastic Hadoop services.

ClickHouse Overview: ClickHouse is an open-source, column-oriented OLAP database, originally developed at Yandex, that can answer analytical queries over PB-scale data with millisecond-level latency.

Key technical challenges and solutions:

SSD‑based ZooKeeper improves metadata I/O, reducing replica synchronization delays.

Global load‑balancing ensures idempotent writes, preserving data consistency after retries.
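One way to read the idempotent-write point above: a retried batch has to be identical to the original and has to land on the same shard, so that the shard can recognize and drop the duplicate (ClickHouse's replicated tables deduplicate identical inserted blocks). The sketch below simulates that idea in plain Python. `FakeShard`, `pick_shard`, and `write_with_retry` are hypothetical names invented for illustration. They are not ClickHouse APIs and not the QQ Music team's actual implementation.

```python
import hashlib
import json


class FakeShard:
    """In-memory stand-in for one shard that drops blocks it has already seen."""

    def __init__(self):
        self.seen_blocks = set()
        self.rows = []

    def insert(self, batch):
        # Hash the serialized batch; an identical retry yields the same digest.
        payload = json.dumps(batch, sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self.seen_blocks:
            return False  # duplicate block, ignored
        self.seen_blocks.add(digest)
        self.rows.extend(batch)
        return True


def pick_shard(batch_id, shard_count):
    """Deterministically map a batch id to a shard so retries hit the same one."""
    digest = hashlib.md5(batch_id.encode()).hexdigest()
    return int(digest, 16) % shard_count


def write_with_retry(shards, batch_id, batch, attempts=3):
    """Send the same batch several times, as a client might after lost acks."""
    target = shards[pick_shard(batch_id, len(shards))]
    accepted = 0
    for _ in range(attempts):
        if target.insert(batch):
            accepted += 1
    return accepted


shards = [FakeShard() for _ in range(4)]
batch = [{"user_id": 1, "song_id": 42}, {"user_id": 2, "song_id": 7}]
accepted = write_with_retry(shards, "batch-0001", batch)
total_rows = sum(len(s.rows) for s in shards)
```

If the retry were routed to a different shard instead, that shard would have no record of the block and the rows would be stored twice, which is why the routing has to be deterministic.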

Tube message queue enables efficient ingestion of both real‑time streams and offline batch results.

The partition strategy keeps each table under 10,000 partitions by converting hourly partitions to daily ones, which avoids file-descriptor exhaustion.
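The arithmetic behind the 10,000-partition ceiling can be checked with a few lines of Python. The two-year retention period is an assumption chosen for illustration; the article does not state how long data is kept.

```python
MAX_PARTITIONS = 10_000  # ceiling quoted in the article


def partitions_needed(retention_days, partitions_per_day):
    """Number of partitions a table accumulates over the retention period."""
    return retention_days * partitions_per_day


def max_retention_days(partitions_per_day, limit=MAX_PARTITIONS):
    """Longest retention (in days) that stays strictly under the limit."""
    return (limit - 1) // partitions_per_day


# Assumed retention of two years (730 days), for illustration only.
hourly_two_years = partitions_needed(730, 24)
daily_two_years = partitions_needed(730, 1)

hourly_max_days = max_retention_days(24)
daily_max_days = max_retention_days(1)
```

Under these assumptions, hourly partitioning reaches 17,520 partitions in two years and crosses the ceiling after about 416 days, while daily partitioning needs only 730 partitions for the same period.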

Read/write separation using temporary ClickHouse nodes on Kubernetes offloads heavy write workloads, allowing the main cluster to serve fast reads.

Local‑hash based cross‑table joins avoid costly Global IN/Join operations, improving query speed.
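The local-hash join idea can be illustrated with a small simulation. If both tables are distributed by the same hash of the join key, every matching pair of rows lives on the same shard, so each shard can join its own data and the union of the per-shard results equals the global join. This is what makes it possible to skip GLOBAL IN / GLOBAL JOIN, which ship the right-hand side to every shard. The table contents, `SHARD_COUNT`, and the modulo-based `shard_of` function are assumptions for illustration, not the team's actual sharding scheme.

```python
from collections import defaultdict

SHARD_COUNT = 3  # hypothetical cluster size


def shard_of(user_id):
    """Both tables use the same function, so equal keys share a shard."""
    return user_id % SHARD_COUNT


def distribute(rows, key):
    """Split rows across shards by hashing the join key."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[key])].append(row)
    return shards


def local_join(left_rows, right_rows, key):
    """Plain in-memory inner hash join on a single node."""
    index = defaultdict(list)
    for right in right_rows:
        index[right[key]].append(right)
    return [{**left, **right}
            for left in left_rows
            for right in index.get(left[key], [])]


plays = [
    {"user_id": 1, "song_id": 10},
    {"user_id": 2, "song_id": 11},
    {"user_id": 4, "song_id": 12},
    {"user_id": 5, "song_id": 13},  # no matching user row
]
users = [
    {"user_id": 1, "city": "Shenzhen"},
    {"user_id": 2, "city": "Beijing"},
    {"user_id": 4, "city": "Chengdu"},
]

# Reference result: join everything in one place.
global_result = local_join(plays, users, "user_id")

# Co-located result: each shard joins only its own slice of both tables.
play_shards = distribute(plays, "user_id")
user_shards = distribute(users, "user_id")
per_shard_result = []
for shard_id in range(SHARD_COUNT):
    per_shard_result.extend(
        local_join(play_shards[shard_id], user_shards[shard_id], "user_id")
    )


def by_song(row):
    return row["song_id"]


same = sorted(per_shard_result, key=by_song) == sorted(global_result, key=by_song)
```

The equivalence only holds when both tables really are sharded by the same function of the join key; joining on any other column would still need data to move between shards.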

Superset, an open-source BI tool and Apache Software Foundation project, was integrated for self-service data visualization, allowing non-technical staff to create thousands of dashboards and achieve “data analysis for everyone” (全民数据分析).

The combined ClickHouse + Superset solution on Tencent Cloud EMR provides rapid cluster provisioning, elastic scaling, automated operations, and 24/7 professional support, delivering a production‑ready, cloud‑native big‑data analytics stack.

Overall, the cloud‑based OLAP infrastructure has enabled QQ Music to perform real‑time analysis on PB‑level data with sub‑second latency, supporting a wide range of business scenarios while reducing operational complexity and cost.

Tags: Real-time analytics, ClickHouse, OLAP, Data Visualization, Superset, Cloud EMR