Big Data 14 min read

How ClickHouse Powers Real-Time Self-Service Analytics at Scale

This article examines why ClickHouse was chosen as the OLAP engine for a massive self‑service analytics platform, describes the system architecture, shares concrete memory and performance tuning parameters, and outlines current challenges and future roadmap for large‑scale real‑time data analysis.

ITPUB
ITPUB
ITPUB
How ClickHouse Powers Real-Time Self-Service Analytics at Scale

1. OLAP Technology Selection for Self‑Service Analytics

Background

The platform generates massive event data (exposures, clicks, displays) that require precise deduplication, retention, funnel calculations and other complex models. Offline pre‑aggregation cannot keep up with the need for ad‑hoc, multi‑dimensional reports, leading to long development cycles and stale data.

Selection Criteria

Three dimensions were evaluated: performance, flexibility, and complexity.

Performance – data volume (hundreds of millions to billions) and query latency (ms‑/s‑level).

Flexibility – support for both aggregated and raw queries, ingestion of offline and real‑time data, and high‑concurrency ad‑hoc queries.

Complexity – simple architecture, low operational overhead, easy onboarding, and strong extensibility.

Open‑source OLAP candidates were compared:

MOLAP (Kylin, Druid) – pre‑computed results give sub‑second latency but lack flexibility.

MPP‑based ROLAP (Impala, Presto) – real‑time ingestion and flexible queries, but performance depends heavily on resources.

ClickHouse – columnar storage, vectorized execution, millisecond‑level response, but limited distributed join and no full transaction support.

Doris – easy ops and transactional support, yet missing built‑in funnel/retention functions.

Considering the criteria, ClickHouse was selected as the core analysis engine.

2. The Gauss Self‑Service Analytics Platform

System Overview

The platform consists of two main functions: point‑event metadata management and self‑service analytics that provide multi‑dimensional, multi‑metric cross‑analysis, user‑profile tagging, audience selection, and AB‑test capabilities.

Architecture

Four layers are defined:

Data Ingestion Layer – business data from MySQL and log events are captured via Flink‑CDC to Kafka and Flume, then streamed to real‑time (Kafka) and offline (HDFS) paths.

Data Storage Layer – Kafka and HDFS store raw streams; ClickHouse stores cleaned wide tables for fast querying.

Data Service Layer – unified HTTP service exposing SQL‑based client tools.

Data Application Layer – the Gauss analytics UI and a user‑profile service built on top of ClickHouse.

ClickHouse Use Cases in the Platform

Behavior analysis – materialized views and aggregate tables provide real‑time metrics for activity pages.

AB‑test analysis – Flink streams Kafka events to ClickHouse, enabling real‑time experiment result calculation.

3. ClickHouse Optimization Practices

3.1 Memory Optimization

Memory‑intensive queries (e.g., COUNT/DISTINCT, GROUP BY, ORDER BY) can exceed server limits. Mitigations include using approximate functions and tuning max_bytes_before_external_group_by and max_bytes_before_external_sort.

3.2 Performance‑Tuning Parameters

max_concurrent_queries

– raised from 100 to 150 to match cluster capacity. max_memory_usage – set to ~80% of total RAM per query. max_memory_usage_for_all_queries – set to 80‑90% of total RAM for the node. max_memory_usage_for_user & max_bytes_before_external_sort – trigger disk‑based spill for large GROUP BY/ORDER BY. background_pool_size – increased from 16 to 32 to accelerate background merges.

3.3 Billion‑Row JOIN

Data is pre‑partitioned by join key, ensuring that matching rows reside on the same node. Local joins are performed first, followed by a final aggregation, eliminating costly cross‑node distributed joins.

4. Future Plans and Outlook for ClickHouse

4.1 Current Pain Points

Limited high‑concurrency capability (officially ~100 QPS).

No transactional DDL or distributed transaction support; heavy reliance on ZooKeeper.

Lack of row‑level UPDATE/DELETE for real‑time use cases.

No automatic re‑balance; manual data re‑distribution required during scaling.

4.2 Roadmap

Platform‑level services and standardized fault handling, including multi‑tenant isolation, rate‑limiting, and fine‑grained monitoring.

Containerized deployment for compute‑storage separation and elastic scaling with automatic data re‑balancing.

Intelligent service architecture – dynamically selecting ClickHouse or Doris based on workload characteristics.

Kernel‑level enhancements: real‑time write consistency, distributed transaction support, and removal of ZooKeeper dependency.

5. Summary

The article presents the OLAP ecosystem landscape, details the Gauss platform’s architecture, shares concrete ClickHouse tuning techniques for massive data workloads, and discusses existing limitations and future directions needed to achieve cloud‑native elasticity and full feature parity.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big Dataperformance tuningClickHouseOLAPData ArchitectureSelf-Service Analytics
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.