
How ByteHouse Scales Real‑Time Analytics on ClickHouse: Challenges & Solutions

This article traces ByteHouse's evolution from ClickHouse through two real-time analytics use cases. It covers the technical selection process, the performance bottlenecks encountered (write throughput, Kafka consumption, failover integrity), and the engineered solutions that enable reliable, high-throughput data processing at massive scale: asynchronous index building, multi-threaded Kafka consumption, and an enhanced Buffer engine.

ByteDance Data Platform

ByteHouse, built on ClickHouse, addresses the difficulty of adopting open‑source analytics and the high cost of trial‑and‑error by providing a commercial product with technical support.

01 – Technical Selection

Among internal engines (ClickHouse, Druid, Elasticsearch, Kylin), ClickHouse was chosen because it offers fast, low‑latency observation of algorithm models, supports both aggregation and detailed queries, provides a Map type for dynamic dimensions, and includes native Bloom filter support.

Solution Comparison

Two implementation paths were considered; the final choice was the ClickHouse Kafka Engine, which consumes Kafka topics directly into ClickHouse tables.

Final Architecture and Effects

Data from the recommendation system is written to Kafka, then consumed by ClickHouse via the Kafka Engine. The BI platform was adapted for interactive queries, and a 1% sample of offline data is periodically imported for validation.
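The validation step can be sketched as follows. The function name, arguments, and the 5% tolerance are illustrative assumptions, not ByteHouse internals; the idea is simply to scale the 1% offline sample's aggregate up and check that the real-time figure lands nearby.

```python
def validate_sample(realtime_agg, offline_agg, sample_rate=0.01, tolerance=0.05):
    """Sketch: project the offline sample's aggregate to full scale and
    compare it against the real-time pipeline's aggregate."""
    projected = offline_agg / sample_rate
    if projected == 0:
        return realtime_agg == 0
    # Accept the real-time result if it is within `tolerance` of the projection.
    return abs(realtime_agg - projected) / projected <= tolerance
```

A periodic job would run such a check per metric and alert on mismatches, catching consumption gaps or parsing drift between the real-time and offline paths.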

02 – Challenges and Solutions

Problem 1: Insufficient Write Throughput

Building auxiliary skip indexes (ClickHouse's data-skipping indexes) inline during writes slowed ingestion.

Solution – Asynchronous Index Building

Index construction is deferred to a background queue after the column data is written, improving write throughput by roughly 20%.
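The pattern can be sketched as below. The class, the queue shape, and the min/max "index" are hypothetical stand-ins for ByteHouse's internal implementation; the point is that the write path only enqueues index work instead of blocking on it.

```python
import queue
import threading

class AsyncIndexWriter:
    """Sketch: write column data synchronously, defer skip-index builds
    to a background thread so the write path is not blocked."""

    def __init__(self):
        self.index_queue = queue.Queue()
        self.built_indexes = []
        worker = threading.Thread(target=self._index_worker, daemon=True)
        worker.start()

    def write_part(self, part_name, rows):
        # 1. Write the column data immediately (the fast path).
        data_size = len(rows)
        # 2. Enqueue index construction instead of doing it inline.
        self.index_queue.put((part_name, rows))
        return data_size

    def _index_worker(self):
        while True:
            part_name, rows = self.index_queue.get()
            # Build a toy "skip index" (min/max per part) in the background.
            self.built_indexes.append(
                {"part": part_name, "min": min(rows), "max": max(rows)}
            )
            self.index_queue.task_done()

    def flush(self):
        # Wait for all queued index builds to finish.
        self.index_queue.join()
```

Queries that arrive before an index is built simply fall back to scanning the part, which is why deferring the build trades nothing for correctness.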

Problem 2: Kafka Consumption Bottleneck

The community Kafka table uses a single consumer thread, limiting performance.

Solution – Multi‑Threaded Consumption

Kafka Engine was modified to spawn multiple consumer threads, each handling its own data parsing and insertion, achieving near‑linear scaling of write performance.
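A minimal sketch of the idea, with a plain dict standing in for a real Kafka client and `upper()` standing in for message parsing (both are illustrative assumptions): partitions are split across threads, and each thread parses and flushes its own batch independently.

```python
import threading

def consume_partitions(partitions, num_threads):
    """Sketch: split Kafka partitions across consumer threads; each thread
    parses and inserts its own batches, giving near-linear scaling.
    `partitions` maps partition id -> list of raw messages."""
    inserted = []
    lock = threading.Lock()

    def worker(my_partitions):
        local_rows = []
        for pid in my_partitions:
            for msg in partitions[pid]:
                local_rows.append((pid, msg.upper()))  # "parse" the message
        with lock:  # each thread flushes its own batch to the table
            inserted.extend(local_rows)

    ids = sorted(partitions)
    threads = [
        threading.Thread(target=worker, args=(ids[i::num_threads],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inserted
```

Because Kafka partitions are independent, no coordination is needed between threads until the final insert, which is what makes the scaling close to linear.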

Problem 3: Data Integrity in Failover Scenarios

In replicated mode, simultaneous writes to both nodes could cause data loss or duplication after a node failure.

Solution – Leader‑Only Consumption

Using ZooKeeper‑based leader election, only one replica consumes data; the other remains standby, ensuring consistent data and query routing.
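The election pattern can be sketched as follows; a plain dict stands in for the ZooKeeper tree, and the znode path is a made-up example. In real ZooKeeper, the leader node is ephemeral, so a crashed leader's session expiry deletes it automatically and the standby takes over.

```python
class LeaderElection:
    """Sketch of leader-only consumption: replicas race to create a
    'leader' node; only the winner consumes from Kafka."""

    LEADER_PATH = "/kafka_consumer/leader"  # illustrative znode path

    def __init__(self):
        self.znodes = {}

    def try_acquire(self, replica_id):
        # Ephemeral-node creation is atomic in ZooKeeper: first writer wins.
        if self.LEADER_PATH not in self.znodes:
            self.znodes[self.LEADER_PATH] = replica_id
            return True
        return False  # this replica stays standby

    def leader(self):
        return self.znodes.get(self.LEADER_PATH)

    def release(self, replica_id):
        # Models session loss on node failure: the ephemeral node vanishes,
        # letting the standby replica win the next acquisition attempt.
        if self.znodes.get(self.LEADER_PATH) == replica_id:
            del self.znodes[self.LEADER_PATH]
```

Routing queries to the same elected leader keeps reads consistent with the single consuming replica.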

Second Use Case – Real‑Time Advertising Data

The original Druid‑based system struggled with multi‑day data and lacked certain capabilities.

ClickHouse was adopted, but new issues arose:

Problem 1: Buffer Engine Incompatibility with ReplicatedMergeTree

Buffer Engine could not be used alongside ReplicatedMergeTree, leading to inconsistent queries.

Solution

- Combine Kafka, Buffer, and MergeTree tables behind a unified interface.
- Integrate Buffer into the Kafka Engine as an optional component.
- Process multiple Blocks inside Buffer in a pipeline style.
- Enable consistent queries under ReplicatedMergeTree.
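The Buffer stage above can be sketched as a batching layer between the Kafka consumer and the target table. The class, thresholds, and the list standing in for the MergeTree table are all illustrative assumptions, not ByteHouse settings.

```python
class BlockBuffer:
    """Sketch of a Buffer stage inside the Kafka engine: incoming blocks
    accumulate and are flushed to the target table as one larger block
    once a row or block-count threshold is reached."""

    def __init__(self, target_table, max_rows=1000, max_blocks=4):
        self.target_table = target_table  # stand-in: list of flushed blocks
        self.pending = []
        self.max_rows = max_rows
        self.max_blocks = max_blocks

    def insert_block(self, block):
        self.pending.append(block)
        rows = sum(len(b) for b in self.pending)
        if rows >= self.max_rows or len(self.pending) >= self.max_blocks:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # Merge pending blocks and write once to the (Replicated)MergeTree.
        merged = [row for block in self.pending for row in block]
        self.target_table.append(merged)
        self.pending = []
```

Merging many small Kafka blocks into one large insert reduces part churn on the MergeTree side, which is the main benefit of the Buffer stage.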

Problem 2: Data Loss or Duplicate Consumption After Crash

ClickHouse lacks transaction support, so partial writes could cause loss or duplication.

Solution

Adopt an approach modeled on Druid's Kafka Indexing Service (KIS): bind Kafka offsets to ClickHouse data parts and write both atomically in a single transaction, rolling back both the offset and the data on failure.
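A minimal sketch of the offset-binding idea, with the transaction simulated by all-or-nothing state updates (the class and its fields are hypothetical, not ByteHouse code): because the offset is persisted only together with the part, a crash before commit leaves both untouched, and replay from the committed offset neither loses nor duplicates rows.

```python
class AtomicKafkaSink:
    """Sketch of the KIS-style exactly-once pattern: the consumed Kafka
    offset commits in the same transaction as the data part."""

    def __init__(self):
        self.parts = []            # committed data parts
        self.committed_offset = 0  # offset stored alongside the parts

    def write_batch(self, rows, new_offset, fail=False):
        # Stage the part together with the offset it covers.
        staged_part = {"rows": list(rows), "offset": new_offset}
        if fail:
            # Crash before commit: neither part nor offset is persisted,
            # so the batch is re-consumed from committed_offset on restart.
            raise RuntimeError("simulated crash before commit")
        # Commit both pieces atomically.
        self.parts.append(staged_part)
        self.committed_offset = new_offset
```

Contrast this with committing offsets back to Kafka separately from the insert, where a crash between the two steps produces exactly the loss or duplication described above.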

Result

Atomic inserts and stable consumption were achieved, enhancing reliability for large‑scale real‑time analytics.

Overall, ByteHouse leverages extensive ClickHouse experience to deliver a high‑performance, scalable analytics platform for enterprise big‑data workloads.

Tags: Real-time analytics, Kafka, ClickHouse, ByteHouse
Written by

ByteDance Data Platform

The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
