
How ByteDance Optimized ClickHouse for Real‑Time Recommendation and Ad Analytics

ByteHouse, ByteDance’s enterprise‑grade ClickHouse, powers real‑time recommendation and ad‑delivery analytics at massive scale. This article walks through two case studies, covering technology selection, architecture, and optimizations such as asynchronous indexing, multi‑threaded Kafka consumption, and an enhanced Buffer Engine, along with the measures taken to preserve data integrity.

Volcano Engine Developer Services

Sep 6, 2021

Recommendation System Real‑Time Metrics

ByteDance’s internal A/B testing needs real‑time feedback, not just the offline T+1 metrics that arrive a day later. The system must query both aggregated and detailed data, support hundreds of dimensions, filter efficiently by ID, and compute machine‑learning metrics such as AUC.
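To make the AUC requirement concrete, here is a minimal sketch of a rank‑based AUC over (label, score) rows, the kind of computation a UDF or aggregate function would need to perform. The function name and sample data are illustrative assumptions and are not taken from ByteDance's implementation.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC (Mann-Whitney U); tied scores share an average rank."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal scores that starts at position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Four impressions: label 1 = clicked, score = model prediction.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the metric depends on the global ordering of scores, it cannot be derived from pre‑aggregated counters alone, which is one reason detailed rows have to remain queryable.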

Technology Choice

After evaluating ClickHouse, Druid, Elasticsearch, and Kylin, the team selected ClickHouse for its low latency, its ability to serve both aggregation and point queries, its support for Map types (which allow dynamic dimensions), its Bloom‑filter indexes, and its extensibility via UDFs.

Solution Evaluation

The final architecture uses ClickHouse’s built‑in Kafka Engine to consume recommendation data directly from Kafka, with Kafka topics formatted to match ClickHouse table schemas. Data can also be imported from Hive for validation, and a small sample of offline data is retained for further checks.
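As a rough illustration of this layout, the sketch below renders the three statements that the open‑source ClickHouse Kafka Engine pattern typically needs: a MergeTree target table, a Kafka queue table, and a materialized view that moves rows between them. The helper name, table name, columns, broker address, and topic are all hypothetical, and ByteHouse's internal engine may differ from upstream ClickHouse.

```python
from typing import Dict, List


def kafka_ingest_ddl(
    table: str,
    columns: Dict[str, str],
    order_by: List[str],
    brokers: str,
    topic: str,
    group: str,
    num_consumers: int = 1,
) -> List[str]:
    """Render DDL for a Kafka -> materialized view -> MergeTree pipeline."""
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns.items())
    # Target table that stores the data and serves queries.
    target = (
        f"CREATE TABLE {table} (\n    {cols}\n) "
        f"ENGINE = MergeTree ORDER BY ({', '.join(order_by)})"
    )
    # Queue table: same schema as the target, so the topic's messages
    # must match the table schema, as the article describes.
    queue = (
        f"CREATE TABLE {table}_queue (\n    {cols}\n) "
        "ENGINE = Kafka SETTINGS "
        f"kafka_broker_list = '{brokers}', "
        f"kafka_topic_list = '{topic}', "
        f"kafka_group_name = '{group}', "
        "kafka_format = 'JSONEachRow', "
        f"kafka_num_consumers = {num_consumers}"
    )
    # Materialized view that continuously copies consumed rows into the target.
    view = (
        f"CREATE MATERIALIZED VIEW {table}_mv TO {table} AS "
        f"SELECT {', '.join(columns)} FROM {table}_queue"
    )
    return [target, queue, view]


# Hypothetical schema: a Map column carries the dynamic dimensions.
statements = kafka_ingest_ddl(
    table="rec_events",
    columns={
        "event_time": "DateTime",
        "user_id": "UInt64",
        "item_id": "UInt64",
        "dims": "Map(String, String)",
    },
    order_by=["event_time", "user_id"],
    brokers="kafka:9092",
    topic="rec_events",
    group="clickhouse_rec",
    num_consumers=4,
)
for stmt in statements:
    print(stmt + ";\n")
```

The message format (`JSONEachRow` here) is an assumption; the article does not say which format the topics use.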

Challenges and Optimizations

Problem 1: Write throughput – Building indexes synchronously on the write path slowed writes. Moving index construction to an asynchronous step raised write throughput by about 20%.
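The idea behind Problem 1 can be sketched with a toy in‑memory table: the write returns as soon as the part is stored, and the index is built on a background thread. Everything here is an assumption for illustration, including the class name, the use of a plain set as a stand‑in for a Bloom‑filter index, and the fallback to scanning parts whose index is not ready yet; the article does not describe how ByteDance handles that window.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Set


class AsyncIndexedTable:
    """Toy table whose per-part index is built off the write path."""

    def __init__(self) -> None:
        self.parts: List[List[int]] = []
        self.indexes: Dict[int, Set[int]] = {}
        self._pending: List[Future] = []
        self._pool = ThreadPoolExecutor(max_workers=2)

    def _build_index(self, part_id: int) -> None:
        # Stand-in for a real skip index: the set of IDs in the part.
        self.indexes[part_id] = set(self.parts[part_id])

    def write(self, rows: List[int]) -> int:
        """Store the part and return immediately; indexing runs in the background."""
        self.parts.append(list(rows))
        part_id = len(self.parts) - 1
        self._pending.append(self._pool.submit(self._build_index, part_id))
        return part_id

    def lookup(self, value: int) -> List[int]:
        """Return IDs of parts containing value, scanning parts without an index."""
        hits = []
        for part_id, rows in enumerate(self.parts):
            index = self.indexes.get(part_id)
            found = value in index if index is not None else value in rows
            if found:
                hits.append(part_id)
        return hits

    def wait_for_indexes(self) -> None:
        """Block until every submitted index build has finished."""
        for future in self._pending:
            future.result()
        self._pending.clear()


table = AsyncIndexedTable()
table.write([1, 2, 3])
table.write([3, 4])
print(table.lookup(3))  # [0, 1], correct whether or not the indexes are ready
table.wait_for_indexes()
print(sorted(table.indexes))  # [0, 1]
```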

Problem 2: Kafka consumption – Each table was limited to a single consumer. Enabling multi‑threaded consumption let write throughput scale almost linearly with the number of consumer threads.

Problem 3: Data integrity under failover – Writing through two nodes could leave the data inconsistent. The improved Kafka Engine reuses ReplicatedMergeTree's ZooKeeper‑based leader election so that only one node consumes at a time, which preserves consistency.
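A minimal sketch of the single‑consumer rule in Problem 3 follows. It models ZooKeeper's ephemeral sequential nodes with an in‑memory class, where the live replica with the lowest sequence number is the leader. The class and function names are hypothetical, and the real election logic inside ReplicatedMergeTree is more involved than this.

```python
from typing import Dict, List, Optional


class Coordinator:
    """In-memory stand-in for ZooKeeper ephemeral sequential nodes."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._live: Dict[str, int] = {}

    def register(self, replica: str) -> None:
        # Each replica that joins gets the next sequence number.
        self._live[replica] = self._next_seq
        self._next_seq += 1

    def unregister(self, replica: str) -> None:
        # Models the ephemeral node vanishing when a replica dies.
        self._live.pop(replica, None)

    def leader(self) -> Optional[str]:
        # The live replica with the lowest sequence number leads.
        return min(self._live, key=self._live.get) if self._live else None


def consume_round(
    coordinator: Coordinator, replicas: List[str], batch: List[str]
) -> Dict[str, List[str]]:
    """Every replica polls, but only the current leader consumes the batch."""
    leader = coordinator.leader()
    return {r: (list(batch) if r == leader else []) for r in replicas}


zk = Coordinator()
zk.register("replica_a")
zk.register("replica_b")
print(consume_round(zk, ["replica_a", "replica_b"], ["m1", "m2"]))
# replica_a consumes; replica_b stays idle

zk.unregister("replica_a")  # failover
print(zk.leader())  # replica_b
```

Non‑leader replicas would still receive the data through normal ReplicatedMergeTree replication, so consuming on one node is enough.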

Ad Delivery Real‑Time Data

Advertising teams need immediate visibility of campaign performance, often involving multi‑day data. The previous Druid‑based solution faced limitations, prompting migration to ClickHouse.

Issues

Buffer Engine could not be combined with ReplicatedMergeTree, leading to inconsistent queries across replicas.

Solution

The team integrated the Kafka, Buffer, and MergeTree tables by embedding the Buffer Engine in the Kafka Engine behind a toggleable option. Blocks are processed in a pipeline, and queries return consistent results under ReplicatedMergeTree.

Result: Resolved consistency issues when using Buffer Engine with ReplicatedMergeTree.

Additional Challenges

Problem: Lack of transaction support could cause data loss or duplicate consumption after crashes.

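Solution: Adopted an approach inspired by Druid's Kafka Indexing Service (KIS). Kafka offsets are bound to the parts they produced, and both are written atomically in a single transaction, so a crash can neither lose data nor cause duplicate consumption.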
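The offset‑binding idea can be demonstrated with any transactional store. The sketch below uses an in‑memory SQLite database purely as a stand‑in: a part and the next Kafka offset are committed in one transaction, so a simulated crash between the two writes rolls both back. Table names, function names, and the `fail_before_offset` flag are hypothetical, and this is not how ClickHouse or ByteHouse store parts.

```python
import json
import sqlite3
from typing import List


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE offsets (kafka_partition INTEGER PRIMARY KEY, "
        "next_offset INTEGER NOT NULL)"
    )
    return conn


def commit_batch(
    conn: sqlite3.Connection,
    kafka_partition: int,
    first_offset: int,
    messages: List[str],
    fail_before_offset: bool = False,
) -> None:
    """Write one part and advance the offset in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO parts (payload) VALUES (?)", (json.dumps(messages),))
        if fail_before_offset:
            raise RuntimeError("simulated crash between part write and offset commit")
        conn.execute(
            "INSERT OR REPLACE INTO offsets (kafka_partition, next_offset) VALUES (?, ?)",
            (kafka_partition, first_offset + len(messages)),
        )


def resume_offset(conn: sqlite3.Connection, kafka_partition: int) -> int:
    """Offset to resume from after a restart; 0 if nothing was committed."""
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE kafka_partition = ?", (kafka_partition,)
    ).fetchone()
    return row[0] if row else 0


def part_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM parts").fetchone()[0]


store = open_store()
commit_batch(store, 0, 0, ["a", "b"])
try:
    commit_batch(store, 0, 2, ["c"], fail_before_offset=True)
except RuntimeError:
    pass  # the half-written part was rolled back together with the offset
print(resume_offset(store, 0), part_count(store))  # 2 1
```

After the simulated crash, the consumer resumes from offset 2 and rewrites message "c" exactly once, because neither the part nor the offset from the failed attempt survived.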

Conclusion

Real‑time analytics is ClickHouse’s strength. ByteDance’s optimizations have been incorporated into ByteHouse, offering enterprises a robust platform for large‑scale, interactive data analysis and sharing best practices with the ClickHouse community.
