How ByteDance Optimized ClickHouse for Real‑Time Recommendation and Ad Analytics
ByteHouse, ByteDance’s enterprise‑grade ClickHouse distribution, powers real‑time recommendation and ad‑delivery analytics at massive scale. This article walks through two case studies, covering technology selection, architectural design, and optimizations such as asynchronous index building, multi‑threaded Kafka consumption, and an enhanced Buffer Engine, along with the measures taken to preserve data integrity.
Recommendation System Real‑Time Metrics
ByteDance’s internal A/B testing requires real‑time feedback, beyond the offline T+1 metrics that only become available the following day. The system must query both aggregated and detailed data, support hundreds of dimensions, filter efficiently by ID, and compute machine‑learning metrics such as AUC.
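To make the AUC requirement concrete, here is a minimal Python sketch that computes AUC from (label, score) pairs using the rank-sum formulation, then repeats it per experiment group. It is only an illustration of the metric: the function names `auc` and `auc_by_group` and the row layout are assumptions, and the source does not describe how the metric is actually evaluated inside ClickHouse.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of rows [i, j) that share the same score.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum += avg_rank * positives_in_run
        i = j

    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def auc_by_group(rows):
    """rows: iterable of (group, label, score). Returns {group: auc}."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in rows:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {group: auc(ls, ss) for group, (ls, ss) in grouped.items()}
```

Grouping by experiment arm mirrors the A/B use case: each arm gets its own AUC from the same detailed rows, which is why the system needs access to detail data and not only pre-aggregated counters.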
Technology Choice
After evaluating ClickHouse, Druid, Elasticsearch, and Kylin, ClickHouse was selected for its low latency, ability to handle both aggregation and point queries, support for Map types, dynamic dimensions, Bloom‑filter indexing, and extensibility via UDFs.
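The value of Bloom-filter indexing for ID lookups can be sketched in a few lines of Python. The idea is that each data block keeps a small probabilistic summary of the IDs it contains, so a point query can skip blocks that definitely do not hold the ID. The `BloomFilter` class, its sizes, and the `blocks_to_scan` helper below are hypothetical and illustrate the general technique only, not ClickHouse's internal implementation.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def build_filters(blocks):
    """Build one filter per block of IDs."""
    filters = []
    for ids in blocks:
        bf = BloomFilter()
        for item_id in ids:
            bf.add(item_id)
        filters.append(bf)
    return filters


def blocks_to_scan(filters, wanted_id):
    """Indexes of blocks that may contain wanted_id; all others can be skipped."""
    return [i for i, bf in enumerate(filters) if bf.might_contain(wanted_id)]
```

A block that holds the ID is always returned; a block that does not hold it is usually, but not always, skipped. That trade-off is what makes the structure cheap enough to keep alongside every block.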
Solution Evaluation
The final architecture uses ClickHouse’s built‑in Kafka Engine to consume recommendation data directly from Kafka, with Kafka topics formatted to match ClickHouse table schemas. Data can also be imported from Hive for validation, and a small sample of offline data is retained for further checks.
Challenges and Optimizations
Problem 1: Write throughput – Index construction on the write path slowed ingestion. Building indexes asynchronously raised write throughput by roughly 20%.
Problem 2: Kafka consumption – The Kafka Engine was limited to a single consumer. Enabling multi‑threaded consumption made write performance scale nearly linearly.
Problem 3: Data integrity under failover – With two nodes writing, replicas could become inconsistent. The improved Kafka Engine relies on ReplicatedMergeTree leader election via ZooKeeper so that only one node consumes at a time, preserving consistency.
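The multi-threaded consumption idea from Problem 2 can be sketched as one worker thread per partition, each batching its messages into blocks and appending them to a shared sink. This is a simulation under stated assumptions: partitions are plain in-memory lists instead of real Kafka partitions, and `consume_partition`, `consume_all`, and `block_size` are hypothetical names, not part of ClickHouse or any Kafka client.

```python
import threading


def consume_partition(partition_id, messages, sink, sink_lock, block_size):
    """Read one partition in order and write fixed-size blocks to the sink."""
    block = []
    for msg in messages:
        block.append(msg)
        if len(block) >= block_size:
            with sink_lock:
                sink.append((partition_id, list(block)))
            block.clear()
    if block:  # flush the final, possibly short, block
        with sink_lock:
            sink.append((partition_id, list(block)))


def consume_all(partitions, block_size=3):
    """partitions: {partition_id: [messages]}. Runs one thread per partition."""
    sink = []
    sink_lock = threading.Lock()
    threads = [
        threading.Thread(
            target=consume_partition,
            args=(pid, msgs, sink, sink_lock, block_size),
        )
        for pid, msgs in partitions.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

Because each partition is owned by exactly one thread, ordering within a partition is preserved while partitions progress independently, which is the property that lets throughput grow with the number of consumers.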
Ad Delivery Real‑Time Data
Advertising teams need immediate visibility of campaign performance, often involving multi‑day data. The previous Druid‑based solution faced limitations, prompting migration to ClickHouse.
Issues
The Buffer Engine could not be combined with ReplicatedMergeTree, so the same query could return different results depending on which replica served it.
Solution
The team integrated the Kafka, Buffer, and MergeTree tables by embedding the Buffer Engine into the Kafka Engine behind a toggleable option, processed blocks in a pipeline fashion, and ensured consistent query results under ReplicatedMergeTree.
Result: Resolved consistency issues when using Buffer Engine with ReplicatedMergeTree.
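As a rough illustration of what a buffer layer does, the Python sketch below accumulates rows in memory, flushes them as a batch once a row-count or age threshold is reached, and answers queries from both flushed and still-buffered rows. The `BufferedTable` class and its thresholds are hypothetical; the sketch models a single node only and does not reproduce the replica-consistency work described above.

```python
import time


class BufferedTable:
    """In-memory write buffer in front of a list of flushed 'parts'."""

    def __init__(self, max_rows=4, max_seconds=1.0, clock=time.monotonic):
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.clock = clock          # injectable so tests can control time
        self.buffer = []
        self.parts = []             # each flushed batch stands in for one part
        self.first_row_at = None

    def insert(self, row):
        if not self.buffer:
            self.first_row_at = self.clock()
        self.buffer.append(row)
        self.maybe_flush()

    def maybe_flush(self):
        if not self.buffer:
            return
        too_many = len(self.buffer) >= self.max_rows
        too_old = self.clock() - self.first_row_at >= self.max_seconds
        if too_many or too_old:
            self.parts.append(list(self.buffer))
            self.buffer.clear()
            self.first_row_at = None

    def select_all(self):
        # Queries must see flushed and buffered rows, or fresh data goes missing.
        return [row for part in self.parts for row in part] + list(self.buffer)
```

Fewer, larger batches reach storage, at the cost of holding recent rows in memory. A query that reads only the flushed parts would miss the newest rows, which hints at why buffered data held on one node makes cross-replica consistency hard.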
Additional Challenges
Problem: Lack of transaction support could cause data loss or duplicate consumption after crashes.
Solution: Adopted an approach inspired by Druid’s Kafka Indexing Service (KIS), binding Kafka offsets to the parts they produced and writing both atomically within a single transaction, so that a crash leaves either both or neither committed.
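The principle of committing offsets and parts together can be shown with a small Python sketch that uses SQLite as a stand-in transactional store. Everything here is an assumption made for illustration: the table names, the `write_batch` helper, and the simulated crash flag are hypothetical, and the source does not describe ByteHouse's actual storage format or transaction mechanism.

```python
import json
import sqlite3


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE offsets (partition_id INTEGER PRIMARY KEY, "
        "next_offset INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def committed_offset(conn, partition_id):
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE partition_id = ?", (partition_id,)
    ).fetchone()
    return row[0] if row else 0


def write_batch(conn, partition_id, log, batch_size, crash_before_commit=False):
    """Consume from the committed offset; store the part and new offset atomically."""
    start = committed_offset(conn, partition_id)
    batch = log[start:start + batch_size]
    if not batch:
        return 0
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO parts (payload) VALUES (?)", (json.dumps(batch),))
        conn.execute(
            "INSERT OR REPLACE INTO offsets (partition_id, next_offset) VALUES (?, ?)",
            (partition_id, start + len(batch)),
        )
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")
    return len(batch)


def all_rows(conn):
    rows = []
    for (payload,) in conn.execute("SELECT payload FROM parts ORDER BY id"):
        rows.extend(json.loads(payload))
    return rows
```

If the process fails mid-batch, neither the part nor the offset is visible afterwards, so the next attempt re-reads the same messages without creating duplicates or gaps.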
Conclusion
Real‑time analytics is ClickHouse’s strength. ByteDance’s optimizations have been incorporated into ByteHouse, offering enterprises a robust platform for large‑scale, interactive data analysis and sharing best practices with the ClickHouse community.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
