ByteHouse: ClickHouse Enterprise Edition Case Studies and Optimizations at ByteDance
ByteHouse, ByteDance's enterprise edition of ClickHouse, supports real-time analytics at large scale. Two case studies, recommendation-system metrics and ad-delivery data, cover the technical selection, the problems encountered, and the fixes applied: a multi-threaded Kafka Engine, asynchronous index building, and Buffer engine enhancements, together with the resulting performance gains.
Volcano Engine, ByteDance's enterprise technology platform, recently launched ByteHouse, an enterprise edition of ClickHouse designed to lower the barrier to entry and provide commercial support.
ByteDance operates over 15,000 ClickHouse nodes storing more than 600 PB of data, with the largest clusters exceeding 2,400 nodes, making ClickHouse the backbone of many growth‑analysis workloads.
Case Study 1 – Real‑time Recommendation Metrics : The A/B testing platform required sub‑second feedback on algorithm changes, demanding simultaneous aggregation and detail queries across hundreds of dimensions, efficient ID filtering, and support for statistical metrics such as AUC.
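As a small illustration of the kind of statistical metric mentioned above, AUC can be computed as the fraction of (positive, negative) pairs that the model ranks correctly, with ties counted as half. This is a minimal pure-Python sketch and not how ByteHouse computes it; the function name `auc` and the label encoding (1 for positive, 0 for negative) are assumptions made for the example.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Assumes labels are 1 (positive) or 0 (negative). Ties count as 0.5.
    Quadratic in the number of rows, so suitable for illustration only.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```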
Technical Selection : After evaluating ClickHouse, Druid, Elasticsearch, and Kylin, the team chose ClickHouse for its quick turnaround when observing model behavior, strong aggregation and point-query performance, support for Map types and dynamic dimensions, Bloom-filter indexing, and extensibility through UDFs.
Solution Evaluation : The final architecture uses ClickHouse’s Kafka Engine to ingest data directly from the recommendation system, with Kafka topics matching ClickHouse table schemas, and a fallback Hive import for offline validation.
Problem 1 – Write Throughput : Index construction slowed writes; the solution was to build indexes asynchronously, improving throughput by ~20%.
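The idea of asynchronous index building can be sketched in a few lines: the write path stores the data and returns, while a background worker builds the index, and queries use the index only once it is ready. This is a toy model under stated assumptions, not ByteHouse code; `Part`, `Table`, `build_index`, and the `uid` column are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Part:
    """One immutable batch of rows, standing in for a written data part."""

    def __init__(self, rows):
        self.rows = rows
        self.index = None                      # filled in later, off the write path
        self.index_ready = threading.Event()


def build_index(part):
    # Hypothetical secondary index: the set of uid values present in the part.
    part.index = {row["uid"] for row in part.rows}
    part.index_ready.set()


class Table:
    def __init__(self):
        self.parts = []
        self._pool = ThreadPoolExecutor(max_workers=2)

    def insert(self, rows):
        part = Part(rows)
        self.parts.append(part)                # the write is visible immediately
        self._pool.submit(build_index, part)   # the index is built in the background
        return part

    def find_uid(self, uid):
        hits = []
        for part in self.parts:
            # Skip a part only when its finished index proves there is no match.
            if part.index_ready.is_set() and uid not in part.index:
                continue
            hits.extend(r for r in part.rows if r["uid"] == uid)
        return hits
```

Queries stay correct while an index is still being built because an unindexed part is simply scanned in full; the index only ever removes work.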
Problem 2 – Kafka Consumption : The default single‑consumer model limited throughput; a multi‑threaded consumer implementation was added, achieving near‑linear write performance gains.
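A rough sketch of the multi-threaded consumption model follows: one worker per partition, each batching its rows and writing them independently. Plain Python lists stand in for Kafka partitions and for the target table, so this illustrates the threading pattern only and makes no claim about the actual Kafka Engine implementation; `consume_partition` and `consume_all` are hypothetical names.

```python
import threading


def consume_partition(partition_rows, batch_size, sink, lock):
    # Each worker owns one partition, so order within a partition is preserved.
    for start in range(0, len(partition_rows), batch_size):
        batch = partition_rows[start:start + batch_size]
        with lock:
            sink.append(batch)   # stands in for writing one part to the table


def consume_all(partitions, batch_size=2):
    sink = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consume_partition,
                         args=(rows, batch_size, sink, lock))
        for rows in partitions
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sink


batches = consume_all([[1, 2, 3], [4, 5], [6]])
print(sorted(x for batch in batches for x in batch))
```

The order of batches in `sink` depends on thread scheduling, which is why the usage line sorts the rows before printing.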
Problem 3 – Data Integrity in Failover : In master-slave mode, both nodes consuming and writing at the same time could produce inconsistent data. Leader election through ZooKeeper, built on ReplicatedMergeTree, ensures that only one node consumes at a time, which preserves data integrity.
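The single-consumer guarantee can be illustrated with a toy leader election. Here an in-memory `Coordinator` class stands in for ZooKeeper, and the replica names are hypothetical; a real deployment would rely on the coordination service's own primitives and would also have to handle session expiry, which this sketch ignores.

```python
import threading


class Coordinator:
    """In-memory stand-in for a coordination service such as ZooKeeper."""

    def __init__(self):
        self._lock = threading.Lock()
        self._leader = None

    def try_acquire(self, replica):
        # The first caller becomes leader; the leader may re-acquire freely.
        with self._lock:
            if self._leader is None:
                self._leader = replica
            return self._leader == replica

    def release(self, replica):
        # Models the leader going away, which lets another replica take over.
        with self._lock:
            if self._leader == replica:
                self._leader = None


def consume_if_leader(coord, replica, messages, table):
    """Write messages only when this replica holds leadership."""
    if not coord.try_acquire(replica):
        return 0
    table.extend(messages)
    return len(messages)
```

With two replicas pointed at the same coordinator, only the one that acquired leadership writes; the other starts consuming only after the leader releases.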
Case Study 2 – Real‑time Advertising Data : Previously built on Druid, the workload faced challenges that ClickHouse resolved, though new issues arose.
Problem 1 – Buffer Engine & ReplicatedMergeTree : The Buffer engine was incompatible with ReplicatedMergeTree, so queries could return inconsistent results. The solution combined Kafka, Buffer, and MergeTree tables, integrated the Buffer into the Kafka Engine, and added logic to route reads to the replica that holds the data.
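One way to picture the read-routing logic is the toy model below: recent rows sit in an in-memory buffer on the consuming replica only, so a query has to go to that replica to see them, while flushed rows are visible everywhere. This is an assumption-laden sketch rather than the ByteHouse design; `Replica`, `flush`, and `pick_replica` are hypothetical names.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.buffer = []   # recent rows, held in memory on the consuming replica only
        self.merged = []   # rows already flushed to the replicated table

    def query(self):
        # A query on a replica sees its flushed rows plus its own buffer.
        return self.merged + self.buffer


def flush(source, replicas):
    """Move buffered rows into the replicated table on every replica."""
    rows, source.buffer = source.buffer, []
    for replica in replicas:
        replica.merged.extend(rows)


def pick_replica(replicas):
    """Prefer the replica that currently holds buffered data."""
    for replica in replicas:
        if replica.buffer:
            return replica
    return replicas[0]   # no buffered data anywhere, so any replica will do
```

Without the routing step, a query sent to the non-consuming replica would miss the buffered rows, which is the kind of inconsistency described above.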
Problem 2 – Crash‑Induced Data Loss or Duplicate Consumption : Lack of transaction support risked partial writes; the solution bound Kafka offsets with ClickHouse parts in a single transaction, ensuring atomicity and stability.
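The effect of binding offsets to parts can be shown with a small simulation: if the part and the new offset become visible in one step, a crash leaves both unchanged, and a restart resumes from the last committed offset with no loss and no duplicates. The `Store` class and the `crash_before_commit` flag are hypothetical and exist only to model the behaviour; a Python list stands in for the Kafka log.

```python
class Store:
    """Holds written parts and the consumer offset, committed together."""

    def __init__(self):
        self.parts = []
        self.offset = 0

    def commit(self, rows, next_offset, crash_before_commit=False):
        staged_parts = self.parts + [rows]   # prepared, not yet visible
        if crash_before_commit:
            raise RuntimeError("simulated crash")
        # Single step: the part and the offset become visible together.
        self.parts, self.offset = staged_parts, next_offset


def consume(store, log, batch_size, crash_on_batch=None):
    """Consume from the committed offset; optionally crash on the n-th batch."""
    batch_no = 0
    while store.offset < len(log):
        start = store.offset
        rows = log[start:start + batch_size]
        store.commit(rows, start + len(rows),
                     crash_before_commit=(batch_no == crash_on_batch))
        batch_no += 1


log = list(range(5))
store = Store()
try:
    consume(store, log, batch_size=2, crash_on_batch=1)
except RuntimeError:
    pass                            # the process "restarts" here
consume(store, log, batch_size=2)   # resumes from the committed offset
print(store.parts)
```

If the offset were committed separately from the part, a crash between the two steps would either replay rows that were already written or skip rows that never were.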
Effect : These optimizations enhanced query consistency, eliminated data loss, and provided robust real‑time analytics at massive scale.
In summary, ByteHouse leverages ByteDance’s extensive ClickHouse experience to deliver a high‑performance, enterprise‑grade analytics platform, continuously incorporating best practices and community contributions to advance large‑scale data processing.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies
