How ByteHouse Transformed ClickHouse into a Cloud‑Native Data Warehouse
This article explores ByteHouse’s evolution from ClickHouse within ByteDance, detailing the challenges of scaling to over 18,000 nodes, the architectural redesign for cloud‑native elasticity, high‑availability innovations, and the product’s roadmap toward a Snowflake‑like, multi‑tenant data warehouse solution.
ClickHouse Origins and ByteDance Adoption
ClickHouse, open‑sourced in 2016, quickly gained traction in the analytical database space because of its query performance. ByteDance became one of its heaviest users: by early 2022 it operated the largest ClickHouse cluster in China, with more than 18,000 nodes and over 700 PB of data, primarily supporting user growth analysis.
Despite its speed, ClickHouse became harder to scale and operate as the cluster grew. The lack of elastic scaling was the most pressing limitation.
From ClickHouse to ByteHouse
To meet internal business needs, ByteDance invested heavily in extending ClickHouse and eventually released the result as a commercial product, ByteHouse, on the Volcano Engine platform.
Early adoption began in 2017 for user growth analytics, a critical real‑time analysis workload. The team evaluated several OLAP engines (Kylin, Druid, Spark) before selecting ClickHouse for its strong performance.
Initial development focused on providing basic capabilities, data lifecycle management, and SQL‑based metric computation enhancements.
Scaling Challenges and Architectural Re‑design
As usage expanded to BI, A/B testing, and model forecasting, the cluster grew by an order of magnitude. That growth exposed three problems: storage limits, query performance degrading under write load, and high‑availability bottlenecks in ClickHouse's native ReplicatedMergeTree replication, which depends on ZooKeeper.
The team introduced hot‑cold tiered storage, externalized data ingestion services, and a custom high‑availability layer (HaMergeTree) that reduced ZooKeeper load and improved fault‑tolerance.
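To make the hot‑cold tiering idea concrete, here is a minimal sketch of an age‑based placement policy. The source does not describe ByteHouse's actual policy, so the age threshold and the names `DataPart`, `choose_tier`, and `plan_moves` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataPart:
    """Hypothetical description of one immutable data part."""
    name: str
    max_event_time: datetime  # timestamp of the newest row in the part
    size_bytes: int


def choose_tier(part: DataPart, now: datetime, hot_window: timedelta) -> str:
    """Return "hot" for recent parts and "cold" for parts older than hot_window.

    Assumption: age is the only placement signal. A real system might also
    weigh access frequency or remaining capacity on the hot tier.
    """
    age = now - part.max_event_time
    return "hot" if age <= hot_window else "cold"


def plan_moves(parts, current_tiers, now, hot_window):
    """List (part_name, target_tier) for every part whose tier should change."""
    moves = []
    for part in parts:
        target = choose_tier(part, now, hot_window)
        if current_tiers.get(part.name) != target:
            moves.append((part.name, target))
    return moves
```

A background job could call `plan_moves` periodically and copy each listed part to its target tier before dropping the old copy, so that queries never see a part that is missing from both tiers.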
The team also built a distributed KV‑based metadata service, integrated shared storage with custom data formats, added ACID transaction support, and implemented distributed joins and a self‑developed query optimizer.
Dynamic resource management, containerized deployment on Kubernetes, and multi‑tenant isolation were added to achieve cloud‑native elasticity.
Performance Optimizations in the Cloud‑Native Era
To mitigate latency from longer data paths, the team introduced multi‑level caching, compact data formats, and an abstraction layer for various storage backends.
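The multi‑level caching mentioned above can be sketched as a read path that checks memory first, then a local‑disk cache, and only then remote storage. This is an illustration under stated assumptions, not ByteHouse's implementation: the two‑level layout, the LRU eviction policy, and the names `LRUCache` and `TieredReader` are hypothetical, and the disk level is simulated in memory to keep the example self‑contained.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


class TieredReader:
    """Read through memory -> local disk -> remote storage.

    Assumption: cached values are never None, so None can signal a miss.
    """

    def __init__(self, remote_fetch, memory_capacity, disk_capacity):
        self.memory = LRUCache(memory_capacity)
        self.disk = LRUCache(disk_capacity)  # stands in for a local-disk cache
        self.remote_fetch = remote_fetch     # callable: key -> bytes
        self.remote_reads = 0

    def read(self, key):
        value = self.memory.get(key)
        if value is not None:
            return value
        value = self.disk.get(key)
        if value is None:
            value = self.remote_fetch(key)   # slowest path: shared storage
            self.remote_reads += 1
            self.disk.put(key, value)
        self.memory.put(key, value)          # promote to the fastest level
        return value
```

Passing the remote store in as a plain callable is also a small example of the storage abstraction idea: the reader does not need to know which backend serves the bytes.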
The team also implemented distributed transactions, fine‑grained locking, and MVCC (multi‑version concurrency control) to preserve consistency while sustaining high concurrency.
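As a rough illustration of why MVCC helps here, the sketch below keeps every committed version of a key and lets each reader pin a snapshot timestamp, so readers are not blocked by later writes. It is a single‑process toy with a hypothetical `MVCCStore` class; it omits the locking, conflict detection, and garbage collection of old versions that a distributed implementation would need.

```python
import bisect
import itertools


class MVCCStore:
    """Toy multi-version store: writers append versions, readers pick a snapshot."""

    def __init__(self):
        # key -> (sorted commit timestamps, values in the same order)
        self._versions = {}
        self._clock = itertools.count(1)  # monotonically increasing commit timestamps
        self._last_commit = 0

    def commit(self, writes):
        """Atomically publish a batch of key/value writes under one timestamp."""
        ts = next(self._clock)
        for key, value in writes.items():
            stamps, values = self._versions.setdefault(key, ([], []))
            stamps.append(ts)
            values.append(value)
        self._last_commit = ts
        return ts

    def snapshot(self):
        """Return a timestamp that fixes what a reader is allowed to see."""
        return self._last_commit

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, else None."""
        stamps, values = self._versions.get(key, ([], []))
        idx = bisect.bisect_right(stamps, snapshot_ts)
        return values[idx - 1] if idx else None
```

A query that takes a snapshot before a concurrent commit keeps reading the older version for its whole lifetime, while queries started afterwards see the new one.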
Productization and Market Positioning
ByteHouse now matches ClickHouse’s performance while offering higher resource utilization, lower operational cost, and cloud‑native features. It is positioned as a Snowflake‑like, cloud‑agnostic data warehouse, supporting both internal and external customers.
ByteHouse has been publicly available since August 2021. Internally, migration of the major analytics workloads is underway, and the roadmap includes expanding the SaaS offering across multiple cloud providers.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
