
How ByteHouse Transformed ClickHouse into a Cloud‑Native Data Warehouse

This article explores ByteHouse’s evolution from ClickHouse within ByteDance, detailing the challenges of scaling to over 18,000 nodes, the architectural redesign for cloud‑native elasticity, high‑availability innovations, and the product’s roadmap toward a Snowflake‑like, multi‑tenant data warehouse solution.

Volcano Engine Developer Services

ClickHouse Origins and ByteDance Adoption

ClickHouse, open‑sourced in 2016, quickly gained traction in the analytical database space due to its performance. ByteDance became a deep user, operating the largest ClickHouse cluster in China with more than 18,000 nodes and over 700 PB of data by early 2022, primarily supporting user growth analysis.

ClickHouse was fast, but its scalability and usability became problems as the cluster grew. The lack of elastic scaling was the most serious gap.

From ClickHouse to ByteHouse

To meet internal business needs, ByteDance invested heavily in extending ClickHouse and eventually released the result as a commercial product, ByteHouse, on the Volcano Engine platform.

Early adoption began in 2017 for user growth analytics, a critical real‑time analysis workload. The team evaluated several OLAP engines (Kylin, Druid, Spark) before selecting ClickHouse for its strong performance.

Initial development focused on providing basic capabilities, data lifecycle management, and SQL‑based metric computation enhancements.

Scaling Challenges and Architectural Re‑design

As usage expanded to BI, A/B testing, and model forecasting, the cluster grew by an order of magnitude. That growth exposed three problems: storage limits, query performance that degraded under write load, and high‑availability bottlenecks in the native ReplicatedMergeTree engine, which coordinates replication through ZooKeeper.

The team introduced hot‑cold tiered storage, externalized data ingestion services, and a custom high‑availability layer (HaMergeTree) that reduced ZooKeeper load and improved fault‑tolerance.
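The article does not describe how ByteHouse decides which data is hot or cold. As a minimal sketch of the general idea, assuming an age-based policy, the code below assigns each data part to a tier and lists the parts that need to move. The names `DataPart`, `choose_tier`, `plan_moves`, and the 30-day threshold are hypothetical, not ByteHouse APIs.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical threshold: parts whose newest row is older than this are cold.
HOT_RETENTION_DAYS = 30


@dataclass
class DataPart:
    name: str
    max_event_date: date  # date of the newest row stored in the part


def choose_tier(part: DataPart, today: date,
                hot_days: int = HOT_RETENTION_DAYS) -> str:
    """Return "hot" (e.g. local SSD) or "cold" (e.g. cheaper remote storage)."""
    age = today - part.max_event_date
    return "hot" if age <= timedelta(days=hot_days) else "cold"


def plan_moves(parts: List[DataPart], current_tiers: Dict[str, str],
               today: date) -> List[Tuple[str, str]]:
    """List (part_name, target_tier) for every part sitting on the wrong tier."""
    moves = []
    for part in parts:
        target = choose_tier(part, today)
        if current_tiers.get(part.name) != target:
            moves.append((part.name, target))
    return moves
```

A background task could run `plan_moves` periodically and relocate the listed parts, so that recent data stays on fast storage while older data moves to cheaper capacity.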

They also built a distributed KV‑based metadata service, integrated shared storage with custom data formats, added ACID transaction support, and implemented distributed joins and a self‑developed optimizer.
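To illustrate why a KV-based metadata service suits stateless compute nodes, here is a rough sketch in which part metadata lives under prefixed keys and a batch of new parts becomes visible all at once. The class `KVMetadataStore`, its methods, and the key layout are assumptions made for illustration; the in-memory dictionary stands in for a real distributed KV store, and none of this reflects ByteHouse's actual schema.

```python
from typing import Dict, List


class KVMetadataStore:
    """In-memory stand-in for a distributed KV store (hypothetical interface)."""

    def __init__(self) -> None:
        self._kv: Dict[str, dict] = {}

    def commit_parts(self, table: str, parts: Dict[str, dict]) -> None:
        """Publish a batch of part records: either all are added or none."""
        staged = {f"{table}/parts/{name}": meta for name, meta in parts.items()}
        duplicates = sorted(key for key in staged if key in self._kv)
        if duplicates:
            # Validation happens before any write, so a failure changes nothing.
            raise ValueError(f"parts already committed: {duplicates}")
        # A real deployment would rely on the KV store's transaction here.
        self._kv.update(staged)

    def list_parts(self, table: str) -> List[str]:
        """Prefix scan returning the names of all committed parts of a table."""
        prefix = f"{table}/parts/"
        return sorted(k[len(prefix):] for k in self._kv if k.startswith(prefix))

    def get_part(self, table: str, name: str) -> dict:
        """Fetch the metadata record of one part."""
        return self._kv[f"{table}/parts/{name}"]
```

Because the part list is read from the store rather than from local disk, any compute node can serve any table, which is what makes adding or removing nodes cheap in this kind of design.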

Dynamic resource management, containerized deployment on Kubernetes, and multi‑tenant isolation were added to achieve cloud‑native elasticity.

Performance Optimizations in the Cloud‑Native Era

Moving data onto shared storage lengthens the data path and adds latency. To offset this, the team introduced multi‑level caching, compact data formats, and an abstraction layer over the various storage backends.
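A multi-level cache of this kind can be sketched as a read path that tries memory first, then a local-disk cache, and only then remote storage. The example below is a simplified illustration under stated assumptions: `LRUCache` and `TieredReader` are hypothetical names, both cache levels are modeled in memory, and the eviction policy (LRU) is a choice made for the sketch rather than something the article specifies.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Small least-recently-used cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class TieredReader:
    """Read path: memory cache, then disk cache, then remote storage."""

    def __init__(self, fetch_remote: Callable[[str], bytes],
                 memory_capacity: int, disk_capacity: int) -> None:
        self.memory = LRUCache(memory_capacity)
        self.disk = LRUCache(disk_capacity)  # stands in for a local-disk cache
        self.fetch_remote = fetch_remote
        self.remote_reads = 0  # counts how often the slow path was taken

    def read(self, key: str) -> bytes:
        data = self.memory.get(key)
        if data is not None:
            return data
        data = self.disk.get(key)
        if data is None:
            data = self.fetch_remote(key)
            self.remote_reads += 1
            self.disk.put(key, data)
        self.memory.put(key, data)  # promote to the fastest level
        return data
```

In this sketch a block evicted from memory can still be served from the disk level, so only a miss at both levels pays the cost of a remote read.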

They also implemented distributed transactions, fine‑grained locking, and MVCC to ensure consistency while maintaining high concurrency.
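The core idea of MVCC is that writers append new versions instead of overwriting, so a reader holding an older snapshot keeps seeing a consistent view. The following single-process sketch shows that idea only; `MVCCStore` and its methods are hypothetical, and a distributed implementation would also need timestamp allocation, conflict detection, and garbage collection of old versions, none of which is covered here.

```python
import threading
from typing import Dict, List, Optional, Tuple


class MVCCStore:
    """Append-only versioned key-value store with snapshot reads."""

    def __init__(self) -> None:
        # key -> list of (commit_ts, value), in increasing commit_ts order
        self._versions: Dict[str, List[Tuple[int, str]]] = {}
        self._last_commit_ts = 0
        self._lock = threading.Lock()  # serializes the short commit step only

    def snapshot(self) -> int:
        """Return a read timestamp; reads at it ignore later commits."""
        with self._lock:
            return self._last_commit_ts

    def commit(self, writes: Dict[str, str]) -> int:
        """Apply a batch of writes under one new commit timestamp."""
        with self._lock:
            ts = self._last_commit_ts + 1
            for key, value in writes.items():
                self._versions.setdefault(key, []).append((ts, value))
            # The timestamp is published last, so no snapshot can observe
            # a commit whose writes are only partly appended.
            self._last_commit_ts = ts
            return ts

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        """Return the newest value committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None
```

A query that takes a snapshot before a concurrent commit continues to read the older values, while a query started afterwards sees the new ones.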

Productization and Market Positioning

ByteHouse now matches ClickHouse’s performance while offering higher resource utilization, lower operational cost, and cloud‑native features. It is positioned as a Snowflake‑like, cloud‑agnostic data warehouse, supporting both internal and external customers.

ByteHouse has been publicly available since August 2021. Internal migration of major analytics workloads is underway, and the team plans to expand its SaaS offering across multiple cloud providers.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
