How ByteHouse Achieves High‑Availability Real‑Time Data Ingestion with HaKafka
ByteHouse evolved its real‑time import pipeline from the community ClickHouse distributed architecture to a self‑developed HaKafka engine and, later, a cloud‑native design. Along the way it addressed node failures, read‑write conflicts, scaling costs, and latency by introducing two‑level concurrency, memory tables, exactly‑once semantics, and robust fault tolerance.
Internal Business Real‑Time Import Demand
ByteHouse's real‑time import technology originated from internal ByteDance business needs. Kafka is the primary data source, and users prioritize import performance, service stability, scalability, and second‑level latency for data visibility (data queryable within seconds of ingestion).
High Availability in Distributed Architecture
ByteHouse initially adopted ClickHouse's community distributed architecture, which has three inherent drawbacks:
Node failures: Large clusters require manual handling of node failures every week; in extreme cases, a single‑replica cluster can lose data.
Read‑write conflicts: Under high cluster load, queries and imports compete for CPU and I/O, causing consumption to lag.
Scaling cost: Because data lives on local disks, it cannot be reshuffled when the cluster grows; new nodes start empty while old nodes stay overloaded, which limits the benefit of scaling.
These drawbacks are the flip side of the architecture's strengths: high native concurrency and fast local‑disk reads and writes.
Community Native Distributed Architecture
The community design uses Kafka's high‑level consumption model with automatic rebalancing, which cannot satisfy some advanced requirements.
Community Real‑Time Import Design
High‑Level consumption mode: Relies on Kafka rebalance to balance load across consumers.
Two‑level concurrency: Shard‑level multi‑process parallelism plus multi‑threaded consumption within each shard.
Batch write: Data is accumulated until a size or time threshold is reached and then written in a single batch, improving throughput and reducing merge pressure (see the sketch below).
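A minimal sketch of this batch‑write pattern in Python with the confluent‑kafka client; the thresholds, topic name, and the flush_to_clickhouse stub are illustrative assumptions, not ByteHouse internals:

```python
import time
from confluent_kafka import Consumer

BATCH_ROWS = 65536      # illustrative size threshold
BATCH_SECONDS = 2.0     # illustrative time threshold

def flush_to_clickhouse(rows):
    """Hypothetical sink: one bulk write per batch keeps part counts low."""
    ...

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "ch_import",      # high-level mode: Kafka rebalances partitions
    "enable.auto.commit": False,
})
consumer.subscribe(["events"])

batch, last_flush = [], time.monotonic()
while True:
    msg = consumer.poll(0.1)
    if msg is not None and msg.error() is None:
        batch.append(msg.value())
    # Batch write: accumulate by size or time, then write once.
    if batch and (len(batch) >= BATCH_ROWS
                  or time.monotonic() - last_flush >= BATCH_SECONDS):
        flush_to_clickhouse(batch)
        consumer.commit(asynchronous=False)  # at-least-once: commit after write
        batch, last_flush = [], time.monotonic()
```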
Unmet Requirements
Advanced users need key‑based data placement (e.g., by unique key), which the high‑level model cannot guarantee.
Rebalance is uncontrollable, leading to uneven data distribution across shards.
Debugging consumption anomalies is difficult in large‑scale deployments.
Self‑Developed Distributed Consumption Engine HaKafka
To meet these needs, ByteHouse built HaKafka, a high‑availability consumption engine.
High Availability (Ha): Each shard can have multiple replicas, and ZooKeeper elects one of them as the leader to perform consumption. If the leader fails, ZooKeeper promotes a standby replica within seconds.
Low‑Level consumption mode: Partitions are assigned to shards in an ordered, balanced way, and each shard still consumes with multiple threads, preserving the two‑level concurrency of the community design.
With Low‑Level mode, as long as upstream producers avoid data skew, imported data is evenly distributed across shards. Users can also enforce key‑based placement by routing records with the same key to the same Kafka partition; the sketch below shows both halves.
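A sketch of both sides of this idea, assuming the confluent‑kafka client; the topology constants and shard arithmetic are illustrative, not HaKafka's actual assignment logic:

```python
from confluent_kafka import Consumer, Producer, TopicPartition

# Producer side: Kafka's default partitioner hashes the key, so records
# sharing a key land in the same partition (and thus the same shard).
producer = Producer({"bootstrap.servers": "kafka:9092"})
producer.produce("events", key=b"user_42", value=b"...")
producer.flush()

# Consumer side (Low-Level mode): each shard is statically assigned an
# explicit slice of partitions instead of relying on a rebalance.
SHARD_ID, NUM_SHARDS, NUM_PARTITIONS = 0, 4, 16   # illustrative topology
my_partitions = [TopicPartition("events", p)
                 for p in range(NUM_PARTITIONS)
                 if p % NUM_SHARDS == SHARD_ID]

consumer = Consumer({"bootstrap.servers": "kafka:9092",
                     "group.id": "hakafka_shard"})
consumer.assign(my_partitions)   # assign(), not subscribe(): no rebalance
```

Because assign() bypasses the consumer‑group rebalance, each shard's partition set is stable and predictable, which is exactly what the high‑level mode cannot guarantee.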
Scenario 1: Leader Failure
In a dual‑replica shard, both replicas host identical HaKafka tables in Ready state. If the leader crashes, the standby replica is elected as the new leader, ensuring continuous consumption.
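A minimal sketch of this leader election using the kazoo ZooKeeper client; the election path, identifier, and run_kafka_consumer stub are assumptions for illustration:

```python
from kazoo.client import KazooClient

def run_kafka_consumer():
    """Hypothetical stand-in for the shard's consumption loop."""
    ...

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Each replica of the shard joins the same election path. Only the winner
# runs the consumer; the standby blocks inside election.run() until the
# leader's ZooKeeper session dies, then takes over within seconds.
election = zk.Election("/hakafka/shard_0/leader", identifier="replica_a")
election.run(run_kafka_consumer)
```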
Scenario 2: Node Replacement
When a node fails and must be replaced, the replacement process involves copying data. HaKafka avoids electing a node that is still being replaced as leader, so the healthy replica continues serving traffic.
Import Performance Optimization: Memory Table
For wide tables with hundreds of columns or thousands of map keys, writing every column to disk on each import creates many small files, stressing I/O and the merge process. The Memory Table buffers incoming data in memory and flushes it to disk in larger batches, reducing I/O and improving import speed by up to three times while keeping the buffered data queryable.
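A toy sketch of the Memory Table idea in Python, assuming a size‑or‑time flush policy; write_part_to_disk and the thresholds are illustrative, not the engine's real interface:

```python
import threading
import time

def write_part_to_disk(rows):
    """Hypothetical sink: writes one large part instead of many small files."""
    ...

class MemoryTable:
    """Buffer rows in memory, serve queries over the buffer, and flush to
    disk in large batches to avoid many small per-column files."""

    def __init__(self, flush_rows=1_000_000, flush_seconds=30.0):
        self._rows, self._lock = [], threading.Lock()
        self._flush_rows, self._flush_seconds = flush_rows, flush_seconds
        self._last_flush = time.monotonic()

    def insert(self, rows):
        with self._lock:
            self._rows.extend(rows)
            if (len(self._rows) >= self._flush_rows or
                    time.monotonic() - self._last_flush >= self._flush_seconds):
                self._flush_locked()

    def scan(self):
        # Queries see buffered rows too, so visibility is not delayed
        # by the larger flush batches.
        with self._lock:
            return list(self._rows)

    def _flush_locked(self):
        write_part_to_disk(self._rows)
        self._rows, self._last_flush = [], time.monotonic()
```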
Cloud‑Native New Architecture
To overcome the inherent drawbacks of the distributed design, ByteHouse adopted a cloud‑native architecture (ByConity) in early 2021 and open‑sourced it in 2023. This architecture provides automatic fault tolerance and lightweight scaling, and it separates compute from cloud storage, improving data safety and stability at the cost of some read/write latency.
Real‑Time Import Design Based on Cloud‑Native Architecture
The architecture consists of three layers:
Cloud Service: The Server and Catalog components form the entry point, handling request preprocessing and metadata lookup.
Virtual Warehouse: The execution layer, split into isolated warehouses; a “Default” warehouse serves queries and a “Write” warehouse serves imports, achieving read‑write separation.
VFS: The underlying storage layer, supporting HDFS, S3, and other backends.
The Server creates a Manager for each consumption table; the Manager schedules consumption tasks onto Virtual Warehouse nodes, and the Low‑Level consumption mode distributes Kafka partitions evenly across those tasks.
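A small sketch of how such a Manager might spread partitions over tasks; the round‑robin policy here is an assumption for illustration:

```python
def assign_partitions(num_partitions: int, num_tasks: int) -> dict[int, list[int]]:
    """Round-robin sketch of spreading Kafka partitions evenly over
    consumption tasks (the exact Manager policy is an assumption)."""
    assignment = {t: [] for t in range(num_tasks)}
    for p in range(num_partitions):
        assignment[p % num_tasks].append(p)
    return assignment

# e.g. 8 partitions over 3 tasks -> {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]}
print(assign_partitions(8, 3))
```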
New Consumption Execution Flow
Each task begins a transaction via RPC, polls data from Kafka, converts the consumed blocks into Parts, dumps them to VFS (where they are not yet visible), and finally commits the transaction, at which point the data becomes visible to queries.
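The flow, sketched as Python with hypothetical stand‑ins for the Server RPC, Kafka consumer, and VFS interfaces (none of these names come from ByConity's actual API):

```python
def build_part(msgs):
    """Hypothetical: turn consumed message blocks into a columnar Part."""
    ...

def run_consumption_task(server_rpc, consumer, vfs):
    """One consumption task's loop, mirroring the five steps above."""
    while True:
        txn = server_rpc.begin_transaction()   # 1. RPC to Server: open a transaction
        msgs = consumer.poll_batch()           # 2. consume a batch from Kafka
        part = build_part(msgs)                # 3. convert blocks into a Part
        path = vfs.write(part)                 # 4. dump to VFS: data not yet visible
        server_rpc.commit_transaction(         # 5. commit: data becomes visible
            txn, parts=[path], offsets=msgs.latest_offsets())
```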
Fault‑Tolerance Guarantees
The Manager probes each task's health periodically via RPC, and tasks also validate themselves through their transaction RPCs. If a task is found dead, the Manager quickly launches a replacement, achieving second‑level fault tolerance.
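A sketch of that supervision loop under these assumptions; heartbeat, launch_replacement, and the interval are illustrative names, not ByConity's actual RPC surface:

```python
import time

HEARTBEAT_INTERVAL = 1.0   # illustrative: second-level failure detection

def supervise(manager_rpc, tasks):
    """Probe each task over RPC and reschedule any that stop responding."""
    while True:
        for task in list(tasks):
            if not manager_rpc.heartbeat(task):      # task missed its heartbeat
                tasks.remove(task)
                tasks.append(manager_rpc.launch_replacement(task))
        time.sleep(HEARTBEAT_INTERVAL)
```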
Consumption Capacity
The task count is configurable up to the number of Kafka partitions; additional nodes can be added cheaply to absorb load, and a Resource Manager balances tasks across them.
Semantic Enhancement: Exactly‑Once
Cloud‑native transactions enable atomic commit of both the data Part and its Kafka offset, upgrading semantics from At‑Least‑Once to Exactly‑Once.
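A sketch of why a single transaction yields exactly‑once; txn and catalog are hypothetical stand‑ins for the cloud‑native transaction machinery:

```python
def commit_atomically(txn, catalog, part_path, topic, partition, offset):
    """The Part and its Kafka offset are staged in ONE transaction, so
    both become visible together, or neither does."""
    catalog.add_part(txn, part_path)                    # stage the data
    catalog.set_offset(txn, topic, partition, offset)   # stage the progress
    txn.commit()
    # On restart, a task reads the last committed offset from the catalog
    # and resumes from there; a crash between steps can never cause the
    # same batch to be counted twice or dropped.
```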
Memory Buffer
Beyond the Memory Table, a generic Memory Buffer caches imported data before it is flushed to storage. Combined with a write‑ahead log (WAL), data is considered successfully imported once it is written to the WAL; after a service restart, the WAL recovers any unflushed data, and the buffer is later persisted to disk.
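A compact sketch of the Memory Buffer plus WAL interplay; wal and storage are hypothetical interfaces used only to show where the durability point sits:

```python
class BufferedWriter:
    """A row is durable (and the import is acknowledged) once appended to
    the WAL; the in-memory buffer is flushed to columnar storage later."""

    def __init__(self, wal, storage):
        self.wal, self.storage, self.buffer = wal, storage, []

    def write(self, rows):
        self.wal.append(rows)     # durability point: ack the import after this
        self.buffer.extend(rows)  # fast in-memory copy for queries and flushing

    def flush(self):
        self.storage.write_part(self.buffer)
        self.wal.truncate()       # flushed data no longer needs replay
        self.buffer = []

    def recover(self):
        # After a restart, replay the WAL to rebuild unflushed rows.
        self.buffer = list(self.wal.replay())
```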
Business Applications and Future Thoughts
ByteHouse processes petabytes of Kafka data per day, achieving 10–20 MiB/s per consumer thread. It also supports RocketMQ, Pulsar, MySQL (MaterializedMySQL), and direct Flink writes. Future directions include more generic import pipelines for diverse sources and better balancing of data‑visibility latency against performance.