ByConity vs ClickHouse: Deep Dive into Architecture, Features, and Performance

This article compares ByConity and ClickHouse from a usage perspective, detailing their architectural differences, core components, basic operations such as table creation, data import and query, distributed transaction support, special table engines, scaling strategies, and deployment requirements.

dbaplus Community

Oct 25, 2023

Architecture Overview

Both ClickHouse and ByConity are column‑oriented OLAP databases, but they adopt different architectural models.

ClickHouse Architecture

ClickHouse follows a classic MPP (Massively Parallel Processing) design where every node runs a full ClickHouse server. The main components are:

Distributed tables & local tables – A client queries a distributed table; the server that receives the query forwards it to the local tables on the shards, merges the partial results, and returns the final result. On INSERT, the sharding key determines which shard receives each row.

Replicas – Replication is configured per local table, typically through the Replicated variants of the MergeTree engines. A coordination service (described next) tracks replication tasks so that replicas stay in sync.

ZooKeeper / ClickHouse Keeper – Historically ZooKeeper provided coordination (metadata, leader election). ClickHouse Keeper replaces ZooKeeper with a C++ Raft‑based service that can be deployed as a standalone cluster or embedded in the server.
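
The routing role of the sharding key can be sketched in a few lines of Python. This is an illustration rather than ClickHouse code: the helper names `shard_for` and `route_rows`, the choice of CRC32 as the hash, and the example weights are all assumptions. The idea it shows is that the key's hash is reduced modulo the total shard weight and mapped to the shard whose weight range contains the remainder.

```python
import zlib
from collections import defaultdict

def shard_for(key, weights):
    """Pick a shard index for a sharding-key value (hypothetical helper).

    The key is hashed, reduced modulo the total weight, and mapped to the
    shard whose cumulative weight range contains the remainder.
    """
    slot = zlib.crc32(str(key).encode("utf-8")) % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if slot < upper:
            return index
    raise AssertionError("unreachable: slot is always below the total weight")

def route_rows(rows, key_field, weights):
    """Group rows into per-shard batches according to the sharding key."""
    batches = defaultdict(list)
    for row in rows:
        batches[shard_for(row[key_field], weights)].append(row)
    return dict(batches)

rows = [{"user_id": i, "clicks": i * 2} for i in range(10)]
batches = route_rows(rows, "user_id", weights=[2, 1])
```

With weights of 2 and 1, the first shard owns two thirds of the hash slots, so it should receive roughly twice as many rows on a large, evenly distributed key set.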

ByConity Architecture

ByConity separates compute from storage and introduces three logical layers:

Service‑access layer – Consists of a front‑end server and shared services (TSO, Daemon Manager, Resource Manager). The server parses SQL, builds a PlanSegment graph and dispatches it to workers. Metadata (catalog, table definitions, statistics) is stored in FoundationDB.

Compute layer – One or more Virtual Warehouses (a group of workers). Workers may keep a local disk cache to accelerate hot reads; cold reads pull data from the cloud storage layer.

Cloud‑storage layer – Persistent data and indexes reside in remote shared storage, either a distributed file system such as HDFS or an object store such as S3.
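
The hot/cold read path of a worker can be pictured as a read-through cache. The sketch below is a conceptual model only: the class name `WorkerCache`, the LRU eviction policy, and the dictionary standing in for HDFS or S3 are assumptions, since the article does not describe how ByConity manages its cache.

```python
from collections import OrderedDict

class WorkerCache:
    """LRU read-through cache in front of a remote store (illustrative only)."""

    def __init__(self, remote, capacity):
        self.remote = remote          # dict standing in for HDFS or S3
        self.capacity = capacity      # maximum number of cached parts
        self.local = OrderedDict()    # part name -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, part):
        if part in self.local:
            self.hits += 1
            self.local.move_to_end(part)      # mark as most recently used
            return self.local[part]
        self.misses += 1
        data = self.remote[part]              # cold read from remote storage
        self.local[part] = data
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)    # evict the least recently used
        return data

remote = {"p1": b"a", "p2": b"b", "p3": b"c"}
cache = WorkerCache(remote, capacity=2)
cache.read("p1")   # miss: fetched from remote storage
cache.read("p1")   # hit: served from the local cache
```

Because the cache holds no data that is not also in remote storage, losing a worker costs only warm-up time, which is what makes the compute layer cheap to resize.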

ByConity is designed for Kubernetes orchestration; shared services have modest resource footprints, while servers and workers require more CPU and memory. External dependencies include an HDFS cluster (or S3) and a FoundationDB cluster.

Database Operations

Table Creation

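ClickHouse requires an explicit table engine in every CREATE TABLE statement, typically one from the MergeTree family (e.g., MergeTree, ReplacingMergeTree, AggregatingMergeTree), and the choice determines the table's merge and deduplication behavior.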

ByConity provides a unified engine CnchMergeTree that encapsulates the full MergeTree feature set, including unique‑key support. The DDL can also specify a default Virtual Warehouse to achieve read/write separation.

Data Import

Both systems support:

Standard INSERT statements.

File‑based INSERT … INFILE.

External table engines for bulk loading.

ClickHouse offers a dedicated Kafka engine for streaming ingestion. ByConity mirrors these capabilities and adds a Spark‑compatible tool called PartWriter that writes data directly as ByConity parts, bypassing the table engine.

Real‑time Kafka Consumption

ClickHouse uses a high‑level consumer that automatically rebalances partitions across shards. This can cause duplicate consumption and makes exactly‑once guarantees difficult.

ByConity adopts a low‑level static assignment model (Kafka assign) where the server explicitly maps partitions to workers. This enables exactly‑once semantics and simplifies debugging.
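
A rough Python model of static assignment follows. The names `assign_partitions` and `TableState` are hypothetical, and the round-robin mapping is an assumption. The second half shows one common way to reach exactly-once delivery, which is to commit consumed offsets together with the data so that a replayed batch is detected and skipped; the article does not spell out whether ByConity does precisely this.

```python
def assign_partitions(num_partitions, workers):
    """Statically map partition ids to workers, round-robin (hypothetical)."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in workers}
    for partition in range(num_partitions):
        assignment[workers[partition % len(workers)]].append(partition)
    return assignment

class TableState:
    """Rows and consumed offsets, updated together in one step."""

    def __init__(self):
        self.rows = []
        self.offsets = {}   # partition -> next offset to consume

    def commit_batch(self, partition, start_offset, messages):
        expected = self.offsets.get(partition, 0)
        if start_offset != expected:
            return False    # replayed or out-of-order batch: ignore it
        self.rows.extend(messages)
        self.offsets[partition] = start_offset + len(messages)
        return True

plan = assign_partitions(5, ["w1", "w2"])
state = TableState()
state.commit_batch(0, 0, ["a", "b"])   # first delivery is applied
state.commit_batch(0, 0, ["a", "b"])   # redelivery is rejected
```

Since the mapping is a pure function of the partition count and the worker list, it is easy to see which worker owns a given partition when something goes wrong.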

Query Execution

ClickHouse performs two‑stage aggregation (partial aggregation on each shard's local table, then a final merge on the node that received the query) and provides GLOBAL JOIN and GLOBAL IN. Complex multi‑table joins are a known limitation.
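
Two-stage aggregation can be illustrated with an average, which cannot be merged directly but can be rebuilt from partial (sum, count) states. The function names and sample data below are made up for the example; this is a conceptual sketch, not ClickHouse internals.

```python
def local_stage(rows):
    """Stage 1, on each shard: partial (sum, count) per group key."""
    partial = {}
    for key, value in rows:
        total, count = partial.get(key, (0, 0))
        partial[key] = (total + value, count + 1)
    return partial

def merge_stage(partials):
    """Stage 2, on the initiating node: merge the states, then finish avg."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            old_total, old_count = merged.get(key, (0, 0))
            merged[key] = (old_total + total, old_count + count)
    return {key: total / count for key, (total, count) in merged.items()}

shards = [
    [("a", 1), ("b", 10)],   # rows held by shard 1
    [("a", 3), ("a", 5)],    # rows held by shard 2
]
result = merge_stage(local_stage(rows) for rows in shards)
print(result)   # {'a': 3.0, 'b': 10.0}
```

The limitation with joins follows from the same shape: there are only two stages, so a query that needs several rounds of data redistribution does not map onto it naturally.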

ByConity integrates a cost‑based optimizer (CBO) that leverages collected statistics (CREATE STATS, DROP STATS) to generate multi‑stage execution plans, improving performance for complex queries.
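
To make the role of statistics concrete, here is a toy decision a cost-based optimizer might take for a single hash join. Everything in it is an assumption for illustration: the function `plan_join`, the row-count-only statistics, and the `broadcast_limit` threshold do not come from ByConity's actual cost model.

```python
def plan_join(left, right, stats, broadcast_limit=1_000_000):
    """Pick the build side and distribution for a hash join from row counts.

    `stats` maps table name -> estimated row count. The smaller table
    becomes the build side; it is broadcast to every worker if it is small
    enough, otherwise both sides are shuffled on the join key.
    """
    build, probe = sorted((left, right), key=lambda table: stats[table])
    strategy = "broadcast" if stats[build] <= broadcast_limit else "shuffle"
    return {"build": build, "probe": probe, "strategy": strategy}

stats = {"orders": 500_000_000, "customers": 200_000, "lineitem": 2_000_000_000}
small_join = plan_join("orders", "customers", stats)
large_join = plan_join("orders", "lineitem", stats)
```

Without statistics the planner would have to guess which side is smaller, and a wrong guess on a large table can dominate query time.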

Distributed Transactions

Transactional guarantees are essential for incremental OLAP loads.

ClickHouse guarantees atomicity only locally, for a single INSERT block whose size is bounded by max_insert_block_size. Distributed transaction support is experimental (MVCC, RC isolation) and requires special configuration.

ByConity implements full ACID‑style distributed transactions:

Atomicity – All changes are committed or rolled back atomically, even after power loss.

Consistency – State transitions always move from one valid state to another.

Isolation – Provides READ COMMITTED; uncommitted writes are invisible to other transactions.

Durability – Committed data is stored in highly available distributed file systems or object stores.

The implementation relies on two core services:

FoundationDB – Supplies atomic compare‑and‑swap primitives and a reliable key‑value store for transaction metadata.

Timestamp Oracle (TSO) – Generates globally unique, monotonically increasing timestamps used as transaction IDs.
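
A timestamp oracle of this kind can be sketched as a physical clock combined with a logical counter. The layout below (milliseconds shifted left by 18 bits, plus a counter) and the class name `TimestampOracle` are assumptions chosen for the example, not a description of ByConity's wire format. The sketch keeps timestamps strictly increasing even if the wall clock stalls or steps backwards.

```python
import threading
import time

class TimestampOracle:
    """Issues strictly increasing timestamps (illustrative layout)."""

    LOGICAL_BITS = 18   # assumed width of the logical counter

    def __init__(self, clock=None):
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._physical = 0
        self._logical = 0

    def next_timestamp(self):
        with self._lock:
            now = self._clock()
            if now > self._physical:
                self._physical, self._logical = now, 0
            else:
                # Same millisecond, or the clock went backwards: keep the
                # last physical value and bump the logical counter.
                self._logical += 1
                if self._logical >= (1 << self.LOGICAL_BITS):
                    self._physical += 1
                    self._logical = 0
            return (self._physical << self.LOGICAL_BITS) | self._logical

ticks = iter([1000, 1000, 999, 1001])          # a clock that steps backwards
tso = TimestampOracle(clock=lambda: next(ticks))
issued = [tso.next_timestamp() for _ in range(4)]
```

A production service would also persist an upper bound of issued timestamps so that a restart cannot hand out a value twice; that part is omitted here.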

ByConity follows a classic two‑phase commit: the first phase writes undo logs and metadata; the second phase finalizes the commit, or cleans up the partially written data if the transaction fails.
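
The two phases can be modeled in Python as follows. This is a deliberately simplified sketch: `Participant` and `run_transaction` are hypothetical names, failures are simulated with a flag, and real systems must also handle a coordinator crash between the phases, which is not shown.

```python
class Participant:
    """One storage participant holding visible parts and an undo log."""

    def __init__(self, fail_on_prepare=False):
        self.visible = set()      # committed parts, readable by queries
        self.staged = {}          # txn_id -> parts written but not visible
        self.undo_log = {}        # txn_id -> parts to delete on rollback
        self.fail_on_prepare = fail_on_prepare

    def prepare(self, txn_id, parts):
        if self.fail_on_prepare:
            raise IOError("simulated write failure")
        self.undo_log[txn_id] = list(parts)   # record the undo entry first
        self.staged[txn_id] = list(parts)

    def commit(self, txn_id):
        self.visible.update(self.staged.pop(txn_id))
        self.undo_log.pop(txn_id, None)

    def rollback(self, txn_id):
        self.undo_log.pop(txn_id, None)       # the undo log drives cleanup
        self.staged.pop(txn_id, None)

def run_transaction(txn_id, writes):
    """writes: list of (participant, parts). Returns True if committed."""
    prepared = []
    try:
        for participant, parts in writes:     # phase 1: write undo log + data
            participant.prepare(txn_id, parts)
            prepared.append(participant)
    except IOError:
        for participant in prepared:          # failure: undo what was written
            participant.rollback(txn_id)
        return False
    for participant, _ in writes:             # phase 2: make parts visible
        participant.commit(txn_id)
    return True

a, b, broken = Participant(), Participant(), Participant(fail_on_prepare=True)
ok = run_transaction(1, [(a, ["p1"]), (b, ["p2"])])
failed = run_transaction(2, [(a, ["p3"]), (broken, ["p4"])])
```

Staged parts never enter `visible` before phase 2, which is how uncommitted writes stay invisible to readers.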

Special Table Engines

Unique Engine

To provide true upsert semantics, ByConity converts row‑level updates into DELETE + INSERT using a DeleteBitmap. A unique‑key index maps keys to row numbers for fast lookup, and row‑level locking prevents write conflicts.
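
The DELETE + INSERT conversion can be sketched with an append-only row list, a set standing in for the DeleteBitmap, and a dictionary standing in for the unique-key index. The class `UniqueTable` is hypothetical and ignores parts, locking, and persistence; it only shows why an update never rewrites existing rows.

```python
class UniqueTable:
    """Append-only rows plus a delete bitmap and a unique-key index."""

    def __init__(self):
        self.rows = []         # row number -> (key, value), never rewritten
        self.deleted = set()   # stands in for the DeleteBitmap
        self.key_index = {}    # unique key -> latest live row number

    def upsert(self, key, value):
        old_row = self.key_index.get(key)
        if old_row is not None:
            self.deleted.add(old_row)        # DELETE: mark the old version
        self.rows.append((key, value))       # INSERT: append the new version
        self.key_index[key] = len(self.rows) - 1

    def scan(self):
        """Return live rows, skipping those marked in the delete bitmap."""
        return [row for number, row in enumerate(self.rows)
                if number not in self.deleted]

table = UniqueTable()
table.upsert("u1", 1)
table.upsert("u2", 2)
table.upsert("u1", 3)    # replaces the first version of u1
print(table.scan())      # [('u2', 2), ('u1', 3)]
```

Readers pay a small filtering cost per scan, in exchange for writes that only append data and flip bits.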

Bucket Engine

The bucket engine partitions a large table into a configurable number of buckets, improving parallelism and enabling co‑located joins. Example DDL:

CREATE TABLE t(
    ...
) ENGINE = CnchMergeTree()
CLUSTER BY (user_id, event_date) INTO 32 BUCKETS;

Changing the bucket count requires data reshuffling and should be done sparingly. The recommended bucket count is a multiple of the number of workers (e.g., 1× or 2×), so that every worker handles the same number of buckets.
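
The sizing advice is easy to check with a small model. In the sketch below, `bucket_of` and `buckets_per_worker` are hypothetical helpers, CRC32 is an arbitrary stand-in for the real hash function, and the modulo mapping of buckets to workers is an assumption.

```python
import zlib
from collections import Counter

def bucket_of(cluster_key, num_buckets):
    """Map a cluster-key tuple to a bucket id (CRC32 is an assumption)."""
    encoded = "|".join(str(part) for part in cluster_key).encode("utf-8")
    return zlib.crc32(encoded) % num_buckets

def buckets_per_worker(num_buckets, num_workers):
    """Count buckets per worker when bucket b goes to worker b % workers."""
    return Counter(bucket % num_workers for bucket in range(num_buckets))

even = buckets_per_worker(32, 16)     # 2x the worker count
uneven = buckets_per_worker(32, 12)   # not a multiple of the worker count

# Two tables bucketed on the same key with the same bucket count place
# matching rows in the same bucket id, which is what permits a local join.
same_bucket = bucket_of((42, "2023-10-25"), 32) == bucket_of((42, "2023-10-25"), 32)
```

With 32 buckets on 16 workers every worker owns 2 buckets; on 12 workers some own 3 and others 2, so the busiest workers carry 50% more buckets than the rest.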

Multi‑Catalog Data Lake Support

ClickHouse reads external tables (Hive, Hudi, Iceberg) as local tables, which limits performance. ByConity introduces a unified Multi‑Catalog architecture that allows a single Hive metastore to connect to multiple storage back‑ends.

Example of creating an external catalog for Hive on S3:

CREATE EXTERNAL CATALOG hive_s3
PROPERTIES type='hive', hive.metastore.uri='thrift://localhost:9083';

Querying an external table:

SELECT * FROM hive_s3.tpcds.call_center;

ByConity also provides dedicated engines CnchHive and CnchHudi that integrate with the optimizer, automatically collect statistics, and achieve high performance on TPC‑DS benchmarks.

Scaling and Elasticity

ClickHouse does not have built‑in elastic scaling; operators must manually add replicas or shards and handle data rebalancing.

ByConity offers seamless elastic scaling via Virtual Warehouses. Adding or removing workers is “no‑pain” because the scheduler discovers new workers and automatically rebalances data. Resource isolation (tenant, read/write, cold/hot) is achieved by deploying separate Virtual Warehouses per business line or workload.
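
One way to see why adding a worker can be cheap is to look at how data parts might be mapped to workers. The article does not say which scheme the scheduler uses, so the rendezvous (highest-random-weight) hashing below is an assumption, and `owner` and `moved_parts` are hypothetical names. The property it demonstrates is that when a worker joins, the only parts that change owner are the ones that move to the new worker.

```python
import hashlib

def owner(part, workers):
    """Pick the worker with the highest hash score for this part."""
    def score(worker):
        digest = hashlib.sha256(f"{worker}/{part}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(workers, key=score)

def moved_parts(parts, before, after):
    """List the parts whose owner differs between two worker sets."""
    return [part for part in parts if owner(part, before) != owner(part, after)]

parts = [f"part_{i}" for i in range(200)]
before = ["w1", "w2", "w3"]
after = before + ["w4"]
moved = moved_parts(parts, before, after)
```

Since persistent data lives in shared storage, a change of owner only means the new worker has to warm its local cache for those parts; nothing is copied between workers.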

Reference

For source code, issue tracking, and release artifacts see the ByConity GitHub organization:

https://github.com/ByConity
