ByConity Replaces ClickHouse for OLAP, Cutting Resource Costs Over 50%
MetaApp replaced ClickHouse with ByConity, an open‑source cloud‑native data warehouse, and cut resource costs by more than 50% while keeping OLAP query performance comparable or faster across distinct, retention, conversion, and point‑lookup workloads. The gains come from compute‑storage separation, read/write isolation, and minute‑level elastic scaling.
Background and Motivation
MetaApp needed an OLAP platform that supports real‑time queries, AB testing, low latency, and self‑service analytics. Their existing ClickHouse deployment suffered from read/write resource contention, long scaling cycles, and high operational cost.
Problems with ClickHouse
Reads and writes share the same nodes and contend for resources, causing query latency spikes during write bursts.
Scaling (adding/removing nodes) takes 1‑2 weeks and requires data redistribution.
Operational complexity: node failures cause hours‑long delays, resources run short at peak times and sit idle off‑peak, and there is no cloud integration.
Why ByConity
ByConity is an open‑source cloud‑native data warehouse built on the ClickHouse kernel, adding compute‑storage separation, read/write separation, elastic scaling, and strong consistency.
Write throughput 50‑200 MB/s.
Query throughput 2‑30 GB/s.
Compression ratio 0.2‑0.3.
Architecture Overview
MetaApp’s platform consists of an offline pipeline (DataX → Hive → Superset) and a real‑time pipeline (GoSink → ClickHouse, CnchKafka → ByConity). The OLAP query layer can query both ClickHouse and ByConity clusters.
Deployment Details
ByConity runs on S3 + Kubernetes. A scheduled scaling policy expands resources at 10 am and shrinks at 8 pm on workdays, reducing active resource time to ~10 hours per day.
8‑worker cluster: 120 CPU cores, 880 GB memory.
16‑worker cluster: 240 CPU cores, 1760 GB memory.
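The scheduled scaling policy can be illustrated with a minimal Python sketch. Only the 10 am / 8 pm workday window comes from the article; the worker counts per window (16 at peak, 8 otherwise) and the function names are assumptions made for illustration, and the real policy would be expressed in whatever scheduler the Kubernetes deployment uses.

```python
from datetime import datetime, time

# From the article: expand at 10:00, shrink at 20:00, workdays only.
PEAK_START = time(10, 0)
PEAK_END = time(20, 0)

# Assumption: the article does not say which worker count applies in
# which window; these values are hypothetical.
PEAK_WORKERS = 16
OFF_PEAK_WORKERS = 8


def desired_workers(now: datetime) -> int:
    """Return the worker count this schedule would request at `now`."""
    is_workday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_peak = PEAK_START <= now.time() < PEAK_END
    return PEAK_WORKERS if (is_workday and in_peak) else OFF_PEAK_WORKERS


def peak_hours_per_workday() -> float:
    """Length of the expanded window in hours."""
    start = PEAK_START.hour + PEAK_START.minute / 60
    end = PEAK_END.hour + PEAK_END.minute / 60
    return end - start
```

The window length computed by `peak_hours_per_workday()` corresponds to the roughly 10 hours of active resource time per day mentioned above.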
Benchmark Results
Test dataset: 400 billion rows (40 billion per day), 2800 columns.
Resource usage:
ClickHouse cluster: 400 CPU cores, 2560 GB memory.
ByConity 8‑worker: 120 CPU cores, 880 GB memory.
ByConity 16‑worker: 240 CPU cores, 1760 GB memory.
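To put the three configurations on a common scale, the following Python sketch expresses each cluster's CPU and memory as a fraction of the ClickHouse baseline. The figures are those listed above; the dictionary layout and function name are illustrative only.

```python
# (CPU cores, memory in GB) as listed in the benchmark setup.
CLUSTERS = {
    "ClickHouse": (400, 2560),
    "ByConity 8-worker": (120, 880),
    "ByConity 16-worker": (240, 1760),
}


def share_of_baseline(name: str, baseline: str = "ClickHouse") -> tuple[float, float]:
    """Return (cpu_fraction, memory_fraction) of `name` relative to `baseline`."""
    cpu, mem = CLUSTERS[name]
    base_cpu, base_mem = CLUSTERS[baseline]
    return cpu / base_cpu, mem / base_mem


if __name__ == "__main__":
    for cluster in CLUSTERS:
        cpu_frac, mem_frac = share_of_baseline(cluster)
        print(f"{cluster}: {cpu_frac:.0%} CPU, {mem_frac:.0%} memory")
```

On these numbers the 8‑worker cluster has 30% of the ClickHouse cores and about 34% of its memory, which is the context for reading the latency comparisons that follow.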
Key findings (average values):
Typical OLAP queries (distinct, retention, conversion, point look‑ups) reach ClickHouse‑level latency with 120 cores and 880 GB of memory, and run about twice as fast with 240 cores and 1760 GB.
‘not in’ filters need moderate resources (240 cores, 1760 GB) to match ClickHouse.
Bitmap queries require larger resources to approach ClickHouse performance.
Detailed query categories:
Distinct Queries
Optimization toggle has little impact.
8‑worker cluster matches ClickHouse latency.
Scaling workers further reduces latency.
Retention Calculations
Optimized ByConity runs at 33 % of unoptimized time.
8‑worker optimized runs at ~30 % of ClickHouse time.
With more resources and optimization, latency can be reduced to 16 % of ClickHouse.
Conversion Calculations
Optimized ByConity uses 60 % of unoptimized time.
8‑worker optimized matches ClickHouse.
Further scaling can reach 53 % of ClickHouse latency.
‘not in’ Filters
Optimized mode performs worse; unoptimized mode is preferred.
8‑worker unoptimized is slightly slower than ClickHouse.
Scaling to 16 workers can achieve 86 % of ClickHouse speed.
Point Look‑ups
Optimized ByConity outperforms unoptimized.
8‑worker unoptimized is comparable to ClickHouse.
Scaling can reduce latency to 26 % of ClickHouse.
Bitmap Filters
Optimized mode is slightly better.
8‑worker unoptimized is much slower than ClickHouse.
Even 16‑worker configuration remains slower than ClickHouse.
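Several results above are stated as latency relative to ClickHouse. A small Python sketch converts those ratios into speedup factors, which can be easier to compare. Only figures explicitly given as a percentage of ClickHouse latency or time are included; the labels and the helper name are illustrative.

```python
# Latency as a fraction of ClickHouse latency, taken from the results above.
LATENCY_RATIOS = {
    "retention, 8-worker optimized": 0.30,
    "retention, scaled and optimized": 0.16,
    "conversion, scaled": 0.53,
    "point look-up, scaled": 0.26,
}


def speedup(latency_ratio: float) -> float:
    """Convert 'latency as a fraction of baseline' into a speedup factor."""
    if latency_ratio <= 0:
        raise ValueError("latency ratio must be positive")
    return 1.0 / latency_ratio


if __name__ == "__main__":
    for label, ratio in LATENCY_RATIOS.items():
        print(f"{label}: {speedup(ratio):.2f}x faster than ClickHouse")
```

For example, a latency of 16% of ClickHouse corresponds to a 6.25x speedup.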
Operational Benefits After Full Migration
CPU consumption for query merging reduced by ~75 %.
CPU consumption for data ingestion reduced by ~35 %.
Fixed‑resource purchase cut by ~50 %; elastic usage further lowers cost by ~25 %.
Overall CPU and memory usage are 34 % and 48 % of ClickHouse's, respectively; with remote storage they are 48 % and 68 %.
Simplified write path and faster peak‑time scaling (add pods, no long‑running queries).
Migration Recommendations
Validate SQL compatibility on a test ByConity cluster.
Run side‑by‑side workloads, compare results and resource consumption.
Estimate S3/HDFS storage, bandwidth, and QPS requirements for full migration.
Perform dual‑run migration, gradually shifting workloads to ByConity.
Decommission ClickHouse once dual‑run is stable.
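During the side‑by‑side and dual‑run steps, results from the two systems need to be compared query by query. The Python sketch below shows one way to do that on already‑fetched rows. It is a hypothetical helper, not part of ByConity or ClickHouse tooling, and it assumes both queries use the same ORDER BY so rows can be compared by position; floating‑point columns are compared with a tolerance because aggregation order may differ between engines.

```python
import math
from typing import Any, Sequence

Row = Sequence[Any]


def values_match(x: Any, y: Any, rel_tol: float) -> bool:
    """Compare two cell values, tolerating small float differences."""
    if isinstance(x, float) and isinstance(y, float):
        return math.isclose(x, y, rel_tol=rel_tol)
    return x == y


def diff_results(
    clickhouse_rows: Sequence[Row],
    byconity_rows: Sequence[Row],
    rel_tol: float = 1e-9,
) -> list[int]:
    """Return the indices of rows that differ between the two result sets."""
    mismatches = []
    for i in range(max(len(clickhouse_rows), len(byconity_rows))):
        # A row missing on either side counts as a mismatch.
        if i >= len(clickhouse_rows) or i >= len(byconity_rows):
            mismatches.append(i)
            continue
        a, b = clickhouse_rows[i], byconity_rows[i]
        same = len(a) == len(b) and all(
            values_match(x, y, rel_tol) for x, y in zip(a, b)
        )
        if not same:
            mismatches.append(i)
    return mismatches
```

An empty list from `diff_results` means the two result sets agree under the chosen tolerance; any returned index points at a row worth inspecting before shifting that workload.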
Future Plans
Test and roll out ByConity data‑lake solutions, integrate data‑metric management, and aim to serve 80 % of queries from the warehouse.
GitHub: https://github.com/ByConity/ByConity
Past Memory Big Data
A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.
