How MetaApp Cut Data Warehouse Costs by 50% with ByConity
MetaApp replaced ClickHouse with ByConity, an open‑source cloud‑native data warehouse, cutting warehouse costs by more than 50% while making OLAP queries faster and more stable. The gains come from separating storage and compute, which simplifies scaling and improves resource utilization across analytics workloads such as deduplication, retention, conversion, and point‑lookup queries.
MetaApp OLAP Data Analysis Platform Architecture and Features
MetaApp is a leading Chinese mobile game developer and operator with over 200 million registered users. To improve real‑time analytics, they replaced ClickHouse with the open‑source cloud‑native data warehouse ByConity, which separates storage and compute.
MetaApp’s analytics platform consists of an offline pipeline (DataX → Kafka → Hive → BI reports via Superset) and a real‑time pipeline (previously GoSink → ClickHouse, now CnchKafka → ByConity). The platform supports event analysis, conversion analysis, custom retention, user segmentation, and flow analysis.
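The analysis types the platform supports can be illustrated with a toy in‑memory sketch. The production workloads run as SQL inside ByConity; the event names and fields below are invented for illustration only:

```python
from datetime import date, timedelta

# Toy event log: (user_id, event_name, day). Field and event names are illustrative.
events = [
    (1, "install", date(2023, 5, 1)), (1, "open",     date(2023, 5, 2)),
    (2, "install", date(2023, 5, 1)), (2, "purchase", date(2023, 5, 1)),
    (3, "install", date(2023, 5, 1)),
]

# Deduplication: distinct users per event (COUNT(DISTINCT user_id) in SQL).
def distinct_users(event: str) -> int:
    return len({u for u, e, _ in events if e == event})

# Retention: share of a day-0 cohort seen again exactly n days later.
def retention(cohort_day: date, n: int) -> float:
    cohort = {u for u, e, d in events if e == "install" and d == cohort_day}
    target = cohort_day + timedelta(days=n)
    returned = {u for u, _, d in events if d == target} & cohort
    return len(returned) / len(cohort) if cohort else 0.0

# Conversion: fraction of users who did step A that also did step B.
def conversion(step_a: str, step_b: str) -> float:
    a = {u for u, e, _ in events if e == step_a}
    b = {u for u, e, _ in events if e == step_b}
    return len(a & b) / len(a) if a else 0.0
```

At warehouse scale these set operations run over billions of rows per day, which is why read/write isolation and elastic compute matter so much for keeping them stable.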
ByConity vs ClickHouse Feature Comparison
• Core performance (shared by both, since ByConity builds on ClickHouse): write speed 50‑200 MB/s, query speed 2‑30 GB/s, compression ratio 0.2‑0.3.
• ByConity enhancements: read/write separation, elastic scaling, strong consistency.
• Both suit massive tables with many columns, queries touching only a subset of columns, and sub‑second latency requirements.
Problems with ClickHouse
1. Read/write contention during peak traffic.
2. Lengthy scaling cycles (1‑2 weeks), plus the need for data redistribution.
3. Complex operations, causing SLA violations and high idle‑resource costs.
Improvements after Switching to ByConity
• Isolated read/write resources ensure stable performance; compute can be expanded on demand, including with cloud resources.
• Scaling now takes minutes, with no data reshuffling, because storage (HDFS/S3) is separate from compute.
• Cloud‑native deployment simplifies operations.
• CPU usage for query merging reduced by ~75%.
• CPU usage for data ingestion reduced by ~35%.
• Overall resource cost lowered 40‑50%, helped by elastic daytime scaling.
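The elastic daytime scaling behind that cost saving can be sketched as a time‑of‑day schedule. MetaApp's exact policy and scaling API are not public, so the worker counts and schedule below are hypothetical stand‑ins:

```python
from datetime import time

# Hypothetical schedule: more workers during daytime peak, fewer overnight.
# The counts here are invented for illustration, not MetaApp's real settings.
SCHEDULE = [
    (time(8, 0), time(23, 0), 16),   # daytime peak
    (time(23, 0), time(8, 0), 8),    # overnight scale-down
]

def workers_for(now: time) -> int:
    """Return the desired worker count for the given time of day."""
    for start, end, n in SCHEDULE:
        if start <= end:
            if start <= now < end:
                return n
        else:  # window wraps past midnight
            if now >= start or now < end:
                return n
    return 8  # fallback

# Because ByConity keeps data on shared storage (HDFS/S3), changing the
# worker count requires no data redistribution -- scaling up or down is
# just adding or removing stateless compute workers.
```

A scheduler (cron, Kubernetes autoscaler, or similar) would call `workers_for` periodically and resize the compute group to match; with coupled storage like ClickHouse's, the same move would trigger days of rebalancing.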
Performance Evaluation
Test dataset: 400 billion rows (40 billion per day) across 2,800 columns. The ClickHouse cluster used 400 CPU cores and 2,560 GB RAM; an 8‑worker ByConity cluster used 120 cores and 880 GB RAM and achieved comparable query times. Doubling to 16 workers roughly doubled query speed.
Key query types:
• Deduplication, retention, conversion, point‑lookup: 8‑worker ByConity matches ClickHouse with half the resources; 16 workers roughly double the speed.
• `NOT IN` filters: 8‑worker ByConity is slightly slower; scaling to 16 workers reaches ~86% of ClickHouse’s speed.
• Bitmap queries: more resource‑intensive; even at 16 workers ByConity remains slower than ClickHouse.
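The bitmap workloads above (typically used for user segmentation and deduplication) can be sketched with plain Python integers as bitsets; production engines use compressed structures such as Roaring bitmaps, but the operations are the same:

```python
def to_bitmap(user_ids):
    """Pack a collection of small integer user IDs into one int used as a bitset."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm

# Two daily active-user sets (IDs are illustrative).
day1 = to_bitmap([1, 2, 3, 5, 8])
day2 = to_bitmap([2, 3, 5, 13])

both = day1 & day2             # users active on both days (intersection)
either = day1 | day2           # deduplicated union of the two days
count = bin(both).count("1")   # cardinality = popcount

# Bitmap AND/OR/popcount replace per-row DISTINCT scans, but building and
# intersecting large bitmaps is itself CPU- and memory-heavy -- consistent
# with these queries needing more resources in the evaluation above.
```

This also explains the scaling result: bitmap work is dominated by in‑memory set operations rather than I/O, so adding workers helps less than it does for scan‑bound queries.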
Migration Recommendations
• Validate SQL compatibility on a test ByConity cluster.
• Compare resource consumption on representative datasets.
• Run dual pipelines during the transition, then retire ClickHouse.
• Plan for S3/HDFS bandwidth and QPS requirements.
• Configure sufficient cache on default nodes to reduce cold‑start latency.
Future Plans
MetaApp will continue testing ByConity’s data‑lake solution and aims to move 80% of queries to the warehouse.