BaikalDB Implementation Practice at Tongcheng Yilong: High Availability, HTAP, Performance and Cost Optimization
Tongcheng Yilong's BaikalDB deployment combines Multi-Raft high availability, HTAP support, and shared-nothing scalability. It delivers over 72K TPS of OLTP read throughput and tenfold faster OLAP queries, and its dual-center deployment with columnar storage and cloud-native elasticity is projected to cut costs by up to 100x in theory.
This article describes the complete implementation of BaikalDB at Tongcheng Yilong and summarizes six core features, including high availability, HTAP capabilities, high performance, scalability, and low-cost operation.
The authors began researching BaikalDB in 2019 to solve practical problems: slow OLAP queries on row-oriented databases, incomplete cross-center high-availability solutions, and the need for columnar storage. After six months of research and practice, they had deployed column-store OLAP workloads, row-store OLTP workloads, and a dual-center high-availability configuration.
Architecture Overview: BaikalDB consists of three components: BaikalStore (data storage, with data organized into Regions, each replicated three ways by its own Raft group), BaikalMeta (metadata management, including partitioning, capacity, permissions, and load balancing), and BaikalDB (a stateless SQL parsing and query execution layer).
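The Region model can be pictured with a small routing sketch. It assumes, as in other Raft-based range-partitioned stores, that each Region owns a contiguous, sorted key range and that BaikalMeta hands clients a route table; the type names, fields, and peer addresses below are hypothetical and are not BaikalDB's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Region is a hypothetical, simplified view of a BaikalStore Region:
// a contiguous key range [StartKey, EndKey) served by one Raft group.
type Region struct {
	ID       int64
	StartKey string   // inclusive
	EndKey   string   // exclusive; "" means +infinity
	Peers    []string // addresses of the three replicas (hypothetical)
}

// RouteTable stands in for routing metadata a client might obtain from
// BaikalMeta. Regions are assumed to cover the key space without gaps.
type RouteTable struct {
	regions []Region
}

// NewRouteTable copies the Regions and sorts them by StartKey.
func NewRouteTable(regions []Region) *RouteTable {
	sorted := append([]Region(nil), regions...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].StartKey < sorted[j].StartKey })
	return &RouteTable{regions: sorted}
}

// Locate returns the Region whose key range contains key.
func (t *RouteTable) Locate(key string) (Region, bool) {
	// Find the first Region starting after key, then step back one.
	i := sort.Search(len(t.regions), func(i int) bool { return t.regions[i].StartKey > key })
	if i == 0 {
		return Region{}, false
	}
	r := t.regions[i-1]
	if r.EndKey != "" && key >= r.EndKey {
		return Region{}, false
	}
	return r, true
}

func main() {
	table := NewRouteTable([]Region{
		{ID: 2, StartKey: "m", EndKey: "", Peers: []string{"s4", "s5", "s6"}},
		{ID: 1, StartKey: "", EndKey: "m", Peers: []string{"s1", "s2", "s3"}},
	})
	for _, k := range []string{"apple", "zebra"} {
		r, _ := table.Locate(k)
		fmt.Printf("%s -> region %d %v\n", k, r.ID, r.Peers)
	}
}
```

Because the stateless SQL layer only needs this key-to-Region mapping to find the right Raft group, it can be scaled out independently of storage.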
Core Features: strong consistency with Read Committed distributed transactions; high availability via the Multi-Raft protocol (RPO=0, RTO<30s); high scalability through a shared-nothing architecture; high performance (over 10K QPS for point queries, with P95 latency under 100 ms); and compatibility with the MySQL 5.6 protocol.
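The availability of a three-replica Raft group follows from Raft's majority rule: a write commits once a strict majority of replicas acknowledge it, so one replica can fail without losing committed data or blocking writes. The sketch below shows only that generic arithmetic; the function names are illustrative and not taken from BaikalDB.

```go
package main

import "fmt"

// quorum returns the number of replicas that must acknowledge a write
// before a Raft leader can commit it (a strict majority).
func quorum(replicas int) int {
	return replicas/2 + 1
}

// tolerableFailures returns how many replicas can be lost while a
// majority is still reachable, so the group can still elect a leader
// and commit writes.
func tolerableFailures(replicas int) int {
	return replicas - quorum(replicas)
}

// isCommitted reports whether acks acknowledgements form a majority.
func isCommitted(acks, replicas int) bool {
	return acks >= quorum(replicas)
}

func main() {
	for _, n := range []int{3, 5} {
		fmt.Printf("replicas=%d quorum=%d tolerates=%d failure(s)\n",
			n, quorum(n), tolerableFailures(n))
	}
	// With three replicas and one node down, the two survivors still commit.
	fmt.Println("2 of 3 acks committed:", isCommitted(2, 3))
}
```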
Performance Benchmarks: In OLTP scenarios on row storage, read throughput exceeded 72K TPS (85% higher than MySQL) and write throughput exceeded 9.6K TPS (85% to 120% of MySQL's). In OLAP scenarios on column storage, queries ran 10x faster than on row storage.
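The article does not report the MySQL figures directly, but the percentages imply rough baselines. The derivation below assumes that "85% higher" means 1.85x MySQL's read throughput and that the write ratio r is BaikalDB divided by MySQL; since 72K and 9.6K are lower bounds, the results are approximations, not measurements.

```latex
\begin{aligned}
T_{\text{MySQL,read}} &\approx \frac{72\,000}{1 + 0.85} \approx 38.9\text{K TPS} \\
T_{\text{MySQL,write}} &\approx \frac{9\,600}{r},\quad r \in [0.85,\ 1.20]
  \;\Rightarrow\; T_{\text{MySQL,write}} \in [8.0\text{K},\ 11.3\text{K}]\ \text{TPS}
\end{aligned}
```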
Cost Optimization: The dual-center deployment with HTAP capabilities achieves a theoretical 100x cost reduction from three compounding factors: HTAP integration (5x), improved resource efficiency (4x), and cloud-native elasticity (5x).
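The 100x figure is the product of the three factors, which holds only under the assumption that each saving applies independently to the cost left over by the others; in practice the factors may overlap, so this is an upper bound.

```latex
\text{total reduction}
  = \underbrace{5}_{\text{HTAP}}
  \times \underbrace{4}_{\text{resource efficiency}}
  \times \underbrace{5}_{\text{elasticity}}
  = 100
```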
Key configuration parameters include TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=209715200 (200 MiB), bthread_concurrency=200, max_background_jobs=24, and cache_size=64M.
Baidu Geek Talk