How BaikalDB’s Columnar Storage Boosted Real‑Time Analytics at DTCC2020
This article details how the DTCC2020 guest speaker from Tongcheng‑Elong introduced BaikalDB’s distributed columnar storage, covering internal and external motivations, technology comparison, architecture, implementation tricks, performance gains in production, and future hybrid row‑column research directions.
Background
Rapid growth of business workloads required a native distributed database that could scale horizontally, provide real‑time OLAP on wide tables, simplify operations (online scaling, dynamic column and index addition), support active‑active dual‑center deployment, and expose cloud‑native characteristics while offering both OLTP and OLAP capabilities to avoid data duplication.
Industry trends toward NewSQL, distributed transaction protocols, and the convergence of AI, big data, and cloud‑native architectures further motivated the exploration of a columnar storage engine.
Technology Evaluation and Selection
Core Comparison
Several NewSQL and column‑oriented databases were benchmarked for latency, scalability, and feature completeness. The comparison highlighted trade‑offs between row‑store HTAP systems, pure columnar stores, and hybrid designs.
Chosen Platform
BaikalDB, an open‑source NewSQL database, satisfied most functional requirements. The missing piece—native columnar storage—was developed in‑house and contributed back to the community.
Architecture Overview
Basic Architecture
Core Features
Storage Model
Columnar Engine Implementation
Advantages of Columnar Layout
Reduced I/O – only the columns required by a query are read or written.
Higher compression – homogeneous column values enable dictionary‑based compression with ratios far exceeding row‑store compression.
Cache‑friendly – lower I/O improves CPU cache utilization.
Vectorized execution – processing data in blocks (vectors) improves cache locality and instruction‑level parallelism.
Late materialization – columns are materialized only after filters have been applied, avoiding unnecessary scans.
Design Changes
The KV layer’s encoding was rewritten so that values are stored column‑wise instead of row‑wise. This required:
Modifying the on‑disk layout to interleave column values while preserving the original transaction semantics.
Extending the transaction, scan, filter, and compression modules to operate on column vectors.
Related Optimizations
I/O reduction / late materialization : column‑wise reads avoid full‑row fetches; updates affect only the modified columns.
ZSTD compression : a dictionary‑based ZSTD codec was integrated, providing up to 3‑5× better compression on columnar data.
Parallelism : BaikalDB region size and partition count were tuned to increase the number of concurrent columnar scan workers.
Vectorized query engine : the traditional volcano iterator was replaced by a vector engine that processes batches of rows in a single call, reducing function‑call overhead.
RocksDB read tuning : parameters such as write_buffer_number, level0_file_num_compaction_trigger, and partitioned_index_filters were adjusted to lower read amplification for columnar workloads.
Production Deployment and Results
Migration Procedure
Real‑time data synchronization was built on an internal CDC pipeline, achieving sub‑10 ms end‑to‑end latency. Migration to the columnar engine required only a change of the DB connection IP, minimizing impact on client applications.
Stability Testing
A seven‑day production‑like workload was run, exercising both write and read paths under peak traffic. No data loss or latency spikes were observed.
Performance Gains
In a payment‑alert system with a 100 billion‑row fact table, twelve core analytical queries saw 10‑50× speedup after enabling columnar storage, while maintaining 100 % availability over nine months of continuous operation.
Deployment Topology
Three‑replica Raft clusters are deployed across two data centers. Two replicas reside in the primary site, one in the secondary site. Writes are directed to the primary site; reads are served locally from the nearest replica, eliminating cross‑center latency.
Future Directions
The team is investigating row‑column hybrid storage at the replica level, aiming to reduce the required replica count by up to 50 % while preserving full read‑write capability. This research aligns with industry efforts such as TiDB/TiFlash, HTAP trends, AI‑database integration, cloud‑native deployment, and hardware acceleration.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
