Databases 12 min read

How BaikalDB’s Columnar Storage Boosted Real‑Time Analytics at DTCC2020

This article details how the DTCC2020 guest speaker from Tongcheng‑Elong introduced BaikalDB’s distributed columnar storage, covering internal and external motivations, technology comparison, architecture, implementation tricks, performance gains in production, and future hybrid row‑column research directions.

ITPUB
ITPUB
ITPUB
How BaikalDB’s Columnar Storage Boosted Real‑Time Analytics at DTCC2020

Background

Rapid growth of business workloads required a native distributed database that could scale horizontally, provide real‑time OLAP on wide tables, simplify operations (online scaling, dynamic column and index addition), support active‑active dual‑center deployment, and expose cloud‑native characteristics while offering both OLTP and OLAP capabilities to avoid data duplication.

Industry trends toward NewSQL, distributed transaction protocols, and the convergence of AI, big data, and cloud‑native architectures further motivated the exploration of a columnar storage engine.

Technology Evaluation and Selection

Core Comparison

Several NewSQL and column‑oriented databases were benchmarked for latency, scalability, and feature completeness. The comparison highlighted trade‑offs between row‑store HTAP systems, pure columnar stores, and hybrid designs.

Chosen Platform

BaikalDB, an open‑source NewSQL database, satisfied most functional requirements. The missing piece—native columnar storage—was developed in‑house and contributed back to the community.

Architecture Overview

Basic Architecture

Architecture diagram
Architecture diagram

Core Features

Feature diagram
Feature diagram

Storage Model

Storage model diagram
Storage model diagram

Columnar Engine Implementation

Advantages of Columnar Layout

Reduced I/O – only the columns required by a query are read or written.

Higher compression – homogeneous column values enable dictionary‑based compression with ratios far exceeding row‑store compression.

Cache‑friendly – lower I/O improves CPU cache utilization.

Vectorized execution – processing data in blocks (vectors) improves cache locality and instruction‑level parallelism.

Late materialization – columns are materialized only after filters have been applied, avoiding unnecessary scans.

Design Changes

The KV layer’s encoding was rewritten so that values are stored column‑wise instead of row‑wise. This required:

Modifying the on‑disk layout to interleave column values while preserving the original transaction semantics.

Extending the transaction, scan, filter, and compression modules to operate on column vectors.

KV encoding diagram
KV encoding diagram

Related Optimizations

I/O reduction / late materialization : column‑wise reads avoid full‑row fetches; updates affect only the modified columns.

ZSTD compression : a dictionary‑based ZSTD codec was integrated, providing up to 3‑5× better compression on columnar data.

Parallelism : BaikalDB region size and partition count were tuned to increase the number of concurrent columnar scan workers.

Vectorized query engine : the traditional volcano iterator was replaced by a vector engine that processes batches of rows in a single call, reducing function‑call overhead.

RocksDB read tuning : parameters such as write_buffer_number, level0_file_num_compaction_trigger, and partitioned_index_filters were adjusted to lower read amplification for columnar workloads.

Production Deployment and Results

Migration Procedure

Real‑time data synchronization was built on an internal CDC pipeline, achieving sub‑10 ms end‑to‑end latency. Migration to the columnar engine required only a change of the DB connection IP, minimizing impact on client applications.

Stability Testing

A seven‑day production‑like workload was run, exercising both write and read paths under peak traffic. No data loss or latency spikes were observed.

Performance Gains

In a payment‑alert system with a 100 billion‑row fact table, twelve core analytical queries saw 10‑50× speedup after enabling columnar storage, while maintaining 100 % availability over nine months of continuous operation.

Deployment Topology

Three‑replica Raft clusters are deployed across two data centers. Two replicas reside in the primary site, one in the secondary site. Writes are directed to the primary site; reads are served locally from the nearest replica, eliminating cross‑center latency.

Future Directions

The team is investigating row‑column hybrid storage at the replica level, aiming to reduce the required replica count by up to 50 % while preserving full read‑write capability. This research aligns with industry efforts such as TiDB/TiFlash, HTAP trends, AI‑database integration, cloud‑native deployment, and hardware acceleration.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed SystemsPerformance OptimizationHTAPColumnar StorageBaikalDB
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.