
Overview and Architecture of the CSTORE Columnar Engine for MySQL 8.0

This document explains the differences between OLTP and OLAP workloads and introduces the CSTORE columnar storage engine for MySQL 8.0: its architecture, core technologies, performance advantages, typical use cases, benchmark results, and future development plans.

Tencent Database Technology

Part 1 Overview

Databases serve two typical kinds of workload: OLTP (On-Line Transaction Processing) and OLAP (On-Line Analytical Processing).

1.1 OLTP

OLTP workloads involve many insert, update, delete, and select operations, with a high write‑to‑read ratio, many concurrent transactions, and strict response‑time requirements. Queries are usually point lookups or small‑range scans on a few rows, and data is stored row‑wise in fixed‑size pages.

1.2 OLAP

OLAP workloads are read‑heavy, with infrequent bulk writes and large‑scale queries that may scan whole tables, perform aggregations, joins, and ad‑hoc analysis. Because only a few columns are needed per query, row‑wise storage is inefficient; column‑wise storage reduces I/O by reading only required columns.
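The I/O argument can be made concrete with a small sketch. The table layout below (two integers, a double, and a 32-byte note) is a hypothetical example, not a CSTORE format; it only compares how many bytes a `SUM(amount)` has to read under each layout.

```python
import struct

# Hypothetical table: (order_id INT, customer_id INT, amount DOUBLE, note CHAR(32))
ROW_FORMAT = "<iid32s"                    # one row packed contiguously, no padding
ROW_SIZE = struct.calcsize(ROW_FORMAT)    # 4 + 4 + 8 + 32 = 48 bytes

rows = [(i, i % 100, i * 1.5, b"note") for i in range(1000)]

# Row-wise layout: whole rows stored one after another.
row_store = b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)

# Column-wise layout: the amount column stored on its own.
amount_column = struct.pack(f"<{len(rows)}d", *(r[2] for r in rows))


def sum_amount_row_store(buf):
    """SUM(amount) over the row layout has to walk every full row."""
    total = 0.0
    for offset in range(0, len(buf), ROW_SIZE):
        _, _, amount, _ = struct.unpack_from(ROW_FORMAT, buf, offset)
        total += amount
    return total, len(buf)                # (result, bytes read)


def sum_amount_column_store(buf):
    """SUM(amount) over the column layout reads only the 8-byte values."""
    values = struct.unpack(f"<{len(buf) // 8}d", buf)
    return sum(values), len(buf)          # (result, bytes read)


row_total, row_bytes = sum_amount_row_store(row_store)
col_total, col_bytes = sum_amount_column_store(amount_column)
print(row_bytes, col_bytes)               # same answer, far fewer bytes read
```

Both functions return the same sum; the column layout simply touches one sixth of the bytes in this made-up schema. The gap widens as the table gets wider relative to the columns a query needs.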

CDB for MySQL 8.0, released by the Tencent TEG Cloud Architecture team, inherits all improvements of native MySQL 8.0 and adds new features. It offers two engines for the two workloads: the native InnoDB engine for OLTP and the new CSTORE columnar engine for OLAP.

This document describes the architecture, applicable scenarios, features, and performance metrics of the CSTORE columnar engine, with occasional comparisons to MyISAM and InnoDB for better understanding.

Part 2 CSTORE Architecture Features

As a columnar storage engine, CSTORE provides high-speed data loading, high compression ratios, sparse indexes, and late-materialization query optimizations.

All data is stored column‑wise; each fixed‑size group of rows (a DataGroup) forms a logical page that is compressed before being written to disk. CSTORE maintains sparse indexes per column (max, min, sum, null count, row count, etc.) instead of traditional secondary indexes.
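A minimal sketch of this layout, under stated assumptions: the names `DataGroup`, `DataGroupMeta`, and `build_datagroups`, the 4-row group size, the `zlib` codec, and the byte encoding are all illustrative choices, not CSTORE's actual on-disk format. It shows one column being split into fixed-size groups, each carrying its own sparse-index metadata next to a compressed payload.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List, Optional

DATAGROUP_ROWS = 4  # tiny for illustration; a real engine would use a much larger size


@dataclass
class DataGroupMeta:
    """Sparse-index entry kept per column per DataGroup."""
    min_value: Optional[int]
    max_value: Optional[int]
    sum_value: int
    null_count: int
    row_count: int


@dataclass
class DataGroup:
    meta: DataGroupMeta
    payload: bytes  # compressed column values for this group


def build_datagroups(column: List[Optional[int]]) -> List[DataGroup]:
    """Split one column into fixed-size groups, index each, then compress it."""
    groups = []
    for start in range(0, len(column), DATAGROUP_ROWS):
        chunk = column[start:start + DATAGROUP_ROWS]
        present = [v for v in chunk if v is not None]
        meta = DataGroupMeta(
            min_value=min(present) if present else None,
            max_value=max(present) if present else None,
            sum_value=sum(present),
            null_count=len(chunk) - len(present),
            row_count=len(chunk),
        )
        # Assumed encoding: a NULL flag byte followed by a 64-bit integer.
        raw = b"".join(
            struct.pack("<?q", v is None, 0 if v is None else v) for v in chunk
        )
        groups.append(DataGroup(meta, zlib.compress(raw)))
    return groups


def decode(group: DataGroup) -> List[Optional[int]]:
    """Decompress a group back into its column values."""
    raw = zlib.decompress(group.payload)
    return [None if is_null else v for is_null, v in struct.iter_unpack("<?q", raw)]


groups = build_datagroups([5, 1, None, 9, 7, 7])
for g in groups:
    print(g.meta)
```

Because the metadata lives outside the compressed payload, a query can inspect it without decompressing anything.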

Because data modifications are assumed to be infrequent and mostly done in bulk, DELETE and UPDATE operations are slower than in InnoDB, but bulk writes are heavily optimized to exploit multi-core CPUs.

The query engine inherits MySQL’s query capabilities and adds columnar‑specific optimizations, allowing most existing MySQL queries to run unchanged with performance gains.

Part 3 CSTORE Core Technologies

3.1 Fast Loading

CSTORE uses multithreading: each column’s fixed‑size row block (DataGroup) is treated as a task and loaded in parallel by worker threads.
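The task decomposition can be sketched as follows. This is an assumption-laden illustration rather than CSTORE's loader: the function names, the `zlib` codec, and the use of Python's `ThreadPoolExecutor` are all stand-ins. The point is only that every (column, DataGroup) pair is an independent unit of work that a pool of workers can encode concurrently.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Task = Tuple[str, int, List[int]]  # (column name, first row id, values)


def encode_datagroup(task: Task) -> Tuple[str, int, bytes]:
    """Worker: serialize and compress one column's DataGroup."""
    column, start, values = task
    raw = struct.pack(f"<{len(values)}q", *values)
    return column, start, zlib.compress(raw)


def load_parallel(table: Dict[str, List[int]], rows_per_group: int,
                  workers: int = 4) -> Dict[str, List[bytes]]:
    """Cut every column into DataGroups and encode them on a thread pool."""
    tasks = [
        (name, start, values[start:start + rows_per_group])
        for name, values in table.items()
        for start in range(0, len(values), rows_per_group)
    ]
    stored: Dict[str, List[bytes]] = {name: [] for name in table}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in task order, so groups stay in row order.
        for column, _start, payload in pool.map(encode_datagroup, tasks):
            stored[column].append(payload)
    return stored


def decode_column(payloads: List[bytes]) -> List[int]:
    """Reverse of the load path, used here only to check the result."""
    values: List[int] = []
    for payload in payloads:
        raw = zlib.decompress(payload)
        values.extend(struct.unpack(f"<{len(raw) // 8}q", raw))
    return values


table = {"a": list(range(10)), "b": list(range(10, 20))}
stored = load_parallel(table, rows_per_group=4)
print({name: len(blocks) for name, blocks in stored.items()})
```

Since tasks share no state, adding workers needs no locking beyond collecting the results.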

3.2 Late Materialization

CSTORE postpones materialization of intermediate results, representing them as bitmaps to reduce memory consumption during large‑scale queries.
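As a rough illustration, the sketch below evaluates each predicate against its own column, keeps only a bitmap of matching row ids, combines the bitmaps, and builds output rows at the very end. Python integers stand in for bitmaps, and the table, column names, and helper functions are hypothetical; the source does not describe CSTORE's bitmap representation.

```python
from typing import Callable, List, Sequence, Tuple


def predicate_bitmap(column: Sequence, predicate: Callable[[object], bool]) -> int:
    """Return an integer whose bit i is set when row i satisfies the predicate."""
    bitmap = 0
    for row_id, value in enumerate(column):
        if predicate(value):
            bitmap |= 1 << row_id
    return bitmap


def materialize(bitmap: int, columns: List[Sequence]) -> List[Tuple]:
    """Build output tuples only for the rows whose bit survived."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(tuple(col[row_id] for col in columns))
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical table stored column by column.
table = {
    "price": [5, 20, 35, 50],
    "qty": [1, 0, 3, 4],
    "name": ["a", "b", "c", "d"],
}

# SELECT name FROM t WHERE price > 10 AND qty > 0
price_hits = predicate_bitmap(table["price"], lambda v: v > 10)
qty_hits = predicate_bitmap(table["qty"], lambda v: v > 0)
matches = price_hits & qty_hits           # intermediate result is one bitmap
result = materialize(matches, [table["name"]])
print(result)
```

The `name` column is never touched for rows that fail the filters, and the intermediate state is a single bitmap rather than a set of partially built rows.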

3.3 Asynchronous Replication as a Standby

To relieve InnoDB-based services of their growing data volume, CSTORE can act as a standby replica. It uses MySQL binlog replication with a multithreaded producer/consumer model and data merging, achieving near-zero replication lag while providing high-compression storage (on average 1/9 of the original size).
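The producer/consumer idea with merging can be sketched like this. It is a toy under explicit assumptions: the events are plain tuples rather than real binlog events, `replicate` and its `batch_size` parameter are hypothetical, and "merging" is interpreted here as grouping many single-row changes into one bulk apply, which suits a column store better than row-at-a-time writes.

```python
import queue
import threading
from typing import List, Tuple

Event = Tuple[str, int]   # hypothetical stand-in for a binlog row event
SENTINEL = None           # marks the end of the stream


def producer(events: List[Event], q: "queue.Queue") -> None:
    """Reads the change stream and hands events to the consumer."""
    for event in events:
        q.put(event)
    q.put(SENTINEL)


def consumer(q: "queue.Queue", batch_size: int, applied: List[List[Event]]) -> None:
    """Merges single-row events into batches and applies each batch at once."""
    batch: List[Event] = []
    while True:
        event = q.get()
        if event is SENTINEL:
            break
        batch.append(event)
        if len(batch) >= batch_size:
            applied.append(batch)     # one bulk write instead of many small ones
            batch = []
    if batch:
        applied.append(batch)         # flush the final partial batch


def replicate(events: List[Event], batch_size: int) -> List[List[Event]]:
    q: "queue.Queue" = queue.Queue(maxsize=100)   # bounded queue gives back-pressure
    applied: List[List[Event]] = []
    threads = [
        threading.Thread(target=producer, args=(events, q)),
        threading.Thread(target=consumer, args=(q, batch_size, applied)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return applied


batches = replicate([("insert", i) for i in range(10)], batch_size=4)
print([len(b) for b in batches])
```

Decoupling the reader from the applier lets the reader keep up with the primary while the applier works in bulk.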

Part 4 Typical Advantages and Scenarios

4.1 Key Features and Benefits

Fast data loading – up to 5‑10× faster than InnoDB by fully utilizing multi‑core CPUs.

High compression – columnar format and custom delta encoding achieve up to 10× compression.

Accelerated aggregation – functions such as MAX, MIN, SUM, AVG, and COUNT can be answered from the per-DataGroup sparse-index metadata without scanning raw data.

Fast complex queries – selective column access and sparse indexes avoid full scans; late materialization reduces temporary table overhead.

Full MySQL compatibility – applications can use existing MySQL APIs without modification.
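The aggregation and sparse-index points above can be illustrated together. In this sketch (hypothetical names, no NULL handling, plain Python lists standing in for compressed data), a range query classifies each DataGroup by its min/max: groups outside the range are skipped, groups fully inside are answered from metadata alone, and only groups that straddle a boundary are actually scanned.

```python
from typing import List, NamedTuple, Tuple


class Group(NamedTuple):
    values: List[int]   # stands in for the compressed data on disk
    min_value: int
    max_value: int
    sum_value: int
    row_count: int


def make_group(values: List[int]) -> Group:
    return Group(values, min(values), max(values), sum(values), len(values))


def sum_count_between(groups: List[Group], lo: int, hi: int) -> Tuple[int, int, int]:
    """SELECT SUM(v), COUNT(v) WHERE v BETWEEN lo AND hi.

    Returns (sum, count, number of groups whose data had to be scanned).
    """
    total = count = scanned = 0
    for g in groups:
        if g.max_value < lo or g.min_value > hi:
            continue                              # irrelevant: skip entirely
        if lo <= g.min_value and g.max_value <= hi:
            total += g.sum_value                  # fully covered: metadata only
            count += g.row_count
        else:
            scanned += 1                          # partial overlap: read the data
            for v in g.values:
                if lo <= v <= hi:
                    total += v
                    count += 1
    return total, count, scanned


groups = [
    make_group([1, 2, 3]),
    make_group([10, 11, 12]),
    make_group([18, 25, 30]),
    make_group([100, 200]),
]
answer = sum_count_between(groups, lo=10, hi=20)
print(answer)
```

Of the four groups, only one is read. The pruning works best when data arrives roughly sorted on the filtered column, as log and time-series data often does.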

4.2 Typical Use Cases

Log or historical data analysis – large batches of data are loaded once and queried repeatedly.

Result sets from big‑data jobs – moderate‑size results stored in MySQL for ad‑hoc analysis.

Migration from MyISAM or InnoDB for analytical workloads – switching to CSTORE yields immediate performance and storage benefits.

Part 5 Performance Metrics

5.1 Data Loading Performance

Loading about 6 million rows of the TPC-H LINEITEM table on a 48-core machine, CSTORE is about 9× faster than InnoDB and 5× faster than MyISAM.

5.2 Compression Ratio

CSTORE reduces storage to roughly one-seventh of the original data size, a better ratio than InnoDB and MyISAM achieve even with compression applied.
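Why a columnar layout compresses well can be seen with plain delta encoding. The sketch below is generic delta encoding plus `zlib`, not CSTORE's custom scheme, and the timestamp column is synthetic. Storing differences between neighbouring values turns a column of large, distinct numbers into a column of small, repetitive ones, which a general-purpose compressor handles far better.

```python
import struct
import zlib
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as a difference from the previous."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values with a running sum."""
    out = []
    running = 0
    for d in deltas:
        running += d
        out.append(running)
    return out


def compressed_size(values: List[int]) -> int:
    """Bytes after packing as 64-bit integers and compressing with zlib."""
    return len(zlib.compress(struct.pack(f"<{len(values)}q", *values)))


# Synthetic timestamp column: slowly increasing with a small jitter.
timestamps = [1_600_000_000 + 3 * i + (i % 5) for i in range(10_000)]
deltas = delta_encode(timestamps)

raw_size = compressed_size(timestamps)
delta_size = compressed_size(deltas)
print(raw_size, delta_size)
```

The same trick is much less effective in a row store, where neighbouring bytes belong to unrelated columns.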

5.3 Query Performance

Running the 22 TPC‑H benchmark queries on the loaded data shows most queries executing significantly faster with CSTORE compared to InnoDB.

Part 6 Future Plans

6.1 Data Type Support

Add BLOB and JSON types so that CSTORE matches InnoDB’s data‑type coverage.

6.2 Vectorized and Parallel Query Execution

Current single‑core query execution can be further accelerated by adopting vectorized and parallel processing, potentially achieving >10× speedups on 16‑core machines.

6.3 Distributed CSTORE Cluster (MPP)

To handle larger analytical workloads, CSTORE will evolve into a distributed MPP cluster similar to Greenplum, Impala, or ClickHouse, using sharding, distributed query planning, and consensus protocols for high availability.

Conclusion

We now have a basic understanding of the CSTORE columnar engine. Future work will continue to improve CSTORE and release it for public use. Stay tuned!

Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.
