Databases 17 min read

MariaDB ColumnStore: Future‑Ready OLAP Architecture, Limits & Migration Tips

This article shares practical experience deploying MariaDB ColumnStore for offline OLAP, covering its MPP‑based architecture, user‑experience benefits, high‑availability options, numerous syntax and data‑type limitations, and detailed scripts for schema conversion and incremental data extraction using pt‑archiver.

dbaplus Community

Dec 12, 2017

MariaDB ColumnStore: Future‑Ready OLAP Architecture, Limits & Migration Tips

Background and Motivation

This article documents production‑grade usage of MariaDB ColumnStore for offline OLAP analysis, focusing on the reasons for selecting a Massively Parallel Processing (MPP) architecture over MapReduce‑style solutions.

OLAP Technology Streams

MapReduce – batch‑oriented processing exemplified by the Hadoop ecosystem.

MPP – column‑store systems such as Infobright, MariaDB ColumnStore (derived from InfiniDB), and Greenplum that store data column‑wise and execute queries in parallel across many nodes.

Why Choose MPP / ColumnStore

Supports the standard MySQL protocol, allowing existing MySQL clients (e.g., Navicat, SQLyog, WebSQL) to connect without learning a new language.

DBAs do not need to create indexes or rewrite complex SQL; the engine handles joins, aggregations, stored procedures, and UDFs automatically.

Optimized for large‑scale data warehousing and fast analytical queries while remaining easy to maintain.

Architecture Overview

UM (User Module) – receives client connections via the MySQL protocol, forwards SQL requests to PM modules, aggregates results, and returns the final result set.

PM (Performance Module) – stores data in a distributed columnar format, performs parallel scans, hash joins, and aggregations, and extracts data into memory for computation.

UM Processes

mysqld : connection authentication, SQL parsing, and plan generation.

ExeMgr : converts the query tree into a job list (filter, join, aggregation, sort).

DMLProc / DDLProc : dispatches DML/DDL to the appropriate PM; cpimport distributes source files to PM nodes.

PM Processes

PrimProc : executes queries, retrieves data blocks, and returns results to UM.

WriteEngineServer : coordinates DML, DDL, and bulk‑load operations on each PM.

ProcMgr : starts, monitors, and restarts all ColumnStore processes.

ProcMon : heartbeat monitoring between nodes.

High‑Availability Modes

External SAN fiber storage – DBRoot is mounted on a shared SAN; if a PM fails, the DBRoot can be remounted on a healthy node for automatic failover.

Stand‑by master‑slave – two identical MPP clusters ingest data in parallel; if one cluster fails, the other continues serving queries.

Observed Limitations (MariaDB 1.0 series, partially resolved in 1.1)

Unsupported CHARACTER SET and COLLATE clauses.

TEXT/BLOB types not supported before 1.1. timestamp columns cannot use DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP.

Maximum DECIMAL precision is 18. ROW_FORMAT=COMPACT not supported.

Maximum VARCHAR length is 8000. BIT type not supported.

Comments cannot contain single quotes.

Row size limit of 65535 bytes (UTF‑8 characters count as three bytes).

Reserved keywords (e.g., user, comment, match, key, update, status) cannot be used as identifiers.

Workarounds and Migration Scripts

A convert_schema.sh script automates schema conversion:

Export original table definitions.

Transform them by removing unsupported clauses and adjusting data types.

Apply the transformed DDL to ColumnStore.

# bash convert_schema.sh source_schema.sql columnstore_schema.sql

Data extraction was performed with a modified Percona pt‑archiver workflow:

Full load uses SELECT * FROM table WHERE id>=1 to dump all rows.

Incremental load relies on an update_time column defined as

timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

to fetch rows newer than the last extracted timestamp.

Because ColumnStore does not support REPLACE INTO or LOAD DATA … REPLACE, duplicate rows must be de‑duplicated after import.

Key modifications to the original pt‑archiver script:

Rename DELETE to DELETE111 to avoid accidental full deletions.

Append a suffix (e.g., desc1) to columns that clash with reserved keywords.

Change the field delimiter from a comma to an ampersand ( &) to avoid conflicts with data containing commas.

Add FIELDS TERMINATED BY for bulk LOAD DATA imports.

Remove FORCE INDEX and explicit ORDER BY so the optimizer can choose the best index (typically the update_time index for incremental extracts).

Performance Observations

DML operations (INSERT/UPDATE/DELETE) are slower in ColumnStore because the engine is optimized for bulk modifications rather than row‑level changes. Rewriting query plans to rely on the update_time index rather than forcing primary‑key scans yields faster incremental extracts.

References

ColumnStore naming conventions: https://mariadb.com/kb/en/library/columnstore-naming-conventions/

JIRA issues documenting limitations and bugs:

MCOL‑1022: https://jira.mariadb.org/browse/MCOL-1022

MCOL‑1036 (ORDER BY not in DISTINCT): https://jira.mariadb.org/browse/MCOL-1036

MCOL‑1056 (Memory overflow on large TEXT queries): https://jira.mariadb.org/browse/MCOL-1056

MCOL‑1080 (REPLACE INTO not supported): https://jira.mariadb.org/browse/MCOL-1080

Data ingestion guide: https://mariadb.com/kb/en/library/columnstore-data-ingestion/

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Migration performance SQL OLAP MariaDB Columnstore

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.