
How PolarDB MySQL’s In‑Memory Column Index Supercharges HTAP Performance

This article explains how PolarDB MySQL introduces an In‑Memory Column Index (IMCI) to combine row‑store OLTP strength with column‑store analytical speed, detailing the architectural innovations, optimizer decisions, data organization, resource isolation strategies, and benchmark results that show IMCI achieving tens to hundreds of times faster query execution than native MySQL and matching ClickHouse performance.

Alibaba Cloud Developer

Analytical databases are booming in both capital markets and the tech community, and cloud‑native advances make rebuilding analytical database stacks on the cloud a huge market opportunity.

MySQL ecosystem HTAP solutions

MySQL is primarily an OLTP system; its open‑source community focuses on transaction throughput, leaving analytical capabilities lagging. Users increasingly need real‑time analytics, prompting three HTAP approaches.

1. Separate TP and AP systems

Deploy two databases (one for OLTP, one for OLAP) and synchronize data between them in real time. The drawbacks are synchronization delay and the operational complexity of running two systems.

2. Divergent Design with multi‑replica

Use a replica that stores data in columnar format (e.g., TiDB/TiFlash) for analytical workloads, requiring migration to a NewSQL system and incurring compatibility challenges.

3. Integrated row‑column hybrid storage

Commercial DBMS such as Oracle, SQL Server, and DB2 adopt a row‑column hybrid storage model; PolarDB follows this proven approach.

Evolution of PolarDB MySQL AP capabilities

PolarDB provides up to 100 TB per instance and a one‑write‑many‑read architecture, allowing read‑only (RO) nodes to run complex analytical queries without affecting TP load.

Why MySQL struggles in AP scenarios

The Volcano iterator model relies on deep function nesting and virtual calls, hurting CPU pipeline and cache efficiency.

Execution is largely single‑threaded, limiting multi‑core utilization.

Row‑store format forces reading unnecessary columns, causing I/O waste and memory bandwidth pressure.
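
To make the first point concrete, here is a minimal sketch contrasting a Volcano-style pipeline (one `next()` call per row per operator) with a batch-at-a-time loop over column slices. The class and function names (`Scan`, `Filter`, `volcano_sum`, `batch_sum`) are hypothetical and chosen only for illustration; this is not MySQL's or PolarDB's executor code.

```python
class Scan:
    """Volcano-style scan: hands out one row per next() call."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)


class Filter:
    """Volcano-style filter: pulls rows from its child one at a time."""

    def __init__(self, child, pred):
        self.child = child
        self.pred = pred

    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None


def volcano_sum(rows, col, pred):
    """Sum one column through nested next() calls, one row per call."""
    op = Filter(Scan(rows), pred)
    total = 0
    while (row := op.next()) is not None:
        total += row[col]
    return total


def batch_sum(values, keys, threshold, batch_size=4):
    """Batch-at-a-time: each loop iteration handles a slice of two columns."""
    total = 0
    for start in range(0, len(values), batch_size):
        v = values[start:start + batch_size]
        k = keys[start:start + batch_size]
        total += sum(x for x, key in zip(v, k) if key > threshold)
    return total
```

Both functions compute the same filtered sum. The difference is that the Volcano version pays several function calls per row, while the batch version amortizes call overhead across a whole slice, which is the property a vectorized engine exploits.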

Parallel query breakthrough

PolarDB’s Parallel Query automatically switches to parallel execution when the data volume exceeds a threshold: it partitions the data across worker threads, lets each thread process its partition, and merges the partial results, which can cut query latency substantially on multi‑core machines.
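
The partition, process, and merge pattern described above can be sketched as follows. The threshold value and the names `PARALLEL_THRESHOLD`, `partial_sum`, and `parallel_sum` are assumptions made for this example, not PolarDB parameters. Python threads also do not speed up CPU-bound work, so the sketch shows only the structure of the approach, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row-count threshold below which parallelism is not worth it.
PARALLEL_THRESHOLD = 1000


def partial_sum(chunk):
    """Work done by one worker on its own partition."""
    return sum(chunk)


def parallel_sum(values, workers=4, threshold=PARALLEL_THRESHOLD):
    """Sum values serially for small inputs, in partitions for large ones."""
    if len(values) < threshold or workers <= 1:
        return sum(values)
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Final aggregation step: merge the per-partition results.
    return sum(partials)
```

A call such as `parallel_sum(list(range(10000)))` takes the partitioned path, while a three-element list falls back to the serial path.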

Why column‑store is needed

Columnar layout improves I/O efficiency through compression, data skipping, and column pruning.

Columns stored contiguously enhance CPU cache friendliness and enable SIMD vectorization for per‑core speedup.
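
As a rough illustration of column pruning and compression, the sketch below converts a few rows into per-column arrays and run-length encodes a repetitive column. The table contents and the helper names `to_columns` and `rle_encode` are hypothetical. Run-length encoding is used here only as one simple example of columnar compression; the article does not say which encodings IMCI uses.

```python
from array import array

# A tiny row-oriented table: (id, region, amount).
rows = [
    (1, "north", 120.0),
    (2, "south", 80.5),
    (3, "north", 42.0),
]


def to_columns(rows):
    """Pivot a non-empty list of rows into contiguous per-column storage."""
    ids, regions, amounts = zip(*rows)
    return {
        "id": array("q", ids),          # 64-bit integers, stored contiguously
        "region": list(regions),
        "amount": array("d", amounts),  # doubles, stored contiguously
    }


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


columns = to_columns(rows)
# Column pruning: the aggregate touches only the "amount" column.
total = sum(columns["amount"])
```

In the row layout, summing `amount` would drag `id` and `region` through memory as well; in the column layout, the query reads one contiguous array.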

PolarDB In‑Memory Column Index (IMCI)

IMCI adds columnar storage and in‑memory computation to PolarDB, allowing a single database to serve both TP and AP workloads while preserving PolarDB’s strong OLTP performance.

Key technical innovations

Support for columnar indexes in InnoDB as secondary indexes; indexes can reside fully in memory or be persisted to shared storage.

A column‑oriented execution engine processes data in 4 KB batches, leverages SIMD, and parallelizes key operators, delivering multi‑order‑magnitude speedup over MySQL’s row engine.

A cost‑based optimizer evaluates row, parallel, and column plans, selecting the lowest‑cost execution path.

RO nodes can host column indexes for analytics, using all available CPU and memory without impacting TP resources.

Hybrid optimizer

The optimizer uses a whitelist and cost model to decide whether a query runs on the row engine, Parallel Query, or IMCI, ensuring compatibility and performance.
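
A minimal sketch of this decision, assuming a whitelist of operators the column engine supports and a precomputed cost per candidate plan. The whitelist contents, the `Plan` type, and the cost numbers are invented for illustration; the real cost model is not described in this article.

```python
from dataclasses import dataclass

# Hypothetical set of operators the column engine is able to execute.
COLUMN_ENGINE_WHITELIST = {"scan", "filter", "aggregate", "join"}


@dataclass
class Plan:
    engine: str   # "row", "parallel", or "column"
    cost: float   # estimated cost; lower is better


def choose_plan(operators, plans):
    """Pick the cheapest plan, excluding column plans the engine cannot run.

    Assumes at least one non-column plan is always present as a fallback.
    """
    supported = set(operators) <= COLUMN_ENGINE_WHITELIST
    candidates = [p for p in plans if p.engine != "column" or supported]
    return min(candidates, key=lambda p: p.cost)
```

With plans costed at 100 (row), 40 (parallel), and 5 (column), a scan-plus-aggregate query goes to the column engine, while a query that uses an operator outside the whitelist falls back to the cheapest remaining plan.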

Column index as secondary index

Implemented as a secondary index, IMCI reuses InnoDB’s transaction handling, redo log, and replication mechanisms, allowing seamless fallback to row storage if needed.

Data organization

Data is packed into RowGroups and DataPacks; writes are append‑only, and background compaction reclaims space. Statistics (min, max, sum, null count, row count) are stored per DataPack to enable coarse‑grained pruning.
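
The pruning idea can be sketched as follows, assuming a single integer column and a query of the form "sum of values greater than a threshold". The `DataPack` class, `build_packs`, and `sum_greater_than` are hypothetical names for this example, and null counts are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataPack:
    values: List[int]
    min_val: int
    max_val: int
    total: int
    row_count: int


def build_packs(values, pack_size):
    """Split a column into fixed-size packs and record per-pack statistics."""
    packs = []
    for i in range(0, len(values), pack_size):
        chunk = values[i:i + pack_size]
        packs.append(
            DataPack(chunk, min(chunk), max(chunk), sum(chunk), len(chunk))
        )
    return packs


def sum_greater_than(packs, threshold):
    """Return (sum of values > threshold, number of packs actually scanned)."""
    total, scanned = 0, 0
    for p in packs:
        if p.max_val <= threshold:
            continue              # no row can qualify: skip the pack
        if p.min_val > threshold:
            total += p.total      # every row qualifies: use the stored sum
            continue
        scanned += 1              # mixed pack: fall back to a real scan
        total += sum(v for v in p.values if v > threshold)
    return total, scanned
```

For the values 1 through 12 in packs of four and a threshold of 6, only the middle pack is scanned: the first is skipped via its maximum and the last is answered from its stored sum.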

TP/AP resource isolation

Three deployment patterns are supported: (1) mixed storage on the RW node for light AP workloads, (2) a dedicated AP RO node with isolated CPU and memory, and (3) a standby node with separate shared storage, achieving CPU, memory, and I/O isolation.

Performance evaluation

In a TPC‑H benchmark (100 GB, 22 queries) IMCI outperforms native MySQL by tens to hundreds of times (up to ~400× on Q6) and matches ClickHouse’s performance, demonstrating its competitiveness as an analytical engine.

Future work

Automated index recommendation based on query patterns.

Standalone column tables and OSS storage to further reduce costs.

Fine‑grained row‑column hybrid execution where parts of a plan run on row storage and parts on column storage.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
