HiStore Columnar Database Overview and Architecture

HiStore, a columnar database from Alibaba's middleware team, offers high-compression, low-cost OLAP storage for massive datasets. It combines a knowledge-grid engine, column-based storage, and efficient compression to support multi-dimensional ad-hoc queries, with performance gains of up to tens of times over traditional row-based systems.

Architect

Jun 30, 2016

HiStore is a columnar database developed by Alibaba's middleware team. It is designed for OLAP workloads over massive datasets, with high compression ratios and low storage and maintenance costs.

Key features include ad-hoc multi-dimensional queries, MySQL protocol compatibility, batch data loading, concurrent queries, and data block replication. HiStore delivers performance several times to tens of times faster than traditional row-based engines.

Technical architecture:

The engine is built on a knowledge grid (KG) with SMP (symmetric multiprocessing) optimizations, and it stores data in columnar format.
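The source does not say how HiStore applies SMP. One common pattern, sketched here purely as an assumption, is to hand each fixed-size chunk of a column to a separate worker and merge the partial aggregates at the end. The function names and the `dn_size` and `workers` parameters are hypothetical, and Python threads stand in for an engine's native threads (CPython's GIL limits real CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_nodes(column, dn_size):
    """Slice a column into fixed-size chunks (hypothetical stand-ins for Data Nodes)."""
    return [column[i:i + dn_size] for i in range(0, len(column), dn_size)]


def parallel_sum(column, dn_size, workers=4):
    """Sum a column by aggregating each chunk on a worker thread, then merging."""
    nodes = split_into_nodes(column, dn_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One partial aggregate per chunk, computed concurrently.
        partials = list(pool.map(sum, nodes))
    # Merge step: combine the partial sums into the final result.
    return sum(partials)
```

Summation is associative, so partial results can be merged in any order; the same split-then-merge shape also works for COUNT, MIN, and MAX.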

Physical storage consists of fixed-size Data Nodes (DNs), the blocks into which column data is divided; each DN is stored in compressed form.
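To make this storage layout concrete, here is a minimal sketch that splits one integer column into fixed-size nodes and compresses each node separately. The node size, the 8-byte integer encoding, and the use of `zlib` are assumptions for illustration only; the source does not give HiStore's DN size or codec, and `DataNode`, `build_data_nodes`, and `read_data_node` are hypothetical names.

```python
import struct
import zlib
from dataclasses import dataclass

# Hypothetical node size; the source does not state HiStore's actual DN size.
DN_SIZE = 65536


@dataclass
class DataNode:
    count: int      # number of values stored in this node
    payload: bytes  # compressed bytes of the node's values


def build_data_nodes(column, dn_size=DN_SIZE):
    """Split an integer column into fixed-size nodes and compress each one separately."""
    nodes = []
    for start in range(0, len(column), dn_size):
        chunk = column[start:start + dn_size]
        raw = struct.pack(f"<{len(chunk)}q", *chunk)  # 8-byte little-endian ints
        nodes.append(DataNode(count=len(chunk), payload=zlib.compress(raw)))
    return nodes


def read_data_node(node):
    """Decompress one node back into its list of integer values."""
    raw = zlib.decompress(node.payload)
    return list(struct.unpack(f"<{node.count}q", raw))
```

Because each node is compressed on its own, a query can decompress only the nodes it needs rather than the whole column.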

The knowledge grid consists of Metadata Nodes (MDs) and Knowledge Nodes (KNs), which store aggregate statistics, value ranges, and other metadata about the data. This metadata lets the engine answer some queries approximately or from metadata alone, reducing I/O.
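A minimal sketch of per-DN statistics, assuming each Knowledge Node records a node's minimum, maximum, sum, and row count. The source mentions aggregates and value ranges but not the exact fields, and `KnowledgeNode`, `build_knowledge_nodes`, and `sum_from_metadata` are hypothetical names. With such metadata, a SUM over a whole column can be answered without reading any DN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeNode:
    # Assumed per-DN statistics; the real field set is not given in the source.
    min_value: int
    max_value: int
    total: int
    count: int


def build_knowledge_nodes(column, dn_size):
    """Compute summary statistics for each fixed-size slice (DN) of a column."""
    kns = []
    for start in range(0, len(column), dn_size):
        chunk = column[start:start + dn_size]
        kns.append(KnowledgeNode(min(chunk), max(chunk), sum(chunk), len(chunk)))
    return kns


def sum_from_metadata(kns):
    """Answer SELECT SUM(col) from the knowledge nodes alone, touching no DN data."""
    return sum(kn.total for kn in kns)
```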

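Columnar storage reduces I/O because only the columns a query references are read. The KG-based optimizer then uses the metadata to select only the DNs that can contain matching rows, which avoids full scans without the cost of building and maintaining indexes.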
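DN selection can be sketched as a three-way classification of each node against a range predicate `lo <= x <= hi`, using the node's minimum and maximum. This is an assumed model of the optimizer's behavior, not HiStore's code: `Verdict`, `classify`, and `count_between` are hypothetical, and plain Python lists stand in for decompressed DNs.

```python
from enum import Enum


class Verdict(Enum):
    IRRELEVANT = "irrelevant"  # no value in the DN can match: skip it entirely
    RELEVANT = "relevant"      # every value in the DN matches: use metadata only
    SUSPECT = "suspect"        # some values may match: the DN must be read


def classify(dn_min, dn_max, lo, hi):
    """Compare a DN's [dn_min, dn_max] range with the predicate lo <= x <= hi."""
    if dn_max < lo or dn_min > hi:
        return Verdict.IRRELEVANT
    if lo <= dn_min and dn_max <= hi:
        return Verdict.RELEVANT
    return Verdict.SUSPECT


def count_between(nodes, lo, hi):
    """Count values with lo <= x <= hi across DNs given as lists of values.

    Returns (match_count, nodes_scanned)."""
    # A real engine would read min/max from the knowledge grid; here they are computed up front.
    stats = [(min(values), max(values), len(values)) for values in nodes]
    count = 0
    scanned = 0
    for (dn_min, dn_max, n), values in zip(stats, nodes):
        verdict = classify(dn_min, dn_max, lo, hi)
        if verdict is Verdict.RELEVANT:
            count += n  # answered from metadata, no data read
        elif verdict is Verdict.SUSPECT:
            scanned += 1  # only suspect DNs are decompressed and scanned
            count += sum(1 for v in values if lo <= v <= hi)
    return count, scanned
```

Only suspect nodes cost I/O and decompression: irrelevant nodes are skipped and fully matching nodes are counted from metadata, which is how this approach avoids full scans without a secondary index.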

Compression uses algorithms chosen per column type (e.g., PPM for strings and range-based encoding for numeric values), achieving an average compression ratio above 10:1 and up to 40:1.
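The source names range-based encoding for numerics without describing it. One common form, assumed here, is frame-of-reference bit packing: store the node's minimum once and each value as its offset from that minimum, using only as many bits as the largest offset needs. The helper names are hypothetical, and the sketch does not model any further compression stage HiStore may apply.

```python
def range_encode(values):
    """Frame-of-reference encoding: store each value as (value - min) in the fewest bits.

    Returns (base, bit_width, packed), where packed is one int holding all offsets."""
    base = min(values)
    span = max(values) - base
    bit_width = max(1, span.bit_length())  # bits needed for the largest offset
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - base) << (i * bit_width)
    return base, bit_width, packed


def range_decode(base, bit_width, packed, count):
    """Recover the original values from the base, bit width, and packed offsets."""
    mask = (1 << bit_width) - 1
    return [base + ((packed >> (i * bit_width)) & mask) for i in range(count)]


def compression_ratio(values, raw_bytes_per_value=8):
    """Raw 8-byte storage size divided by the bit-packed size (header ignored)."""
    _, bit_width, _ = range_encode(values)
    packed_bytes = (len(values) * bit_width + 7) // 8
    return (len(values) * raw_bytes_per_value) / packed_bytes
```

For example, the values 1000, 1003, 1001, and 1007 span only 7, so each offset fits in 3 bits: 32 raw bytes pack into 2 bytes, a 16:1 ratio before headers.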

The import pipeline can preprocess data outside the database, so bulk loads bypass SQL parsing and reach throughput of up to 2 TB/hour.
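External preprocessing can be pictured as a client-side step that parses and type-converts input rows, then regroups them into column batches, so the server receives ready-to-store data instead of SQL INSERT statements. This is a hypothetical sketch: the source does not describe HiStore's loader format, and `rows_to_columns` and its parameters are illustrative names.

```python
import csv
import io


def rows_to_columns(csv_text, column_types, batch_size):
    """Parse CSV outside the database and regroup it into typed column batches.

    column_types maps column name -> converter (e.g. int, str). Each yielded batch
    is a dict of column name -> list of values, ready for a bulk loader."""
    reader = csv.DictReader(io.StringIO(csv_text))
    batch = {name: [] for name in column_types}
    rows_in_batch = 0
    for row in reader:
        for name, convert in column_types.items():
            batch[name].append(convert(row[name]))  # type conversion done client-side
        rows_in_batch += 1
        if rows_in_batch == batch_size:
            yield batch
            batch = {name: [] for name in column_types}
            rows_in_batch = 0
    if rows_in_batch:
        yield batch  # final partial batch
```

Columns not listed in `column_types` are simply dropped, so the client can also trim input down to the columns the table actually stores.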

The future roadmap includes a management platform for automated deployment and monitoring, high-availability clustering based on replication of compressed blocks, a hybrid row/column engine, and connectors for more data sources.

Example of row-based I/O cost versus columnar efficiency:

select sum(score) from table;

For 1 M rows of 1 KB each, a row-based store reads about 1 GB to answer this query, whereas HiStore reads only the score column, about 8 MB (1 M values of 8 bytes each).
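The I/O arithmetic in this example can be checked directly. This sketch uses decimal units (1 KB = 1,000 bytes) and assumes an 8-byte score column, which is what the 8 MB figure implies; the function names are hypothetical. The optional compression ratio shows how the 10:1 average cited above would shrink the column read further.

```python
def row_store_bytes(num_rows: int, row_bytes: int) -> int:
    # A row store reads every full row, even though the query needs one column.
    return num_rows * row_bytes


def column_store_bytes(num_rows: int, value_bytes: int, compression_ratio: float = 1.0) -> float:
    # A column store reads only the score column, optionally shrunk by compression.
    return num_rows * value_bytes / compression_ratio


ROWS = 1_000_000
row_io = row_store_bytes(ROWS, 1_000)            # 1 KB per row: about 1 GB
col_io = column_store_bytes(ROWS, 8)             # 8-byte scores: about 8 MB
col_io_10x = column_store_bytes(ROWS, 8, 10.0)   # at 10:1 compression: about 0.8 MB
```

Even before compression, the columnar read is 125 times smaller than the row-based one.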

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

