Unlock Ultra‑High Compression with HiStore’s Knowledge‑Grid Columnar Database

HiStore, Alibaba’s columnar database built on a patented Knowledge‑Grid, delivers ultra‑high compression (over 10:1, up to 40:1), low‑cost storage, rapid query performance, linear scalability, and seamless MySQL compatibility, making it ideal for massive OLAP workloads and real‑time analytics across diverse industries.

21CTO

Jun 18, 2016

HiStore is a columnar database developed by Alibaba's middleware team, built on a patented Knowledge‑Grid technology that enables high‑compression storage for massive data sets.

Advantages

Supports TB‑scale data with >10:1 average compression, up to 40:1 in some cases.

Columnar storage eliminates the need for indexes or partitions, using Knowledge‑Grid nodes to store block statistics and accelerate queries.

Query performance can be up to 30× faster than MyISAM or InnoDB for billion‑row SELECTs.

Parallel data import via MySQL protocol and dedicated loading tools.

High concurrency for real‑time multidimensional retrieval, with data import latency on the order of seconds.

Linear scalability when combined with TDDL/DRDS.

Low migration cost; MySQL tools work seamlessly.

Efficient handling of complex aggregation queries (SUM, COUNT, AVG, GROUP BY).

Value

Reduces design effort—no star or snowflake schemas, materialized views, partitions, or indexes required.

Saves storage space thanks to high compression ratios.

Broad compatibility with BI tools such as Pentaho, Cognos, and JasperReports.

Low operational overhead; performance remains stable as data grows.

Supports sharding and horizontal scaling via TDDL/DRDS.

Typical Use Cases

Log and event management systems.

Telecom call‑detail record analysis.

Large‑scale web, mobile, and marketing analytics.

Data warehouses and data marts for real‑time reporting.

Cost‑sensitive scenarios requiring real‑time queries.

IoT data collection and later statistical processing.

Historical review (rating) and order data.

Architecture Overview

Knowledge Grid – Core Performance Driver

String Query Process

Rough Set Classification

During query execution, the Knowledge Grid classifies data nodes (DN) into three categories:

Relevant DN – nodes in which all data satisfies the query conditions.

Irrelevant DN – nodes in which no data satisfies the conditions.

Suspicious DN – nodes in which only part of the data meets the conditions.
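
The three-way classification can be sketched with per-node minimum/maximum statistics. This is only an illustration, assuming each data node records the min and max of a single column and that the predicate is a closed range; the names `NodeStats` and `classify` are hypothetical and are not part of HiStore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    """Hypothetical per-node statistics: min and max of one column."""
    lo: int
    hi: int


def classify(stats: NodeStats, q_lo: int, q_hi: int) -> str:
    """Classify a node against the predicate q_lo <= value <= q_hi."""
    if stats.hi < q_lo or stats.lo > q_hi:
        return "irrelevant"   # no value can match: skip without decompressing
    if q_lo <= stats.lo and stats.hi <= q_hi:
        return "relevant"     # every value matches: no row-level check needed
    return "suspicious"       # partial overlap: decompress and check rows


nodes = [NodeStats(0, 50), NodeStats(100, 200), NodeStats(150, 400)]
labels = [classify(n, 100, 300) for n in nodes]
print(labels)
```

Under these assumptions, only suspicious nodes need to be decompressed; relevant and irrelevant nodes can be resolved from the statistics alone.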

Example: Product Review Management (Timed Retrieval)

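-- <start>, <end>: hour strings in 'YYYYMMDDHH' form; <offset>, <num>: integers
SELECT COUNT(feed_id)
FROM feed_item_subscribe
WHERE seller_id = 12345
  AND gmt_modify >= STR_TO_DATE('<start>', '%Y%m%d%H')
  AND gmt_modify <  STR_TO_DATE('<end>',   '%Y%m%d%H') -- half-open range [start, end)
LIMIT <offset>, <num>; -- pagination: <offset> is the page start + 1, <num> the page size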

Columns involved:

seller_id (bigint) – seller identifier.

feed_id (bigint) – primary review ID.

feedback (varchar(4000)) – review content.

gmt_modify (datetime) – modification timestamp.

Additional Query Example

SELECT COUNT(*)
FROM employees
WHERE salary > 100000
  AND age < 35
  AND job = 'it'
  AND city = 'hangzhou';

Processing steps:

1. Locate data packets containing salary > 100000
2. Locate packets with age < 35
3. Locate packets where job = 'it'
4. Locate packets where city = 'hangzhou'
5. Discard packets unrelated to the conditions
6. Decompress relevant data within the remaining packets
7. Execute the retrieval
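
The steps above can be sketched as follows. This is a simplified model, not HiStore's implementation: it assumes each packet keeps a maximum salary, a minimum age, and the set of distinct job and city values as its statistics, and the helper names (`make_packet`, `may_match`, `count_matches`) are hypothetical.

```python
def make_packet(rows):
    """Build a hypothetical packet: rows plus precomputed statistics."""
    return {
        "rows": rows,  # stands in for the compressed payload
        "max_salary": max(r["salary"] for r in rows),
        "min_age": min(r["age"] for r in rows),
        "jobs": {r["job"] for r in rows},
        "cities": {r["city"] for r in rows},
    }


def may_match(p):
    """Steps 1-4: test each condition against packet statistics only."""
    return (p["max_salary"] > 100000
            and p["min_age"] < 35
            and "it" in p["jobs"]
            and "hangzhou" in p["cities"])


def count_matches(packets):
    # Step 5: discard packets that cannot contain a matching row.
    candidates = [p for p in packets if may_match(p)]
    total = 0
    # Steps 6-7: "decompress" the survivors and evaluate the rows.
    for p in candidates:
        total += sum(1 for r in p["rows"]
                     if r["salary"] > 100000 and r["age"] < 35
                     and r["job"] == "it" and r["city"] == "hangzhou")
    return total, len(candidates)


packets = [
    make_packet([{"salary": 120000, "age": 30, "job": "it", "city": "hangzhou"},
                 {"salary": 90000, "age": 40, "job": "hr", "city": "hangzhou"}]),
    make_packet([{"salary": 80000, "age": 28, "job": "it", "city": "beijing"}]),
    make_packet([{"salary": 150000, "age": 45, "job": "it", "city": "hangzhou"}]),
]
total, scanned = count_matches(packets)
print(total, scanned)
```

In this toy data set, two of the three packets are discarded from statistics alone, so only one packet is scanned row by row.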

Comparable Products

Infobright

InfiniDB

Pivotal Greenplum

Amazon Redshift

Teradata DB

HP Vertica

SAP HANA

IBM Netezza

kstore (Shenzhou General)

Huawei GaussDB

DM7 (Dameng Database)

Compression Ratios by Data Type

Compression ratios depend not only on data type but also on how much the data varies. For example, a column with only three possible values (0, 1, -1) cannot achieve high compression. Fields declared with a lookup comment behave like bitmap indexes and compress better. Date fields often achieve the highest ratios, while varchar fields compress poorly. Avoid varchar where possible, and consider converting IP addresses to bigint or splitting dates into separate year, month, and day columns.
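
As a minimal sketch of the two conversions suggested above, the following uses Python's standard library to turn an IPv4 string into an integer (suitable for a bigint column) and to split a date into its components before loading. The function names are hypothetical; the actual preprocessing step depends on the loading pipeline in use.

```python
import ipaddress
from datetime import date


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to an integer for a bigint column."""
    return int(ipaddress.IPv4Address(ip))


def split_date(d: date) -> tuple:
    """Split a date into (year, month, day) for three separate columns."""
    return d.year, d.month, d.day


print(ip_to_int("192.168.1.10"))      # 3232235786
print(split_date(date(2016, 6, 18)))  # (2016, 6, 18)
```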

