Unlock Ultra‑High Compression with HiStore’s Knowledge‑Grid Columnar Database
HiStore, Alibaba’s columnar database built on a patented Knowledge‑Grid, delivers ultra‑high compression (over 10:1, up to 40:1), low‑cost storage, rapid query performance, linear scalability, and seamless MySQL compatibility, making it ideal for massive OLAP workloads and real‑time analytics across diverse industries.
HiStore is a columnar database developed by Alibaba's middleware team, built on a patented Knowledge‑Grid technology that enables high‑compression storage for massive data sets.
Advantages
Supports TB‑scale data with >10:1 average compression, up to 40:1 in some cases.
Columnar storage eliminates the need for indexes or partitions, using Knowledge‑Grid nodes to store block statistics and accelerate queries.
Query performance can be up to 30× faster than MyISAM or InnoDB for billion‑row SELECTs.
Parallel data import via MySQL protocol and dedicated loading tools.
High concurrency for real‑time multidimensional retrieval, with data import latency on the order of seconds.
Linear scalability when combined with TDDL/DRDS.
Low migration cost; MySQL tools work seamlessly.
Efficient handling of complex aggregation queries (SUM, COUNT, AVG, GROUP BY).
Value
Reduces design effort—no star or snowflake schemas, materialized views, partitions, or indexes required.
Saves storage space thanks to high compression ratios.
Broad compatibility with BI tools such as Pentaho, Cognos, and JasperReports.
Low operational overhead; performance remains stable as data grows.
Supports sharding and horizontal scaling via TDDL/DRDS.
Typical Use Cases
Log and event management systems.
Telecom call‑detail record analysis.
Large‑scale web, mobile, and marketing analytics.
Data warehouses and data marts for real‑time reporting.
Cost‑sensitive scenarios requiring real‑time queries.
IoT data collection and later statistical processing.
Historical review and order data.
Architecture Overview
Knowledge Grid – Core Performance Driver
String Query Process
Rough Set Classification
During query execution, the Knowledge Grid classifies data nodes (DN) into three categories:
Relevant DN – nodes that satisfy the query conditions.
Irrelevant DN – nodes that do not satisfy the conditions.
Suspicious DN – nodes where only part of the data meets the conditions.
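The three-way classification above can be sketched in a few lines of Python. This is a minimal illustration, not HiStore's actual implementation: it assumes each data node keeps only a minimum, a maximum, and a row count, and the names PackStats, classify_pack, and count_matching are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PackStats:
    """Hypothetical per-node statistics (assumed: min, max, row count)."""
    min_value: int
    max_value: int
    row_count: int


def classify_pack(stats: PackStats, lower: int, upper: int) -> str:
    """Classify one node against the predicate lower <= value <= upper."""
    if stats.max_value < lower or stats.min_value > upper:
        return "irrelevant"  # no row in the node can match
    if lower <= stats.min_value and stats.max_value <= upper:
        return "relevant"    # every row in the node matches
    return "suspicious"      # only some rows may match


def count_matching(packs: List[List[int]], lower: int, upper: int) -> Tuple[int, int]:
    """Return (matching row count, number of nodes that had to be scanned)."""
    total = 0
    scanned = 0
    for values in packs:
        # In a real engine these statistics would be precomputed at load time.
        stats = PackStats(min(values), max(values), len(values))
        kind = classify_pack(stats, lower, upper)
        if kind == "relevant":
            total += stats.row_count  # answered from statistics alone
        elif kind == "suspicious":
            scanned += 1              # stands in for decompressing the node
            total += sum(1 for v in values if lower <= v <= upper)
        # irrelevant nodes are skipped entirely
    return total, scanned


if __name__ == "__main__":
    packs = [[1, 5, 9], [10, 12, 15], [14, 20, 30], [40, 50, 60]]
    print(count_matching(packs, 10, 20))
```

In this sketch only the suspicious node is scanned row by row. Relevant nodes contribute their row count directly and irrelevant nodes are never touched, which is where the query speedup would come from under these assumptions.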
Example: Product Review Management (Timed Retrieval)
SELECT COUNT(feed_id)
FROM feed_item_subscribe
WHERE seller_id = 12345
AND (gmt_modify BETWEEN str_to_date('start','%Y%m%d%H')
AND str_to_date('end','%Y%m%d%H')) -- [start, end)
LIMIT start+1, num; -- pagination
Columns involved:
seller_id (bigint) – seller identifier.
feed_id (bigint) – primary review ID.
feedback (varchar(4000)) – review content.
gmt_modify (datetime) – modification timestamp.
Additional Query Example
SELECT COUNT(*)
FROM employees
WHERE salary > 100000
AND age < 35
AND job = 'it'
AND city = 'hangzhou';
Processing steps:
1. Locate data packets containing salary > 100000
2. Locate packets with age < 35
3. Locate packets where job = 'it'
4. Locate packets where city = 'hangzhou'
5. Discard packets unrelated to the conditions
6. Decompress relevant data within the remaining packets
7. Execute the retrieval
Comparable Products
Infobright
InfiniDB
Pivotal Greenplum
Amazon RedShift
Teradata DB
HP Vertica
SAP HANA
IBM Netezza
kstore (Shenzhou General)
Huawei GaussDB
DM7 (Dameng Database)
Compression Ratios by Data Type
Compression ratios depend not only on data type but also on how much the values in a column vary. For example, a column with only three possible values (0, 1, -1) cannot achieve high compression. Columns declared with the lookup comment behave like bitmap indexes and compress better. Date columns often achieve the highest ratios, while varchar columns compress poorly. Where possible, avoid varchar: consider storing IP addresses as bigint and splitting dates into separate year, month, and day columns.
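The two conversions suggested above can be done in application code before loading. The following is a minimal Python sketch using only the standard library. The helper names ip_to_int, int_to_ip, and split_date are hypothetical, and it assumes IPv4 addresses only.

```python
import ipaddress
from datetime import date
from typing import Tuple


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 string to an integer that fits a bigint column."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(number: int) -> str:
    """Convert the stored integer back to a dotted IPv4 string."""
    return str(ipaddress.IPv4Address(number))


def split_date(value: date) -> Tuple[int, int, int]:
    """Split a date into (year, month, day) for three separate columns."""
    return value.year, value.month, value.day


if __name__ == "__main__":
    print(ip_to_int("192.168.1.10"))      # 3232235786
    print(int_to_ip(3232235786))          # 192.168.1.10
    print(split_date(date(2017, 3, 15)))  # (2017, 3, 15)
```

Whether these conversions actually improve the compression ratio depends on the data, so it is worth measuring on a sample before changing a schema.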
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
