
Evolution of OLAP: Key Technologies, Engine Comparison, and Future Trends

This article provides a comprehensive overview of OLAP technology evolution, covering its origins, modern requirements for massive and real‑time data, detailed comparisons of major open‑source OLAP engines such as Druid, Elasticsearch, Kylin, Doris/StarRocks, and ClickHouse, core architectural and storage techniques, and emerging trends like federated queries, hybrid storage, and lakehouse integration.

DataFunTalk

Dec 19, 2022


1. OLAP Background

Online Analytical Processing (OLAP) was first proposed by E.F. Codd in 1993 to address the limitations of traditional relational databases for analytical workloads, emphasizing multidimensional data models and fast, consistent, interactive queries for decision support.

2. New Requirements: Massive, Real‑Time, Evolving

With the rise of e‑commerce and mobile internet, data volumes have exploded and analysis demands have become more fine‑grained, requiring near‑real‑time freshness, support for schema evolution, and handling of high‑dimensional data without dimensional explosion.

Early reporting systems stored pre‑aggregated results in MySQL, which limited storage capacity and tied analysis to offline batch processing. KV stores like HBase supported more dimensions but still suffered from dimensional explosion. Later engines such as Druid and Elasticsearch introduced incremental ingestion, real‑time aggregation, and indexing to achieve sub‑second latency.
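To make the dimensional explosion concrete, here is a minimal sketch. It assumes the worst case, where every combination of dimensions is pre‑aggregated (a full cube); the article does not state a formula, and the dimension names are hypothetical. Under that assumption, n dimensions produce 2^n distinct GROUP BY combinations.

```python
from itertools import combinations


def count_cuboids(num_dimensions: int) -> int:
    """Number of distinct GROUP BY combinations over n dimensions (full cube)."""
    return 2 ** num_dimensions


def enumerate_cuboids(dimensions):
    """List every subset of dimensions, from the grand total () to the full set."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


if __name__ == "__main__":
    dims = ["country", "device", "channel"]  # hypothetical dimension names
    for cuboid in enumerate_cuboids(dims):
        print(cuboid)  # 8 combinations for 3 dimensions
    for n in (10, 20, 30):
        print(n, count_cuboids(n))  # 1024, 1048576, 1073741824
```

Each added dimension doubles the number of aggregates to store, which is why storing every pre‑computed result in MySQL or HBase stops scaling as dimensionality grows.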

Newer OLAP architectures, exemplified by ClickHouse and Apache Doris, combine columnar storage, MVCC, materialized views, and vectorized MPP execution to meet massive‑data and real‑time analysis needs while supporting schema evolution.
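The materialized‑view idea can be sketched as a rollup that is built once over the detail rows and then updated incrementally as new rows arrive. This is an illustration only, not how ClickHouse or Doris implement materialized views; the row layout, function names, and sample data are all assumptions.

```python
from collections import defaultdict

# Hypothetical detail rows: (date, city, product, amount)
DETAIL_ROWS = [
    ("2022-12-01", "Beijing", "phone", 100),
    ("2022-12-01", "Beijing", "laptop", 250),
    ("2022-12-01", "Shanghai", "phone", 80),
    ("2022-12-02", "Beijing", "phone", 120),
]


def build_rollup(rows, key_indexes, value_index):
    """Pre-aggregate SUM(value) grouped by the chosen key columns."""
    rollup = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        rollup[key] += row[value_index]
    return dict(rollup)


def apply_increment(rollup, new_rows, key_indexes, value_index):
    """Fold newly ingested rows into an existing rollup without a full rebuild."""
    for row in new_rows:
        key = tuple(row[i] for i in key_indexes)
        rollup[key] = rollup.get(key, 0) + row[value_index]
    return rollup


if __name__ == "__main__":
    # Rollup keyed by (date, city); the product column is aggregated away.
    sales_by_date_city = build_rollup(DETAIL_ROWS, (0, 1), 3)
    apply_increment(sales_by_date_city, [("2022-12-02", "Beijing", "laptop", 30)], (0, 1), 3)
    print(sales_by_date_city)
```

Unlike a full cube, only the rollups that queries actually need are maintained, and the detail rows remain available for queries the rollup cannot answer.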

3. Hadoop & Database Ecosystem

The Hadoop ecosystem (HDFS, Hive, Spark, Flink, HBase, ZooKeeper, Kafka, YARN) provides the foundational data pipelines for OLAP, while HTAP concepts (e.g., OceanBase, TiDB) aim to blend OLTP write performance with OLAP query efficiency.

4. OLAP Engine Landscape

Druid: Among the first open‑source OLAP engines built for massive‑scale data; strong on real‑time queries but lacks full SQL support and update capabilities.

Elasticsearch: Built on Lucene, excels at full‑text search and high‑frequency writes; suitable for log analytics and real‑time dashboards, though SQL support is limited.

Kylin: Hadoop‑based distributed analytical warehouse offering sub‑second queries on petabyte‑scale data via pre‑aggregation; limited for detail (row‑level) queries.

Doris / StarRocks: Apache Doris provides an MPP architecture, columnar storage, MVCC, and strong consistency; StarRocks adds query optimizations and lakehouse support. Both offer a low barrier to entry for analytics over billions of rows.

ClickHouse: A columnar MPP engine originally developed at Yandex, with highly optimized vectorized execution, a rich set of table engines, and extensive indexing; widely adopted at large internet companies.

5. Core OLAP Technologies

Architecture: Distributed multi‑replica design with consensus protocols (Raft/ZAB) ensures high availability and consistent metadata management.

Storage: MVCC guarantees atomic writes and strong consistency; columnar storage reduces I/O for read‑heavy workloads; materialized views provide pre‑aggregation for faster queries; various index types (primary, skipping, bitmap, Bloom filter) accelerate data access.

Computation: Query processing follows parsing → plan generation → distributed execution; optimizers apply rule‑based and cost‑based techniques, supporting diverse join strategies and vectorized execution models.
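The vectorized execution model can be sketched as operators that pass column batches to each other instead of single rows. The example below is a simplified illustration with hypothetical operator names; pure Python cannot show the SIMD and cache benefits that real engines get from typed column vectors, so it only demonstrates the batch‑at‑a‑time control flow.

```python
from typing import Iterable, Iterator, List, Tuple

Batch = Tuple[List[int], List[int]]  # (price column, quantity column)


def scan(prices: List[int], quantities: List[int], batch_size: int) -> Iterator[Batch]:
    """Scan operator: emit the two columns in aligned batches."""
    for start in range(0, len(prices), batch_size):
        yield prices[start:start + batch_size], quantities[start:start + batch_size]


def filter_batch(batch: Batch, min_price: int) -> Batch:
    """Filter operator: compute a selection mask once, apply it to every column."""
    prices, quantities = batch
    keep = [p >= min_price for p in prices]
    return (
        [p for p, k in zip(prices, keep) if k],
        [q for q, k in zip(quantities, keep) if k],
    )


def sum_revenue(batches: Iterable[Batch]) -> int:
    """Aggregate operator: SUM(price * quantity) over all incoming batches."""
    total = 0
    for prices, quantities in batches:
        total += sum(p * q for p, q in zip(prices, quantities))
    return total


def run_query(prices, quantities, min_price, batch_size=1024):
    """SELECT SUM(price * quantity) WHERE price >= min_price, one batch at a time."""
    filtered = (filter_batch(b, min_price) for b in scan(prices, quantities, batch_size))
    return sum_revenue(filtered)


if __name__ == "__main__":
    print(run_query([5, 10, 20, 1], [2, 3, 1, 100], min_price=10, batch_size=2))  # 50
```

The per‑row interpretation overhead (function calls, branching) is paid once per batch rather than once per row, which is the main idea behind the performance of vectorized engines.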

6. Future Trends

1) Federated Queries: OLAP engines query heterogeneous data sources (MySQL, Hive, Elasticsearch) to provide unified, high‑performance access.

2) Hybrid Storage: Embedding KV or search engines within OLAP systems for seamless integration of row‑store and column‑store capabilities.

3) Lakehouse Integration: Combining data lake openness with data‑warehouse performance to support BI and AI workloads, offering low‑latency analytics and point queries.

Cloud‑native elasticity and multi‑tenant scalability will further drive OLAP adoption in dynamic, high‑traffic environments.
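The federated query trend listed above can be illustrated with a small sketch. Here an in‑memory SQLite database stands in for an external relational source such as MySQL, and a plain Python list stands in for a second source such as Hive or Elasticsearch; both stand‑ins, the table layout, and the function names are assumptions, not the connector API of any real engine.

```python
import sqlite3


def make_order_source() -> sqlite3.Connection:
    """Stand-in for an external relational source (assumption: MySQL-like)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 30), (1, 20), (2, 70), (3, 5)],
    )
    return conn


# Stand-in for a second, non-relational source (hypothetical data).
USER_PROFILES = [
    {"user_id": 1, "region": "north"},
    {"user_id": 2, "region": "south"},
    {"user_id": 3, "region": "north"},
]


def revenue_by_region(order_conn, profiles, min_amount):
    """Join two sources and aggregate revenue per region."""
    # Push the filter and a partial aggregation down to the relational source,
    # so only one row per user crosses the source boundary.
    rows = order_conn.execute(
        "SELECT user_id, SUM(amount) FROM orders "
        "WHERE amount >= ? GROUP BY user_id",
        (min_amount,),
    ).fetchall()
    # Build side of a hash join over the second source.
    region_of = {p["user_id"]: p["region"] for p in profiles}
    totals = {}
    for user_id, amount in rows:
        region = region_of.get(user_id)
        if region is None:
            continue  # inner join: drop users without a profile
        totals[region] = totals.get(region, 0) + amount
    return totals


if __name__ == "__main__":
    conn = make_order_source()
    print(revenue_by_region(conn, USER_PROFILES, min_amount=10))
```

The design point worth noting is predicate and aggregation pushdown: the less data each source has to ship to the coordinating engine, the closer a federated query gets to native performance.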

7. References

Hadoop ecosystem, Druid, Kylin, Elasticsearch, Doris, Impala, AWS lakehouse, columnar storage design, vectorized execution, and other technical papers are listed for deeper study.
