How Apache Doris Enables Cloud‑Native Real‑Time Data Warehousing for Log Analytics
Based on a DTCC2022 presentation, this article explains Apache Doris's high‑performance MPP architecture, its cloud‑native extensions in SelectDB, and how they solve large‑scale log storage and analysis with superior write throughput, storage efficiency, and interactive query speed.
Apache Doris Overview
Apache Doris is a high‑performance, real‑time analytical database built on an MPP architecture, offering sub‑second query responses on massive data sets. Since graduating from the Apache incubator in June 2022, it has attracted over 400 contributors and is used in more than 1,000 production environments worldwide, including major Chinese internet companies such as Baidu, Meituan, Xiaomi, JD, ByteDance, Tencent, and many traditional industries.
Doris supports both high‑concurrency point queries and high‑throughput complex analytics, and its MySQL‑compatible protocol enables seamless integration with existing tools.
Typical Log Storage and Analysis Scenario
Log data requires massive write throughput, low storage cost, and fast interactive queries with full‑text search and time‑based sorting. Traditional solutions fall into two categories: inverted‑index systems like Elasticsearch (ES) and metadata‑index or no‑index architectures like Loki. ES provides fast query performance but limited write throughput and higher storage cost, while Loki offers higher write throughput and lower storage cost but slower queries.
Log Scenario Solution
SelectDB, the commercial cloud‑native version of Apache Doris, builds a log‑analysis solution by leveraging Doris's high‑performance vectorized engine, SelectDB's compute‑storage separation, lightweight inverted index, and time‑series management.
Data ingestion uses Logstash’s http output plugin to write logs into SelectDB. Downstream, Grafana with MySQL data source and Superset provide visual dashboards, while BI tools can query via the MySQL protocol.
Performance tests on a 3‑node 16‑core, 64 GB cluster using the ES official benchmark dataset (32 GB, 2.47 billion rows) show that SelectDB achieves 4.2× higher write speed than ES, uses only one‑fifth of ES’s storage space, and delivers 2× faster query performance.
Key Technology Breakdown
1. MPP Query and Vectorized Engine – Columnar memory layout and SIMD instructions improve cache hit rates, delivering 5‑10× speedups in wide‑table aggregations.
2. Multi‑Operator Optimization & Optimizer – Adaptive two‑stage aggregation, runtime filter push‑down, and the Nereids optimizer (supporting CBO and RBO) enhance join reorder and predicate push‑down, yielding 2‑10× performance gains.
3. Lightweight Inverted Index – Integrated index supports fast text, numeric, and date searches with bitmap‑based structures, and columnar storage with ZSTD compression achieves >5× higher compression than gzip.
4. Compute‑Storage Separation Cloud‑Native Architecture – Object storage for data, shared cache for writes, elastic scaling, and workload isolation reduce storage cost to one‑fifth and unit cost to one‑third.
5. High‑Throughput Real‑Time Writes – Client‑side micro‑batch writes combined with server‑side compaction achieve GB/s write throughput with low write amplification.
6. Fast Interactive Queries – Partition‑pruned time‑range scans, inverted‑index full‑text search, and TopN algorithms enable billion‑log queries with sub‑second response times.
Open Source Contribution
The discussed technologies—lightweight inverted index, TopN optimization, and time‑series compaction—have been contributed back to the Apache Doris community and are slated for release in Doris 2.0 (Q1 2023). A preview of Doris 2.0 will be available in February for community testing.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
