
Apache Doris: A High‑Performance Real‑Time Analytical Database for Online High‑Concurrency Reporting

This article introduces Apache Doris, a real‑time analytical database built on an MPP architecture, and explains why it suits massive‑data workloads and online high‑concurrency reporting. It then details the core technologies behind its sub‑second query latency and high QPS: storage models, a vectorized query engine, materialized views, two‑level partitioning, rich indexing, row storage, and server‑side prepared statements. A real‑world case study and pointers for joining the Doris community close out the piece.

DataFunSummit

Apache Doris is an open‑source, MPP‑based real‑time analytical database renowned for its ultra‑fast query response and ease of use, delivering sub‑second latency even on massive data volumes and supporting both high‑concurrency point queries and high‑throughput complex analytics.

The platform serves a variety of modern data‑warehouse scenarios, including unified lake‑house querying, online high‑concurrency reporting (replacing MySQL, HBase, etc.), user‑profile and behavior analysis (as an alternative to Elasticsearch or Spark), and log storage/analysis (comparable to Elasticsearch or Loki).

Typical online high‑concurrency reporting use cases include advertising marketing dashboards, logistics operation dashboards, insurance agent customer analysis, and transaction detail queries, all of which demand low‑latency data ingestion, sub‑second query response, and high availability.

Key technical innovations of Doris include:

Three storage models: the aggregate model for pre‑aggregated data, the unique‑key model for row‑level updates (with both merge‑on‑read and merge‑on‑write implementations), and the duplicate (detail) model for append‑only data.
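As a rough illustration (plain Python, not Doris internals), the three models differ in how rows that share the same key are resolved. The table contents and the SUM aggregation below are made up for the sketch:

```python
from collections import defaultdict

def aggregate_model(rows):
    """Aggregate model: rows with the same key are pre-aggregated (here: SUM)."""
    agg = defaultdict(int)
    for key, value in rows:
        agg[key] += value
    return dict(agg)

def unique_key_model(rows):
    """Unique-key model: later writes replace earlier ones (last write wins),
    which is the result merge-on-write / merge-on-read compaction converges to."""
    latest = {}
    for key, value in rows:
        latest[key] = value
    return latest

def duplicate_model(rows):
    """Duplicate (detail) model: append-only, every row is kept."""
    return list(rows)

rows = [("u1", 10), ("u2", 5), ("u1", 7)]
```

The same three input rows thus yield one aggregated row per key, one latest row per key, or all three rows verbatim, respectively.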

A fully vectorized query engine that reduces virtual‑function calls, improves cache locality, and leverages SIMD, achieving 5‑10× speedups on wide‑table aggregations.
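The batch-at-a-time idea can be sketched in Python, keeping in mind that Doris's real engine is compiled C++ with SIMD; the toy functions and column names below are illustrative only:

```python
def filter_row_at_a_time(rows, threshold):
    # Row-at-a-time execution: each row is interpreted individually,
    # which in a real engine means a virtual-function call per row.
    return [r for r in rows if r[1] > threshold]

def filter_vectorized(ids, amounts, threshold):
    # Vectorized execution: operate on whole column batches. Tight loops over
    # contiguous arrays are cache-friendly and, in a compiled engine, map
    # naturally onto SIMD instructions.
    mask = [a > threshold for a in amounts]
    return [i for i, keep in zip(ids, mask) if keep]
```

Both return the same answer; the difference is that the second form amortizes dispatch overhead across an entire batch instead of paying it per row.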

Strongly consistent materialized views that provide pre‑aggregated results and enable automatic query optimization.
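A minimal sketch of the strong-consistency property, assuming a hypothetical table with a SUM-per-day view (none of these names come from Doris): the materialized view is updated in the same step as the base table, so a rewritten query never sees stale aggregates:

```python
from collections import defaultdict

class Table:
    def __init__(self):
        self.base = []                          # detail rows
        self.mv_sum_by_day = defaultdict(int)   # materialized view: SUM(amount) per day

    def insert(self, day, amount):
        # Base table and MV change together, so the MV never lags the base data.
        self.base.append((day, amount))
        self.mv_sum_by_day[day] += amount

    def query_daily_total(self, day):
        # An optimizer would rewrite
        #   SELECT SUM(amount) ... GROUP BY day
        # to read the pre-aggregated MV instead of scanning the detail rows.
        return self.mv_sum_by_day[day]
```

The query rewrite is transparent: the user issues the aggregate query against the base table and the optimizer answers it from the smaller pre-aggregated result.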

A two‑level partitioning scheme (time‑range + hash buckets) that balances data distribution and boosts concurrent query performance.
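A toy routing function shows the two levels; the bucket count and the MD5 hash are placeholder choices, not Doris's actual distribution function:

```python
import hashlib

BUCKETS = 16

def route(event_date: str, user_id: str) -> tuple:
    """Place a row by (range partition, hash bucket)."""
    partition = f"p{event_date}"            # level 1: time-range partition (e.g. per day)
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % BUCKETS      # level 2: hash bucket (a tablet)
    return partition, bucket
```

Range pruning on the partition column narrows a query to a few partitions, while hashing on a high-cardinality key spreads rows evenly across buckets so concurrent queries hit different tablets.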

Rich indexing options: sparse prefix indexes, inverted indexes (with full‑text and vector search support planned), and bloom‑filter indexes for efficient equality and LIKE queries.
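To see why a bloom filter lets equality probes skip data blocks, here is a minimal sketch; the bit count and hash count are illustrative, not what Doris tunes per column:

```python
import hashlib

class BloomFilter:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = 0  # bit array packed into one int

    def _positions(self, item):
        # Derive `hashes` bit positions from the item.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array |= 1 << p

    def might_contain(self, item):
        # False means "definitely absent": the whole data block can be skipped.
        # True means "maybe present": the block must actually be read.
        return all(self.array >> p & 1 for p in self._positions(item))
```

One small filter per data block answers "could this key be here?" before any I/O happens, which is what makes selective equality predicates cheap on large tables.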

Row‑store encoding for point‑lookup workloads, reducing I/O amplification compared to pure columnar storage.
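The I/O-amplification point can be sketched with a toy three-column table; the JSON row encoding here is purely illustrative and not Doris's actual format:

```python
import json

# Columnar layout: each column lives in its own file/array.
columns = {"id": [1, 2, 3], "name": ["a", "b", "c"], "city": ["x", "y", "z"]}

def point_lookup_columnar(row_idx):
    # One read per column: a point lookup on a wide table touches many files.
    return {col: vals[row_idx] for col, vals in columns.items()}

# Row-store copy: the whole row is one contiguous encoded value.
row_store = [json.dumps({c: columns[c][i] for c in columns}) for i in range(3)]

def point_lookup_rowstore(row_idx):
    # A single contiguous read, then one decode.
    return json.loads(row_store[row_idx])
```

Both return the same row; the row-store path just pays one fetch instead of one per column, which is the win for high-QPS point lookups.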

Server‑side prepared statements that cache compiled SQL and execution plans on the Frontend, dramatically lowering latency for repetitive point‑lookup queries.
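The caching idea can be reduced to a sketch; the `PlanCache` class and string "plan" are stand-ins, whereas Doris caches real parsed statements and execution plans on its frontend nodes:

```python
class PlanCache:
    """Toy server-side statement cache: parse/plan once, execute many times."""
    def __init__(self):
        self.cache = {}
        self.misses = 0

    def get_plan(self, sql_template: str):
        if sql_template not in self.cache:
            self.misses += 1  # parsing + planning happens only on first use
            self.cache[sql_template] = f"plan:{sql_template}"
        return self.cache[sql_template]

def execute(cache, sql_template, params):
    plan = cache.get_plan(sql_template)  # repeat executions skip re-planning
    return (plan, params)                # just bind the new parameters
```

For repetitive point lookups the query text is identical and only the bound parameters change, so almost every execution is a cache hit and the per-query cost drops to binding plus execution.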

Performance benchmarks using YCSB show that after applying these optimizations, Doris achieves average primary‑key query latency of 0.3 ms (p99 < 1 ms) and QPS in the tens of thousands, outperforming comparable OLAP systems by up to two‑fold for point queries and orders of magnitude for non‑key queries.

A real‑world case study from a 618 promotion shows Doris ingesting 300 billion new rows per day at a peak of roughly 1 million rows per second, serving 80 million queries per day with a 150 ms 99th‑percentile latency, and sustaining more than 4,500 QPS without service interruption.

The article concludes with guidance on joining the Apache Doris community, including downloading the 2.0 Alpha/Beta releases, subscribing to the developer mailing list, attending bi‑weekly developer meetings, and accessing design documents via DSIP.

Tags: performance optimization, real-time analytics, Data Warehouse, high concurrency, Community, Materialized Views, Apache Doris
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.