Replacing Elasticsearch with Apache Doris for Real‑Time Big Data Analytics: Architecture, Performance, and Enterprise Cases
This article analyzes why Elasticsearch struggles with large‑scale, complex real‑time analytics and demonstrates how Apache Doris’s MPP, columnar storage, and native SQL support provide a cost‑effective, high‑performance alternative, illustrated with detailed enterprise case studies.
In the field of real‑time big data analysis, Elasticsearch has long been favored for its powerful full‑text search capabilities, but its limitations in complex aggregations, storage cost, and SQL ecosystem compatibility become evident as data volumes explode.
1. Core Pain Points of Elasticsearch
Elasticsearch excels at keyword search and log retrieval using Lucene inverted indexes, yet it lacks multi‑table JOIN, sub‑queries, window functions, and suffers from poor performance on multi‑dimensional GROUP BY aggregations (e.g., 10 seconds for 1 billion rows versus Doris under 1 second).
Storage cost is high due to row storage plus inverted indexes and doc values, with compression ratios around 1.5:1; a Tencent Music workload reduced from 697 GB to 195 GB after migrating to Doris (≈72 % savings).
The SQL ecosystem is fragmented: Elasticsearch relies on JSON DSL, making integration with BI tools (Tableau, Grafana) cumbersome.
2. Doris’s Architectural Advantages
Doris is designed as a real‑time analytical database, featuring a shared‑nothing MPP architecture with a Frontend for metadata and query scheduling and Backend nodes for parallel computation, achieving up to 550 MB/s write throughput (≈5× Elasticsearch).
Columnar storage with ZSTD compression (5‑10:1) and dynamic schema changes suit rapidly evolving log fields.
Native SQL support (MySQL protocol) enables standard SQL, complex JOINs, sub‑queries, and UDFs, dramatically lowering the learning curve for analysts.
3. Technical Solution Comparison
Data Model & Index Design : Doris 2.0 adds inverted indexes for text fields (e.g., USING INVERTED PROPERTIES("parser"="chinese")) enabling precise Chinese phrase matching.
Write Modes : Supports push (HTTP/JDBC) and pull (Kafka Routine Load) with built‑in transaction guarantees; write performance is 3‑5× higher than Elasticsearch (e.g., 360 logs reduced from 8 h to 2 h).
Query Optimization : CBO generates execution plans, leveraging Colocation Join and Runtime Filter to cut query latency by ~60 %.
Separation of Compute and Storage : Doris 3.0 decouples compute and storage, allowing hot data on SSD and cold data on HDD/object storage, improving resource utilization by 40 %.
4. Enterprise Case Studies
Case 1 – Tencent Music : Unified search and analytics reduced storage from 697 GB to 195 GB, improved write throughput 4×, and cut complex tag‑selection query time from minutes to seconds.
Case 2 – 360 Security Browser : Migrating billions of log rows to Doris lowered storage by 60 %, accelerated multi‑table JOINs 10×, and reduced development time by 30 %.
Case 3 – Observability Cloud : Combining metrics and logs in Doris halved CPU usage, reduced high‑frequency aggregation latency from 2 s to 300 ms, and cut cluster nodes by 60 %.
5. Key Technical Differences & Implementation Guide
Scenario Evaluation : Prioritize Doris for complex analytical workloads (log aggregation, user profiling) while retaining Elasticsearch for simple keyword search.
Data Migration : Use Doris External Catalog to gradually sync from Elasticsearch, then fully migrate once performance is validated.
Index Design : Apply inverted indexes for text, BKD indexes for numeric ranges, and avoid unnecessary indexes to save space.
Resource Isolation : Leverage Doris Resource Groups and Workload Groups to protect critical queries; Elasticsearch’s shard‑based isolation is weaker.
6. Benefits Summary & Migration Path
Performance : Complex query latency improves 3‑10×; write throughput rises 3‑5×, meeting sub‑second monitoring needs.
Cost : Storage savings of 60‑80 % and compute reduction of 40‑50 % for TB‑scale data.
Operations : Unified SQL stack simplifies maintenance, cuts troubleshooting time by >50 %.
Recommended migration steps: start with high‑frequency, complex analytics, run a hybrid Doris‑Elasticsearch architecture, then decommission Elasticsearch once stability is confirmed.
7. Future Trends
With Doris 3.0’s mature compute‑storage separation and upcoming 4.0 vector search, its role in real‑time analytics, lake‑warehouse convergence, and AIGC data support will expand, making it the preferred choice when SQL compatibility and low‑cost storage are required.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
