How Companies Deploy Apache Doris for Real‑Time Data Warehousing
This article summarizes how enterprises adopt Apache Doris for lakehouse integration, near‑real‑time data warehouses, federated queries, and performance optimization, highlighting practical experiences, connector support, scheduling capabilities, and comparisons with Elasticsearch.
Hello everyone — today we share a summary of Apache Doris production practice across a range of companies. Doris has grown well beyond pure OLAP: it now replaces older query engines and competes directly with Elasticsearch and ClickHouse. The experiences below are distilled from community shares published in the first half of 2025.
Overview of Production Scenarios
Across dozens of company shares, Doris usage has expanded from pure OLAP into near‑real‑time scheduled processing and lakehouse integration.
Lakehouse Integration
Since version 2.1, Apache Doris has significantly improved its lakehouse capabilities, and many companies are adopting them.
A typical pipeline collects data via Kafka, processes it with Flink or Spark, and ingests it into lake tables or Doris internal tables depending on latency requirements.
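For the Kafka‑to‑Doris leg of such a pipeline, Doris's built‑in Routine Load can subscribe to a topic and ingest it continuously. A minimal sketch — the broker address, topic, database, and table names here are illustrative, not from the original shares:

```sql
-- Continuously consume a Kafka topic into a Doris table
-- (broker, topic, and table names are hypothetical)
CREATE ROUTINE LOAD example_db.load_user_events ON user_events
COLUMNS TERMINATED BY ","
PROPERTIES (
    "desired_concurrent_number" = "3",
    "max_batch_interval" = "20"   -- commit a micro-batch at most every 20 seconds
)
FROM KAFKA (
    "kafka_broker_list" = "broker1:9092",
    "kafka_topic" = "user_events",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```

Routine Load keeps consuming in the background, which is what makes the "real‑time write into Doris, micro‑batch transform downstream" pattern described below practical.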
Extensive data connector support
Doris supports many connectors such as Hive, Iceberg, Hudi, Paimon, and JDBC databases.
It also provides an extensible connector framework built on the standard Catalog → Database → Table hierarchy, so developers can map external data sources into this model and implement the access logic.
Data lake analysis engine
Doris integrates deeply with lake frameworks like Paimon and Iceberg, enabling direct access and accelerated queries on lake tables, and supports writing results back to Iceberg.
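Write‑back works through the same catalog mechanism: since Doris 2.1, query results can be inserted directly into an Iceberg table. A hedged sketch — the catalog, database, and table names are hypothetical:

```sql
-- Write aggregated results from a Doris internal table back to an Iceberg table
-- (catalog and table names are hypothetical; "internal" is Doris's default catalog)
INSERT INTO iceberg_catalog.dw.daily_sales
SELECT
    order_date,
    SUM(amount) AS total_amount
FROM internal.sales_db.orders
GROUP BY order_date;
```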
Federated analysis engine
Doris can act as a unified SQL query engine for federated analysis across different data sources.
By creating multiple catalogs, SQL can reference catalog.db.table to join data across sources.
Example of creating catalogs:

```sql
-- Create a Hive catalog
CREATE CATALOG hive PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://172.0.0.1:9083'
);

-- Create a MySQL catalog via JDBC
CREATE CATALOG mysql PROPERTIES (
    "type" = "jdbc",
    "user" = "root",
    "password" = "pwd",
    "jdbc_url" = "jdbc:mysql://example.net:3306",
    "driver_url" = "mysql-connector-j-8.3.0.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```

Federated query example:
```sql
-- Join a MySQL user table with a Hive order table
SELECT
    u.user_id,
    u.user_name,
    COUNT(o.order_id) AS order_count
FROM mysql.user_db.t_user u
JOIN hive.order_db.t_order o
    ON u.user_id = o.user_id
WHERE o.order_date >= '2025-01-01'
GROUP BY
    u.user_id, u.user_name;
```

(Architecture diagrams from the original shares, showing Doris's role in lakehouse architectures, are omitted here.)
Doris + Scheduled Jobs for Near‑Real‑Time Data Warehouse
The second major use case is building a near‑real‑time data warehouse with Doris via scheduled jobs.
When second‑level latency is not required, minute‑ or hour‑level scheduling with Doris can achieve near‑real‑time processing.
Companies like Meituan and Douyin have built near‑real‑time production warehouses using Doris combined with scheduled jobs.
These architectures leverage Doris's real‑time write capability (e.g., KafkaToDoris) with reliable 5‑30‑minute schedules for micro‑batch processing.
Real‑time and batch data are fused in Doris, using its efficient OLAP interaction to support flexible queries.
Business logic is encapsulated in views, reusing multidimensional models, improving development efficiency and reducing O&M cost.
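As a sketch of this view‑based pattern — the schema and column names below are hypothetical — the fused real‑time and batch data is wrapped in a view that downstream queries and BI tools can reuse:

```sql
-- Encapsulate a reusable multidimensional model in a view
-- (database, table, and column names are hypothetical)
CREATE VIEW dw.v_order_summary AS
SELECT
    user_id,
    DATE(order_time) AS dt,
    COUNT(*)         AS order_cnt,
    SUM(amount)      AS gmv
FROM dw.dwd_orders
GROUP BY user_id, DATE(order_time);
```

Changing the business logic then means redefining the view, without touching every consumer.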
Doris 2.1 introduced a Job Scheduler with second‑level precision, though integration with third‑party schedulers is recommended.
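With the built‑in Job Scheduler, a micro‑batch refresh can be declared directly in SQL. A minimal sketch, assuming hypothetical job and table names:

```sql
-- Refresh an aggregate table every 10 minutes with Doris's Job Scheduler (2.1+)
-- (job, table, and column names are hypothetical)
CREATE JOB refresh_order_summary
ON SCHEDULE EVERY 10 MINUTE
DO
INSERT INTO dw.ads_order_summary
SELECT user_id, DATE(order_time), COUNT(*), SUM(amount)
FROM dw.dwd_orders
WHERE order_time >= DATE_SUB(NOW(), INTERVAL 10 MINUTE)
GROUP BY user_id, DATE(order_time);
```

For complex DAG‑style dependencies across jobs, a third‑party scheduler remains the more common choice.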
Replacing Elasticsearch for Multi‑Dimensional Analysis
Elasticsearch is widely used for real‑time analysis, log analysis, full‑text search, and monitoring, but its DSL and cost have led many to consider Doris as a replacement.
We previously published a detailed comparison of Doris vs Elasticsearch.
Doris Optimization
Optimization focus varies across companies but generally includes reads, writes, and compaction.
Refer to earlier posts for detailed performance tuning guidance.
Doris performance optimization (1)
Doris performance optimization (2)
Doris performance optimization (3)
Other
Some companies have custom tricks and optimizations based on their specific business and platform, which are not listed here.
This concludes our sharing; we will continue to update best practices for other components.
This article has been distilled and summarized from source material and republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
