
How Companies Deploy Apache Doris for Real‑Time Data Warehousing

This article summarizes how enterprises adopt Apache Doris for lakehouse integration, near‑real‑time data warehouses, federated queries, and performance optimization, highlighting practical experiences, connector support, scheduling capabilities, and comparisons with Elasticsearch.

Big Data Technology & Architecture

Hello everyone. Today we share a summary of how various companies run Doris in production.

Doris has evolved beyond pure OLAP: it now replaces earlier query engines in many stacks and competes with Elasticsearch (ES) and ClickHouse.

Below we summarize useful lessons from community talks given in the first half of 2025.

Summary of Adoption Scenarios

Across dozens of company talks, Doris usage has expanded from pure OLAP to near‑real‑time scheduled processing and lakehouse integration.

Lakehouse Integration

Since version 2.1, Apache Doris has significantly improved its lakehouse capabilities, and many companies have started trying them out.

Data is collected via Kafka, processed with Flink or Spark, and ingested into either lake tables or Doris tables, depending on how fresh the data needs to be.

Extensive data connector support

Doris supports many connectors such as Hive, Iceberg, Hudi, Paimon, and JDBC databases.

It also provides an extensible connector framework with standard Catalog, Database, Table hierarchy, allowing developers to map to data sources and implement access logic.
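The Catalog, Database, Table hierarchy can be explored with ordinary SQL statements. The following is a minimal sketch, assuming an external catalog named hive_catalog containing a database order_db already exists; both names are placeholders for illustration.

```sql
-- List all catalogs known to this Doris cluster
-- (the built-in one is called "internal").
SHOW CATALOGS;

-- Switch the session to an external catalog.
-- hive_catalog is a hypothetical name; substitute your own.
SWITCH hive_catalog;

-- Walk down the hierarchy: databases, then tables.
SHOW DATABASES;
USE order_db;          -- hypothetical database name
SHOW TABLES;

-- Pick up metadata changes made on the source side.
REFRESH CATALOG hive_catalog;

-- Return to Doris's own internal tables.
SWITCH internal;
```

Once a catalog is registered, its tables can also be addressed without switching by using the fully qualified catalog.db.table form.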

Data lake analysis engine

Doris integrates deeply with lake frameworks like Paimon and Iceberg, enabling direct access and accelerated queries on lake tables, and supports writing results back to Iceberg.
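As a rough illustration of lake access and write-back, the sketch below registers an Iceberg catalog backed by a Hive Metastore, queries a lake table, and writes an aggregate back to Iceberg. The catalog name, metastore address, databases, tables, and columns are all assumptions made for this example, and the target table is assumed to exist already. Write-back support depends on the Doris version, so check the documentation for your release.

```sql
-- Register an Iceberg catalog whose metadata lives in a Hive Metastore.
-- The thrift address is a placeholder.
CREATE CATALOG iceberg_catalog PROPERTIES (
  'type' = 'iceberg',
  'iceberg.catalog.type' = 'hms',
  'hive.metastore.uris' = 'thrift://127.0.0.1:9083'
);

-- Query a lake table directly (hypothetical database and table).
SELECT order_date, COUNT(*) AS order_count
FROM iceberg_catalog.order_db.t_order
WHERE order_date >= '2025-01-01'
GROUP BY order_date;

-- Write an aggregated result back to an existing Iceberg table
-- (hypothetical target with matching columns).
INSERT INTO iceberg_catalog.report_db.t_order_daily
SELECT order_date, COUNT(*) AS order_count
FROM iceberg_catalog.order_db.t_order
GROUP BY order_date;
```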

Federated analysis engine

Doris can act as a unified SQL query engine for federated analysis across different data sources.

By creating multiple catalogs, SQL can reference catalog.db.table to join data across sources.

Example of creating catalogs:

-- Create a Hive catalog (named hive_catalog to match the query below)
CREATE CATALOG hive_catalog PROPERTIES (
  'type' = 'hms',
  'hive.metastore.uris' = 'thrift://172.0.0.1:9083'
);

-- Create a MySQL catalog over JDBC (named mysql_catalog to match the query below)
CREATE CATALOG mysql_catalog PROPERTIES (
  'type' = 'jdbc',
  'user' = 'root',
  'password' = 'pwd',
  'jdbc_url' = 'jdbc:mysql://example.net:3306',
  'driver_url' = 'mysql-connector-j-8.3.0.jar',
  'driver_class' = 'com.mysql.cj.jdbc.Driver'
);

Federated query example:

-- Join MySQL user table with Hive order table
SELECT
  u.user_id,
  u.user_name,
  COUNT(o.order_id) AS order_count
FROM
  mysql_catalog.user_db.t_user u
JOIN
  hive_catalog.order_db.t_order o
ON
  u.user_id = o.user_id
WHERE
  o.order_date >= '2025-01-01'
GROUP BY
  u.user_id, u.user_name;

Below are screenshots showing Doris's role in lakehouse architectures.

Doris + Scheduled Jobs for Near‑Real‑Time Data Warehouse

The second major use case is building a near‑real‑time data warehouse with Doris via scheduled jobs.

When second‑level latency is not required, minute‑ or hour‑level scheduling with Doris can achieve near‑real‑time processing.

Companies like Meituan and Douyin have built near‑real‑time production warehouses using Doris combined with scheduled jobs.

These architectures combine Doris's real‑time write capability (e.g., KafkaToDoris) with reliable schedules that run every 5 to 30 minutes for micro‑batch processing.

Real‑time and batch data are merged in Doris, whose efficient interactive OLAP queries support flexible analysis.
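One way to get Kafka data into Doris continuously is the built-in Routine Load feature. The source does not say which mechanism the KafkaToDoris jobs use (a Flink or Spark job is equally plausible), so treat the following as one possible sketch. The database, table, broker address, topic, and consumer group are hypothetical.

```sql
CREATE DATABASE IF NOT EXISTS ods;

-- Landing table for raw orders; the Unique Key model lets a later
-- record with the same order_id replace the earlier one.
CREATE TABLE IF NOT EXISTS ods.ods_order (
  order_id   BIGINT,
  user_id    BIGINT,
  amount     DECIMAL(18, 2),
  order_time DATETIME
)
UNIQUE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8
PROPERTIES ("replication_num" = "1");  -- single replica: test setups only

-- Continuously consume JSON messages from a Kafka topic.
CREATE ROUTINE LOAD ods.load_ods_order ON ods_order
COLUMNS(order_id, user_id, amount, order_time)
PROPERTIES (
  "format" = "json",
  "max_batch_interval" = "10"          -- seconds between micro-batches
)
FROM KAFKA (
  "kafka_broker_list" = "kafka-broker:9092",
  "kafka_topic" = "orders",
  "property.group.id" = "doris_ods_order",
  "property.kafka_default_offsets" = "OFFSET_END"
);

-- Inspect the state of the load job.
SHOW ROUTINE LOAD FOR ods.load_ods_order;
```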

Business logic is encapsulated in views, reusing multidimensional models, improving development efficiency and reducing O&M cost.
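A view of this kind might look like the sketch below. The schema (ods.ods_order, dws) and the business rule (only positive-amount orders count) are invented for illustration and assume the referenced table exists.

```sql
CREATE DATABASE IF NOT EXISTS dws;

-- Encapsulate the "valid order" rule and the daily per-user rollup
-- once, so that reports and ad hoc queries reuse the same definition.
CREATE VIEW dws.v_user_order_daily AS
SELECT
  user_id,
  DATE(order_time) AS order_date,
  COUNT(*)         AS order_count,
  SUM(amount)      AS order_amount
FROM ods.ods_order
WHERE amount > 0                       -- hypothetical business rule
GROUP BY user_id, DATE(order_time);

-- Consumers query the view instead of repeating the logic.
SELECT order_date, SUM(order_amount) AS gmv
FROM dws.v_user_order_daily
WHERE order_date >= '2025-01-01'
GROUP BY order_date
ORDER BY order_date;
```

If the rule changes, only the view definition needs to be updated, which is where the maintenance savings come from.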

Doris 2.1 introduced a Job Scheduler with second‑level precision, though integration with third‑party schedulers is recommended.
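For simple cases, the built-in scheduler can drive a micro-batch directly. The sketch below assumes a source table ods.ods_order and a Unique Key target table dws.dws_user_order_1d keyed on (user_id, order_date), so that each run overwrites the previous aggregates. All object names and the 10-minute interval are illustrative.

```sql
-- Recompute the last day's per-user aggregates every 10 minutes.
CREATE JOB dws_user_order_10min
ON SCHEDULE EVERY 10 MINUTE
DO
INSERT INTO dws.dws_user_order_1d
SELECT
  user_id,
  DATE(order_time) AS order_date,
  COUNT(*)         AS order_count,
  SUM(amount)      AS order_amount
FROM ods.ods_order
WHERE order_time >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY user_id, DATE(order_time);

-- List scheduled insert jobs and their status.
SELECT * FROM jobs("type" = "insert");

-- Pause the job, for example during maintenance.
PAUSE JOB WHERE jobname = 'dws_user_order_10min';
```

Jobs with dependencies between them, retries, or alerting are usually easier to manage in an external scheduler, which matches the recommendation above.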

Replacing Elasticsearch for Multi‑Dimensional Analysis

Elasticsearch is widely used for real‑time analysis, log analysis, full‑text search, and monitoring, but its DSL and cost have led many to consider Doris as a replacement.
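To give a sense of what log search looks like in SQL rather than in the Elasticsearch DSL, here is a minimal sketch using a Doris inverted index. The source does not describe which features companies rely on for this migration, so the choice of an inverted index, along with the table and column names, is an assumption for illustration.

```sql
CREATE DATABASE IF NOT EXISTS logs;

-- Log table with an inverted index on the message text.
CREATE TABLE IF NOT EXISTS logs.app_log (
  ts      DATETIME,
  service VARCHAR(64),
  level   VARCHAR(16),
  message STRING,
  INDEX idx_message (message) USING INVERTED PROPERTIES ("parser" = "english")
)
DUPLICATE KEY(ts, service)
DISTRIBUTED BY HASH(service) BUCKETS 8
PROPERTIES ("replication_num" = "1");  -- single replica: test setups only

-- Keyword search combined with ordinary filters and aggregation.
SELECT service, COUNT(*) AS error_count
FROM logs.app_log
WHERE ts >= '2025-01-01 00:00:00'
  AND level = 'ERROR'
  AND message MATCH_ANY 'timeout refused'  -- matches either keyword
GROUP BY service
ORDER BY error_count DESC
LIMIT 10;
```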

We previously published a detailed comparison of Doris vs Elasticsearch.

Doris Optimization

Optimization focus varies across companies but generally includes reads, writes, and compaction.

Refer to earlier posts for detailed performance tuning guidance.

Doris performance optimization (1)

Doris performance optimization (2)

Doris performance optimization (3)

Other

Some companies have custom tricks and optimizations based on their specific business and platform, which are not listed here.

This concludes our sharing; we will continue to update best practices for other components.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
