Databases 18 min read

Design and Implementation of Real‑Time OLAP with Apache Doris at Dingdong Maicai

This article details Dingdong Maicai’s business‑driven requirements for a real‑time OLAP platform, evaluates Apache Doris versus ClickHouse, describes the end‑to‑end architecture—including data ingestion, modeling, and query optimization techniques such as colocate joins, array_contains, broker load, bitmap, prefix and bloom‑filter indexes, and materialized views—and shares practical performance experiences and best‑practice recommendations.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Design and Implementation of Real‑Time OLAP with Apache Doris at Dingdong Maicai

Dingdong Maicai, founded in 2017, needed a flexible, high‑performance real‑time OLAP database to support diverse business scenarios such as dashboards, ad‑hoc queries, behavior analysis, B/C‑end platforms, and tagging systems.

Business Requirements included real‑time efficient analysis, support for both detail and aggregated queries, and instant dimension‑condition queries with sub‑second latency.

Selection and Comparison focused on Apache Doris and ClickHouse, emphasizing standard SQL support, join capabilities, high concurrency, real‑time and offline ingestion, and large‑scale detail data queries. Doris was chosen for its superior performance in these areas.

Architecture Overview shows Elasticsearch for tag data, HBase for dimension tables, MySQL for business data, Kafka for streaming, Spark for ad‑hoc, and Doris as the core OLAP engine. Offline data flows from business DBs via DataX to ODS, then to Doris; real‑time data is processed by Flink from Kafka into Doris or HDFS.

Real‑Time Data Analysis uses Flink to process streams, with Doris providing low‑latency joins and sub‑second query results, enabling scenarios like sales‑plan forecasting with response times under two seconds.

B‑End Query Use‑Case replaced Spark‑JDBC over Hive (minutes‑long queries) with Doris, achieving second‑level response for user‑profile lookups.

Tag System Migration moved from Elasticsearch to Doris, leveraging distributed joins (shuffle, broadcast, colocate) and choosing colocate join for optimal performance.

Optimization Techniques :

Colocate Join – ensures data locality during import, query scheduling, and balance, eliminating unnecessary network exchange.

Array_contains – processes JSON arrays efficiently for user segmentation.

Broker Load – accelerates bulk data import from Hive to Doris.

Bitmap Functions – replace count(distinct) for massive deduplication, achieving orders‑of‑magnitude speedup.

Prefix Index & Bloom Filter – improve query filtering on high‑cardinality columns.

Materialized Views (Rollup) – create aggregated tables to reduce scan volume for common queries.

CREATE TABLE `profile_table` ( `pdate` date NULL COMMENT "null", `product_mongo_id` varchar(4000) NULL COMMENT "商品ID", `station_id` varchar(4000) NULL COMMENT "仓id", ... ) ENGINE=OLAP UNIQUE KEY(`pdate`,`product_mongo_id`,`station_id`) COMMENT "OLAP" PARTITION BY RANGE(`pdate`)() DISTRIBUTED BY HASH(`product_mongo_id`) BUCKETS 7 PROPERTIES ("colocate_with" = "profile_table", "in_memory" = "false", "storage_format" = "V2");

Another table example:

CREATE TABLE `station_info_table` ( `product_mongo_id` varchar(4000) NULL COMMENT "商品id", `station_id` varchar(4000) NULL COMMENT "站点id", `snapshot` date NULL COMMENT "日期", `product_id` bigint(20) NULL COMMENT "商品id", ... ) ENGINE=OLAPUNIQUE KEY(`product_mongo_id`,`station_id`,`snapshot`) COMMENT "OLAP" PARTITION BY RANGE(`snapshot`)() DISTRIBUTED BY HASH(`product_mongo_id`) BUCKETS 7 PROPERTIES ("colocate_with" = "profile_table", "in_memory" = "false", "storage_format" = "V2");

Sample query using colocate join:

select count(psp.product_mongo_id) from profile_table psp left join station_info_table psi on psp.product_mongo_id=psi.product_mongo_id and psp.station_id=psi.station_id where psp.pdate='2023-03-16' and psp.four_category='特色醋' and psp.brand_name='宝鼎天鱼' and psp.weight_unit='ml' and psp.pmg_name='粮油调味组';

After applying colocate join, query latency drops to milliseconds.

Optimization Experience includes workflow‑driven modeling, automated broker‑load task generation, and scheduled data ingestion, reducing manual effort and improving reliability.

Overall, Dingdong Maicai’s adoption of Apache Doris delivers sub‑second query performance, high concurrency (120 QPS per cluster), and scalable storage (hundreds of TB, billions of rows daily), supporting both real‑time and offline analytics across the organization.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

OLAPApache Dorisdistributed joinbroker load
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.