Databases 17 min read

Performance Optimization of Apache Doris for A/B Experiment Queries at Xiaomi

This article analyzes the performance bottlenecks of A/B experiment report queries on Apache Doris at Xiaomi, presents data-driven insights on query latency, field usage, and experiment ID matching, and details a series of optimizations—including pre‑aggregation, materialized views, bitmap deduplication, and schema redesign—that reduced query times by up to 60× and lowered cluster load.

DataFunTalk
DataFunTalk
DataFunTalk
Performance Optimization of Apache Doris for A/B Experiment Queries at Xiaomi

Apache Doris is one of Xiaomi's most widely used OLAP engines. The article examines the current performance of A/B experiment report queries, highlighting issues such as hour‑level P95 latency, massive scan rows, and high CPU/IO utilization on Doris BE nodes.

The data pipeline feeds experiment logs into a Talos message queue, then writes detailed rows into Doris tables while also syncing to Hive for backup. Detailed tables store experiment configuration, metadata, and user logs, with the exp_id field stored as a comma‑separated string, leading to inefficient LIKE queries.

Analysis of query logs shows that most reports scan hundreds of billions of rows, with 69.1% of queries spanning more than one day. Field usage is highly skewed: only 15.2% of dimensions are hot, yet all are stored in the detailed table, causing unnecessary scan volume.

Key optimization steps include:

Pre‑aggregating high‑frequency dimensions into separate aggregate tables, limiting daily row count to ~80 million while achieving >80% query coverage.

Replacing the comma‑separated exp_id with a dedicated integer dimension, enabling exact match queries and leveraging Doris prefix indexes and Bloom filters.

Using materialized views to store pre‑aggregated results, allowing the query planner to route requests to the most appropriate layer (materialized view → aggregate table → detailed table).

Implementing bitmap deduplication (e.g., bit_or / bit_count ) for distinct user counts, dramatically reducing storage and computation cost compared to COUNT DISTINCT or APPROX_COUNT_DISTINCT .

Maintaining a global dictionary to map string user_id values to integers for bitmap operations.

Example DDL for a detailed table (using the DUPLICATE model) is shown below:

CREATE TABLE `dwd_xxxxxx` (
  `olap_date` int(11) NULL COMMENT "分区日期",
  `user_id` varchar(256) NULL COMMENT "用户id",
  `exp_id` varchar(512) NULL COMMENT "实验组ID",
  `dimension1` varchar(256) NULL COMMENT "",
  `dimension2` varchar(256) NULL COMMENT "",
  ...
  `dimensionN` bigint(20) NULL COMMENT "",
  `index1` decimal(20,3) NULL COMMENT "",
  ...
  `indexN` int(11) NULL COMMENT ""
) ENGINE=OLAP DUPLICATE KEY(`olap_date`,`user_id`)
COMMENT "OLAP"
PARTITION BY RANGE(`olap_date`)(
  PARTITION p20221101 VALUES [("20221101"),("20221102")],
  PARTITION p20221102 VALUES [("20221102"),("20221103")],
  PARTITION p20221103 VALUES [("20221103"),("20221104")]
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 300;

SQL examples illustrate the transition from LIKE to exact match and from detailed to aggregate queries:

-- Detailed table query (dimension)
SELECT targetConvNum, COUNT(DISTINCT user_id)
FROM analysis.doris_xxx_event
WHERE olap_date = 20221105
  AND event_name = 'CONVERSION'
  AND exp_id LIKE '%154556%'
GROUP BY targetConvNum;

-- Aggregate table query (same result)
SELECT SUM(targetConvNum * cnt)
FROM agg.doris_xxx_event_agg
WHERE olap_date = 20221105
  AND event_name = 'CONVERSION'
  AND exp_id = 154556;

After applying these optimizations, query latency improved by 50‑60× for regular aggregates and ~10× for distinct‑user counts, while BE CPU usage and disk I/O dropped significantly. The overall system stability benefited from a combination of data pre‑aggregation, query engine tuning, and Doris's built‑in monitoring tools.

The article concludes with practical tips, such as handling fields that serve both as dimensions and metrics, and encourages community participation in the Apache Doris ecosystem.

performance optimizationA/B testingBitMapmaterialized viewPre-aggregationApache Doris
DataFunTalk
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.