
Optimizing Precise Deduplication with Doris Bitmap: Architecture, Performance Enhancements, and Practical Lessons

This article gives an overview of precise deduplication in Meituan's Apache Doris deployment. It covers the underlying bitmap data structures, the aggregation bottlenecks they run into, and a series of optimizations (memory management, fast union, orthogonal encoding, and vectorized engine integration) that together deliver significant performance gains in high-cardinality scenarios.

DataFunSummit

Dec 16, 2023


The presentation opens with the business need for precise deduplication in traffic-heavy scenarios. Metrics such as page views (PV), unique visitors (UV), and daily active users (DAU) are reported at scale, and the distinct-count metrics among them must be exact, which makes them expensive to compute.

It then outlines three traditional solutions: pre-aggregation in the data warehouse, fuzzy (approximate) deduplication, and pre-aggregation processing, and explains why each falls short for real-time, multi-dimensional analysis.

At the core of Doris's solution is a two-stage aggregation model, Streaming Agg followed by Merge Agg, built on bitmap data structures derived from Roaring Bitmap. The talk describes Roaring's three container types (Array, Bitset, and Run, i.e. run-length encoded), together with their size thresholds and conversion rules.
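To make the container rules concrete, here is a small Python sketch of how a Roaring-style bitmap might pick a container for each 16-bit chunk. It assumes the commonly documented Roaring thresholds (an array container holds at most 4096 values; a bitset container is always 65536 bits, or 8 KiB) and approximate serialized sizes (2 bytes per array entry, 4 bytes per run plus a 2-byte header). `layout`, `choose_container`, and `count_runs` are hypothetical names, not Doris or CRoaring APIs.

```python
"""Sketch of Roaring-style container selection (illustrative, not Doris's code).

Assumptions: 32-bit unsigned values, array containers capped at 4096 values,
fixed 8 KiB bitset containers, and approximate per-container byte costs.
"""
from collections import defaultdict

ARRAY_MAX = 4096            # above this cardinality an array is no smaller than a bitset
BITSET_BYTES = 65536 // 8   # 8192 bytes regardless of cardinality


def count_runs(sorted_lows):
    """Number of maximal runs of consecutive integers in a sorted list."""
    runs = 0
    prev = None
    for v in sorted_lows:
        if prev is None or v != prev + 1:
            runs += 1
        prev = v
    return runs


def choose_container(sorted_lows):
    """Pick a container type for one 16-bit chunk, mimicking Roaring's rules."""
    card = len(sorted_lows)
    kind, size = ("array", 2 * card) if card <= ARRAY_MAX else ("bitset", BITSET_BYTES)
    # A run-optimization pass switches to run-length encoding when that is smaller.
    run_size = 2 + 4 * count_runs(sorted_lows)
    if run_size < size:
        kind, size = "run", run_size
    return kind, size


def layout(values):
    """Group values by their high 16 bits and report one container per chunk."""
    chunks = defaultdict(list)
    for v in sorted(set(values)):
        chunks[v >> 16].append(v & 0xFFFF)
    return {key: choose_container(lows) for key, lows in sorted(chunks.items())}
```

Under these rules, `layout([3, 1000, 70000])` keeps the scattered IDs in two small array containers, `layout(range(0, 65536, 2))` forces one 8 KiB bitset, and `layout(range(200000, 260000))` collapses 60000 contiguous IDs into a single run of a few bytes. This is why dense, contiguous ID assignment pays off for bitmap-based deduplication.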

Key performance challenges are identified: non‑numeric type support, high‑cardinality throughput degradation, and costly memory copies. To address these, the authors propose dictionary encoding for non‑numeric fields, orthogonal encoding to reduce container count, and careful data layout to favor Bitset containers.
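A minimal sketch of the two encoding ideas, under loudly stated assumptions: string keys are dictionary-encoded into dense integers in first-seen order, and "orthogonal" placement is modeled as routing each encoded ID by its high 16 bits so that every Roaring chunk lives on exactly one node. The article does not spell out Meituan's exact scheme; `Dictionary`, `node_for`, and `distinct_count_orthogonal` are hypothetical names. Because the node-local sets are disjoint by construction, the global distinct count is simply the sum of the local counts, with no bitmap shuffle or final union.

```python
class Dictionary:
    """Hypothetical global dictionary: string key -> dense integer ID."""

    def __init__(self):
        self._ids = {}

    def encode(self, key):
        # len(self._ids) is evaluated first, so a new key gets the next free ID
        # while an existing key keeps the ID it was first given.
        return self._ids.setdefault(key, len(self._ids))


def node_for(encoded_id, num_nodes):
    """Route by the high 16 bits so each Roaring chunk lives on exactly one node."""
    return (encoded_id >> 16) % num_nodes


def distinct_count_orthogonal(user_keys, num_nodes):
    """Sum node-local distinct counts; correct only because partitions are disjoint."""
    dictionary = Dictionary()
    partitions = [set() for _ in range(num_nodes)]  # sets stand in for per-node bitmaps
    for key in user_keys:
        uid = dictionary.encode(key)
        partitions[node_for(uid, num_nodes)].add(uid)
    return sum(len(p) for p in partitions)
```

By contrast, if rows were distributed by a dimension column, one user could appear on many nodes, and the partial bitmaps would have to be shipped and unioned before counting. Apache Doris exposes a related idea through its `orthogonal_bitmap_*` aggregate functions, which assume the bitmaps were pre-partitioned along the deduplication key.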

Bitmap aggregation optimizations focus on reducing space, offloading computation to vectorized bitwise operations, and keeping data distributions dense. Memory-copy overhead is mitigated by switching the allocator from TCMalloc to jemalloc and by making bitmaps thread-safe copy-on-write values; together these cut expression evaluation from 56% to 14% of total aggregation time.
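The copy-on-write idea can be sketched in a few lines of Python. `CowBitmap` below is a hypothetical handle, not Doris's actual bitmap class: a Python set stands in for the Roaring payload, copying a handle only bumps a reference count, and the payload is duplicated the first time a shared handle is mutated. A single lock makes refcount updates thread-safe; as with most copy-on-write value types, one handle is still assumed to be mutated by one thread at a time.

```python
import threading


class _Storage:
    """Bitmap payload plus a count of handles currently sharing it."""

    def __init__(self, values):
        self.values = set(values)  # a Python set stands in for Roaring containers
        self.refs = 1


class CowBitmap:
    """Hypothetical copy-on-write bitmap handle."""

    # Guards refcount bookkeeping; one handle is still meant to be mutated
    # by one thread at a time, as with most copy-on-write value types.
    _lock = threading.Lock()

    def __init__(self, values=(), _storage=None):
        self._storage = _storage if _storage is not None else _Storage(values)

    def copy(self):
        """O(1) copy: share the payload and bump the refcount instead of copying it."""
        with CowBitmap._lock:
            self._storage.refs += 1
        return CowBitmap(_storage=self._storage)

    def _detach(self):
        """Give this handle a private payload if another handle still shares it."""
        with CowBitmap._lock:
            if self._storage.refs > 1:
                self._storage.refs -= 1
                self._storage = _Storage(self._storage.values)  # the deferred copy

    def add(self, value):
        self._detach()
        self._storage.values.add(value)

    def union_update(self, other):
        self._detach()
        self._storage.values |= other._storage.values

    def cardinality(self):
        return len(self._storage.values)

    def shares_storage_with(self, other):
        return self._storage is other._storage
```

In an aggregation pipeline, this means a bitmap passed through several expressions costs only a refcount increment at each step, and a full copy happens only when something actually modifies it.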

Integrating Roaring Bitmap's Fast Union interface enables batched unions, deferred cardinality computation, and less data movement, further accelerating deduplication.
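The difference between pairwise and batched union can be sketched with a simplified Roaring-like model, which is an assumption made for illustration: a bitmap is a dict from the high 16 bits of a value to a Python int whose set bits are the low 16 bits. `pairwise_union` builds a new intermediate result and recomputes the cardinality after every input, while `fast_union` ORs all containers that share a key into one result and counts once at the end. CRoaring offers batched union natively (for example `roaring_bitmap_or_many`); the helpers here are hypothetical.

```python
from collections import defaultdict


def from_values(values):
    """Build a bitmap: high 16 bits -> int whose set bits are the low 16 bits."""
    bitmap = defaultdict(int)
    for v in values:
        bitmap[v >> 16] |= 1 << (v & 0xFFFF)
    return dict(bitmap)


def cardinality(bitmap):
    return sum(bin(bits).count("1") for bits in bitmap.values())


def pairwise_union(bitmaps):
    """Union one input at a time, copying and recounting after every step."""
    result, card = {}, 0
    for bm in bitmaps:
        step = dict(result)  # a new intermediate result per input
        for key, bits in bm.items():
            step[key] = step.get(key, 0) | bits
        result, card = step, cardinality(step)  # eager cardinality, mostly wasted
    return result, card


def fast_union(bitmaps):
    """Batched union: OR every container sharing a key into one result, count once."""
    merged = defaultdict(int)
    for bm in bitmaps:
        for key, bits in bm.items():
            merged[key] |= bits
    merged = dict(merged)
    return merged, cardinality(merged)
```

Both functions return the same bitmap and count; the batched version simply avoids the intermediate results and the repeated popcounts, which is where the savings come from when many partial bitmaps are merged in the Merge Agg stage.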

Additional engine-level tweaks include lightweight aggregation during the Scan phase when the data is already ordered, and parallel scan threads that improve throughput for long-range queries.
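Scan-phase aggregation on ordered data can be sketched as follows, assuming each tablet's rows arrive sorted by the group key (for example because it is the leading sort column). Consecutive rows with the same key are folded into one bitmap during the scan, so no hash table is needed, and each tablet is scanned by its own worker before a final merge. Python sets stand in for Roaring bitmaps, the thread pool only illustrates the structure (it does not model Doris's scanner scheduling), and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def scan_with_preaggregation(sorted_rows):
    """Fold each run of equal group keys into one (key, bitmap) pair.

    sorted_rows: iterable of (group_key, user_id) tuples ordered by group_key.
    A key change closes the current bitmap, so no hash table is required.
    """
    for key, rows in groupby(sorted_rows, key=itemgetter(0)):
        yield key, {uid for _, uid in rows}  # a set stands in for a Roaring bitmap


def parallel_distinct_counts(tablets, workers=4):
    """Scan tablets concurrently, then merge the partial bitmaps per key."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_lists = list(
            pool.map(lambda tablet: list(scan_with_preaggregation(tablet)), tablets)
        )
    merged = {}
    for partials in partial_lists:
        for key, bitmap in partials:
            merged.setdefault(key, set()).update(bitmap)  # merge stage: union per key
    return {key: len(bitmap) for key, bitmap in merged.items()}
```

The downstream merge now receives one bitmap per key per tablet instead of one row per event, which is the effect the Scan-phase aggregation aims for.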

Experiments on a cluster with 3 FE (frontend) and 100 BE (backend) nodes show up to a 10× query speedup after applying independent and orthogonal encoding, plus 20-30% latency reductions from container-level optimizations.

Finally, the article summarizes optimization principles: use efficient bitwise operations, apply memory‑efficient allocation and copy‑on‑write, support fast union, and push aggregation down when possible.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


