
Optimizing Precise Deduplication with Doris Bitmap: Architecture, Performance Enhancements, and Practice

This article presents a comprehensive overview of precise deduplication in Meituan's Doris database, detailing the underlying bitmap data structures, aggregation bottlenecks, and a series of optimizations—including memory management, fast union, orthogonal encoding, and vectorized engine integration—that together achieve significant performance gains in high‑cardinality scenarios.

DataFunSummit

The presentation introduces the business need for precise deduplication in traffic‑heavy scenarios, where metrics such as PV, UV, and daily active users require exact distinct counts that are computationally intensive.

It outlines three traditional solutions (pre‑aggregation in the data warehouse, approximate "fuzzy" deduplication, and pre‑aggregation processing) and highlights their limitations for real‑time, multi‑dimensional analysis.

The core of Doris's solution relies on a two‑stage aggregation model (Streaming Agg and Merge Agg) that leverages bitmap data structures based on Roaring Bitmap. Three container types (Array, Bitset, Run‑Length) are described, along with their size thresholds and conversion rules.
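
As a rough illustration of the container rules mentioned above, the sketch below follows the published Roaring Bitmap format, in which each container covers one 16‑bit chunk of the value space and an array container holds at most 4096 values. The thresholds used in the talk are not given in this summary, so the constant and the helper names (`split_key`, `choose_container`, `container_bytes`) are assumptions for illustration only. Run‑Length containers are left out for brevity.

```python
ARRAY_MAX = 4096          # standard Roaring array-container limit (assumed to apply here)
BITSET_BYTES = 65536 // 8  # a bitset container always covers 65536 bits = 8192 bytes


def split_key(value):
    """Split a 32-bit value into (container key, low 16 bits)."""
    return value >> 16, value & 0xFFFF


def choose_container(cardinality):
    """Pick a container type from the number of values in one 16-bit chunk."""
    if not 0 <= cardinality <= 65536:
        raise ValueError("a container holds between 0 and 65536 values")
    return "array" if cardinality <= ARRAY_MAX else "bitset"


def container_bytes(cardinality):
    """Approximate payload size: 2 bytes per value in an array, fixed for a bitset."""
    if choose_container(cardinality) == "array":
        return 2 * cardinality
    return BITSET_BYTES
```

At exactly 4096 values the two layouts cost the same 8192 bytes, which is why the conversion point sits there: beyond it, the fixed‑size bitset is the smaller representation.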

Key performance challenges are identified: non‑numeric type support, high‑cardinality throughput degradation, and costly memory copies. To address these, the authors propose dictionary encoding for non‑numeric fields, orthogonal encoding to reduce container count, and careful data layout to favor Bitset containers.
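
To make the link between encoding and container count concrete, here is a minimal sketch that assumes the standard Roaring layout (one container per distinct high‑16‑bit key). It shows only the dictionary‑encoding step and why dense IDs help; the orthogonal encoding scheme from the talk is not reproduced here, and the function names are hypothetical.

```python
def dictionary_encode(values):
    """Assign each distinct value a dense integer ID in first-seen order."""
    mapping = {}
    ids = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # next unused dense ID
        ids.append(mapping[v])
    return ids, mapping


def container_count(ids):
    """Containers needed under the assumed layout: one per distinct high-16-bit key."""
    return len({i >> 16 for i in ids})


# Sparse numeric IDs, spaced more than 65536 apart, each land in their own container.
sparse_ids = [k * 1_000_003 for k in range(1000)]
# Re-encoding the same values densely packs them into a single container.
dense_ids, _ = dictionary_encode(sparse_ids)
```

With these inputs the sparse IDs need 1000 containers while the dense IDs need one, and the same function works for string keys such as device identifiers.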

Bitmap aggregation optimizations focus on space reduction, off‑loading computation to vectorized bitwise operations, and ensuring dense data distributions. Memory copy overhead is mitigated by switching from TCMalloc to Jemalloc and enabling thread‑safe copy‑on‑write for bitmaps, which cuts expression evaluation from 56% to 14% of total aggregation time.
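
The copy‑on‑write idea can be sketched as follows. This is a single‑threaded toy under stated assumptions, not the Doris implementation: the class names are hypothetical, a Python `set` stands in for the bitmap payload, and the thread safety mentioned above is deliberately omitted.

```python
class _Payload:
    """Shared storage plus a count of the handles that point at it."""

    def __init__(self, values):
        self.values = values
        self.refs = 1


class CowBitmap:
    """Bitmap handle that shares its payload until the first write."""

    def __init__(self):
        self._p = _Payload(set())

    def copy(self):
        """Cheap copy: share the payload and bump the reference count."""
        other = CowBitmap.__new__(CowBitmap)
        other._p = self._p
        self._p.refs += 1
        return other

    def add(self, value):
        if self._p.refs > 1:
            # Payload is shared: detach with a private copy before writing.
            self._p.refs -= 1
            self._p = _Payload(set(self._p.values))
        self._p.values.add(value)

    def cardinality(self):
        return len(self._p.values)
```

Copies that are only read, which is common when bitmaps pass through expression evaluation, never pay for duplicating the payload; the real copy happens only on the first write to a shared handle.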

The integration of a Fast Union interface from Roaring Bitmap allows batch updates, delayed cardinality computation, and fewer data movements, further accelerating deduplication.
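
The benefit of delayed cardinality computation can be illustrated with a small sketch. It is not Roaring's actual interface: Python integers stand in for bitset containers, and the function names are made up for this example.

```python
def to_bitset(values):
    """Encode a collection of small non-negative ints as one integer bitset."""
    bits = 0
    for v in values:
        bits |= 1 << v
    return bits


def eager_union(bitsets):
    """Union one input at a time, recounting the cardinality after every step."""
    acc, card = 0, 0
    for b in bitsets:
        acc |= b
        card = bin(acc).count("1")  # repeated work on each iteration
    return acc, card


def fast_union(bitsets):
    """Union the whole batch with bitwise OR, then count set bits once."""
    acc = 0
    for b in bitsets:
        acc |= b
    return acc, bin(acc).count("1")  # cardinality deferred to the end
```

Both functions return the same result; the batched version simply avoids the intermediate counts, which is the part that grows expensive when many bitmaps are merged.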

Additional engine‑level tweaks include lightweight aggregation during the Scan phase for ordered data, and parallel scan threads to improve throughput for long‑range queries.
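
A minimal sketch of the scan‑phase idea, assuming rows arrive sorted by the grouping key: adjacent rows with the same key can be merged before they reach the aggregation operators. The row shape `(key, set_of_ids)` and the function name are assumptions for illustration.

```python
from itertools import groupby


def scan_with_preagg(rows):
    """Merge adjacent rows that share a key; rows are (key, set_of_ids) in key order."""
    for key, group in groupby(rows, key=lambda r: r[0]):
        merged = set()
        for _, ids in group:
            merged |= ids  # union the ID sets of neighbouring rows
        yield key, merged


rows = [("a", {1}), ("a", {2}), ("b", {2}), ("b", {2, 3})]
merged_rows = list(scan_with_preagg(rows))
```

Because the input is ordered, this needs no hash table: one pass with constant extra state per key reduces the number of rows handed to the later aggregation stages.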

Experimental results on a cluster with 3 FE and 100 BE nodes demonstrate up to 10× query speedup after applying independent and orthogonal encoding, as well as 20‑30% latency reductions from container‑level optimizations.

Finally, the article summarizes optimization principles: use efficient bitwise operations, apply memory‑efficient allocation and copy‑on‑write, support fast union, and push aggregation down when possible.

Performance Optimization, Database, BitMap, Deduplication, OLAP, Vectorization, Doris
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
