Efficient High‑Concurrency Data Retrieval Using Inverted Index and Bitmap Techniques
This article explores how to achieve fast, scalable data retrieval in million‑level high‑concurrency scenarios by replacing naïve full‑combination rule matching with column‑wise inverted indexes and bitmap operations, dramatically reducing time complexity and improving stability while leveraging RoaringBitmap compression for space efficiency.
Background: The Promise order‑control system serves multiple nodes before and after order placement and handles the highest concurrency in the system (over one million TPS at peak). The core technical challenge is quickly and correctly matching each user request against a rule library containing hundreds of thousands of rules.
Naïve Solution: Cache the rule library row by row in Redis (key = rule condition, value = rule result). For each request, generate every possible combination of concrete parameter values and wildcards and attempt to match each one, giving O(2ⁿ) time complexity; with n ≈ 12 parameters that is up to 4096 lookups per request.
Figure 1.
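To make the exponential blow‑up concrete, here is a minimal Python sketch of the naïve approach. The rule keys, column values, and the in‑memory `rule_cache` dict (standing in for Redis) are all illustrative assumptions, not the production schema:

```python
from itertools import product

# Hypothetical rule cache standing in for Redis: the key is a tuple of
# per-column conditions ("*" means the rule does not constrain that
# column), the value is the rule result.
rule_cache = {
    ("beijing", "*", "large"): "reject",
    ("*", "vip", "*"): "accept",
}

def naive_match(params):
    """Try every combination of concrete value vs. wildcard per column:
    2**n lookups in the worst case (n = number of columns)."""
    for candidate in product(*[(p, "*") for p in params]):
        if candidate in rule_cache:
            return rule_cache[candidate]
    return None

result = naive_match(("shanghai", "vip", "small"))  # hits ("*", "vip", "*")
```

With 3 columns this enumerates up to 2³ = 8 keys; at n ≈ 12 it becomes the 4096‑lookup worst case described above.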
New Solution: Adopt column‑wise inverted indexes combined with bitwise operations. Each column’s values are mapped to posting lists of row IDs, and a bitmap is generated for each posting list. At query time, the request parameters are used as keys to retrieve the matching bitmaps: an OR within each column handles empty (wildcard) values, and AND operations across columns yield the candidate rule set.
Figure 2.
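The index‑building step can be sketched in a few lines of Python, using a plain `int` as the bitmap (bit i set means rule i). The rule table and column names are made‑up examples, not the real rule schema:

```python
# Rule table: row ID -> per-column condition ("*" = column unconstrained).
rules = [
    {"city": "beijing", "tier": "*"},    # rule 0
    {"city": "*",       "tier": "vip"},  # rule 1
    {"city": "beijing", "tier": "vip"},  # rule 2
]

def build_index(rules, columns):
    """For each column, map every literal value (and the wildcard "*")
    to a bitmap of row IDs, using a Python int as the bitmap."""
    index = {col: {} for col in columns}
    for row_id, rule in enumerate(rules):
        for col in columns:
            value = rule[col]
            # Set bit row_id in this value's posting-list bitmap.
            index[col][value] = index[col].get(value, 0) | (1 << row_id)
    return index

index = build_index(rules, ["city", "tier"])
# index["city"]["beijing"] now has bits 0 and 2 set: 0b101
```

In production a compressed structure such as RoaringBitmap would replace the raw integer, but the posting‑list shape is the same.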
Algorithm Details:
Pre‑compute column inverted indexes and corresponding bitmaps (Posting Lists).
During a request, use request parameters as keys to fetch matching bitmaps.
Perform column‑wise OR for each column (including empty values) and then inter‑column AND to obtain the candidate rule set.
Rank candidate rules by business priority and iteratively apply AND operations to find the optimal rule.
Figure 3.
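The query‑time steps above can be sketched as follows. The prebuilt bitmaps and the priority list are hypothetical; in particular, the priority scan is one simple way to realize "rank candidates and pick the optimal rule", not necessarily the exact production logic:

```python
# Per-column posting-list bitmaps (Python ints; bit i = rule i), assumed
# prebuilt for three rules; "*" collects rules that leave the column open.
index = {
    "city": {"beijing": 0b101, "*": 0b010},
    "tier": {"vip": 0b110, "*": 0b001},
}
priority = [2, 0, 1]  # hypothetical business priority, best first

def match(query):
    candidates = ~0  # all bits set: start from the full rule set
    for col, value in query.items():
        col_index = index[col]
        # OR within the column: rules naming this value plus wildcard rules.
        col_bits = col_index.get(value, 0) | col_index.get("*", 0)
        candidates &= col_bits  # AND across columns narrows the set
    # Walk candidates in business-priority order; first hit wins.
    for rule_id in priority:
        if candidates & (1 << rule_id):
            return rule_id
    return None

best = match({"city": "beijing", "tier": "vip"})
```

Each request now costs one OR and one AND per column plus a short priority scan, which is the linear behaviour analyzed below.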
Complexity Analysis: The new approach reduces time complexity from exponential O(2ⁿ) to linear: roughly 3n bitmap operations per request, i.e. O(n). Space complexity increases because a posting list is stored per distinct value of each column, but this is mitigated by compressing the bitmaps with RoaringBitmap, which can achieve up to 100× space savings.
Engineering Considerations – Bitmap Compression: Sparse bitmaps waste memory; RoaringBitmap stores only the populated blocks, drastically reducing usage (e.g., a 10 million‑bit bitmap shrinks from about 1.2 MB uncompressed to around 144 bytes when only a handful of bits are set).
Figure 4.
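RoaringBitmap partitions the 32‑bit value space into 2¹⁶‑entry chunks and only materializes chunks that contain set bits. A minimal Python sketch of that container idea (not the real library, whose containers are further specialized into arrays, bitmaps, and runs):

```python
def to_containers(positions):
    """Roaring-style split: each value is divided into a 16-bit high part
    (the container key) and a 16-bit low part stored in the container."""
    containers = {}
    for pos in positions:
        high, low = pos >> 16, pos & 0xFFFF
        containers.setdefault(high, set()).add(low)
    return containers

# A sparse bitmap: only 3 bits set out of 10 million.
positions = [5, 70_000, 9_999_999]
containers = to_containers(positions)

dense_bytes = 10_000_000 // 8  # ~1.25 MB for a flat bit array
# Rough sparse estimate: 2-byte key plus 2 bytes per stored low part.
sparse_bytes = sum(2 + 2 * len(c) for c in containers.values())
```

With three set bits the sparse layout needs on the order of tens of bytes versus roughly 1.25 MB for the dense array, which is the gap the article cites.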
Applicability: The method works well for equality queries with simple posting lists, but it is unsuitable for range or LIKE queries, which would require exhaustive key enumeration.
Other Optimizations: Alternatives such as skip lists or hash tables (e.g., the skip‑list‑based postings used by Elasticsearch) can further accelerate intersections, skipping over non‑matching runs in near‑logarithmic time.
Figure 5.
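The skipping idea can be approximated with binary search over sorted posting lists: probe each element of the shorter list in the longer one instead of scanning both linearly. This is a simplified stand‑in for a true skip list, using only the standard library:

```python
from bisect import bisect_left

def intersect(a, b):
    """Intersect two sorted posting lists by binary-searching each element
    of the shorter list in the longer one: O(m log n) instead of O(m + n),
    the same skipping effect a skip list provides during intersection."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    out = []
    for x in small:
        i = bisect_left(large, x)  # jump straight to x's position
        if i < len(large) and large[i] == x:
            out.append(x)
    return out

common = intersect([3, 9, 27], list(range(0, 100, 3)))  # -> [3, 9, 27]
```

When one list is much shorter than the other, this beats a linear merge, which is why search engines keep skip pointers in their postings.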
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.