How DSearch Evolved: From RCU‑Based 1.0 Indexes to Async Graph‑Powered 3.0

This article provides a detailed technical walkthrough of the DSearch search engine evolution, covering the RCU‑based 1.0 index architecture, the segment‑merge enhancements in 2.0, and the async non‑blocking graph framework introduced in 3.0, together with performance benchmarks and implementation details.

DeWu Technology

Background

Rapid growth of transaction and community search workloads exposed severe performance bottlenecks in the original DSearch indexing layer, drove up operational costs, and slowed development, because tightly coupled modules made the system complex to maintain.

Technical Solutions for Engine Development

DSearch 1.0 Index Layer Overall Structure

DSearch 1.0 uses a global RCU (Read‑Copy‑Update) design with a single‑writer, multiple‑reader model. All internal data structures are lock‑free, providing high‑throughput, low‑latency reads, and writes are likewise lock‑free thanks to RCU. Advantages: excellent read performance and no disk‑level segment merges. Drawbacks: strong dependence on the OS address space, complex hot‑update handling, high memory consumption, and limited parallel write scalability.
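
A rough sketch of the single‑writer, multiple‑reader RCU pattern follows; the `RcuIndex` and `IndexSnapshot` names are hypothetical, and a real implementation would copy only the touched nodes rather than the whole snapshot. Readers never block, and old snapshots stay alive until the last reader drops its reference, which is also part of why memory consumption is high.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical immutable index snapshot: term -> posting list.
using IndexSnapshot = std::map<std::string, std::vector<uint64_t>>;

class RcuIndex {
public:
    RcuIndex() : current_(std::make_shared<IndexSnapshot>()) {}

    // Readers: lock-free; the snapshot stays valid as long as the returned
    // shared_ptr is held, even if the writer publishes a newer version.
    std::shared_ptr<const IndexSnapshot> acquire() const {
        return std::atomic_load(&current_);
    }

    // Single writer: copy the current snapshot, apply the update, then
    // atomically publish the new version (read-copy-update).
    void add_doc(const std::string& term, uint64_t doc_id) {
        auto next = std::make_shared<IndexSnapshot>(*std::atomic_load(&current_));
        (*next)[term].push_back(doc_id);
        std::atomic_store(&current_, std::shared_ptr<const IndexSnapshot>(next));
    }

private:
    std::shared_ptr<const IndexSnapshot> current_;
};
```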

DSearch 2.0 Index Upgrade

The 2.0 version adopts a classic segment‑merge architecture. It inherits the high‑performance writes and queries of segment merging and adds an in‑place update mechanism for forward‑index fields, achieving Redis‑like write speed, Lucene‑like index compactness, and a reduced memory footprint.

Index Segment Structure

Document files store detailed field information for each document.

String‑pool file stores all strings in a contiguous region; each string is referenced by an ID‑based offset.

Variable‑array files hold array‑type data in an append‑only fashion; inter‑segment compaction reclaims memory.

Inverted files store doc‑id, term‑frequency (TF), and position data. The container type is chosen per segment: bitmap for high‑density doc‑ids, array otherwise.

Delete chains record soft‑deleted documents for final filtering.

Real‑time incremental trie trees provide prefix indexes for mutable segments.

Metadata files record per‑segment statistics such as document count, delete ratio, and Kafka offsets.

Document and Index Structures

Documents use compact storage: strings are pooled, arrays are stored in variable‑array files, and other fields are stored contiguously.

The string pool deduplicates identical strings across documents, saving space.
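
As a rough illustration of this pooling (class and method names are hypothetical), an append‑only pool can intern strings so identical values share one copy and documents store only a fixed‑width offset:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical append-only string pool: identical strings across documents
// share one copy and are referenced by their byte offset in the pool.
class StringPool {
public:
    // Returns the offset of `s`, appending it only if it is not pooled yet.
    uint32_t intern(const std::string& s) {
        auto it = offsets_.find(s);
        if (it != offsets_.end()) return it->second;           // deduplicated
        uint32_t off = static_cast<uint32_t>(bytes_.size());
        bytes_.insert(bytes_.end(), s.begin(), s.end());
        bytes_.push_back('\0');                                 // terminator
        offsets_.emplace(s, off);
        return off;
    }

    // Documents keep only the 4-byte offset instead of the full string.
    const char* at(uint32_t offset) const { return bytes_.data() + offset; }

private:
    std::vector<char> bytes_;                             // contiguous string region
    std::unordered_map<std::string, uint32_t> offsets_;   // string -> offset
};
```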

Inverted Index File Structure

Each segment stores doc‑id, TF, and position arrays. Real‑time segments use a fixed array type; static segments switch between array and bitmap based on doc‑id density.

TF is a uint16 array whose length matches the doc‑id array.

Positions are stored as a two‑dimensional array: the first dimension is keyed by the term's offset in the string pool, and the second holds that term's occurrence positions as uint16 values.
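
A rough sketch of how one term's posting data could be laid out in memory; field names are hypothetical, and the positions array is simplified here to one entry per matching document rather than being keyed by string‑pool offsets:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory layout for one term's posting data in a segment.
struct PostingList {
    std::vector<uint32_t> doc_ids;   // ascending doc-ids (array container)
    std::vector<uint16_t> tf;        // term frequency, same length as doc_ids

    // positions[i] holds the uint16 occurrence positions of the term
    // inside the i-th matching document (the second dimension of the 2-D array).
    std::vector<std::vector<uint16_t>> positions;
};
```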

Prefix Retrieval Files

Static segments contain an FST (finite‑state transducer) that maps terms to string‑pool offsets; queries perform an FST lookup followed by a binary search in the term‑offset file.

An optional HashMap index can map terms directly to offsets, reducing lookup cost.
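
The static‑segment lookup flow can be approximated as follows; the FST itself is omitted for brevity, so only the sorted term‑offset table and the optional HashMap fast path are shown, and all names are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical term dictionary for a static segment.
class TermDictionary {
public:
    // `sorted_terms` must be lexicographically sorted, aligned with term_offsets.
    TermDictionary(std::vector<std::string> sorted_terms,
                   std::vector<uint32_t> term_offsets,
                   bool build_hash_index)
        : terms_(std::move(sorted_terms)), offsets_(std::move(term_offsets)) {
        if (build_hash_index) {
            for (size_t i = 0; i < terms_.size(); ++i) hash_.emplace(terms_[i], offsets_[i]);
        }
    }

    std::optional<uint32_t> lookup(const std::string& term) const {
        if (!hash_.empty()) {                        // O(1) exact-match path
            auto it = hash_.find(term);
            if (it == hash_.end()) return std::nullopt;
            return it->second;
        }
        // Binary search over the sorted term-offset table (in the real engine
        // an FST lookup narrows the range first).
        auto it = std::lower_bound(terms_.begin(), terms_.end(), term);
        if (it == terms_.end() || *it != term) return std::nullopt;
        return offsets_[it - terms_.begin()];
    }

private:
    std::vector<std::string> terms_;
    std::vector<uint32_t> offsets_;
    std::unordered_map<std::string, uint32_t> hash_;   // optional direct index
};
```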

Real‑time segments use an RCU‑based trie for concurrent read‑write prefix searches.

Inverted Index and Query Tree Optimizations

Inverted Chain Optimization

In DSearch 1.0, roaring‑bitmap containers caused performance degradation for low‑density data because short inverted chains still triggered binary searches.

The 2.0 implementation stores each chain as either a plain array or a bitmap, removing the extra container lookups and enabling direct chain merges.
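
A simplified illustration of the per‑segment container choice; the 1/32 density threshold is an assumption, not the engine's actual value:

```cpp
#include <cstdint>
#include <vector>

// Illustrative container selection when solidifying a segment: high-density
// posting lists become a bitmap, low-density lists stay a plain sorted array.
struct Posting {
    bool is_bitmap = false;
    std::vector<uint32_t> array;     // sorted doc-ids (sparse case)
    std::vector<uint64_t> bitmap;    // 1 bit per doc-id (dense case)
};

// `max_doc` is the number of documents in the segment (ids in [0, max_doc)).
Posting build_posting(const std::vector<uint32_t>& sorted_doc_ids, uint32_t max_doc) {
    Posting p;
    // Bitmap wins once the list holds more than roughly 1 doc per 32 possible ids.
    if (!sorted_doc_ids.empty() && sorted_doc_ids.size() * 32 >= max_doc) {
        p.is_bitmap = true;
        p.bitmap.assign((max_doc + 63) / 64, 0);
        for (uint32_t id : sorted_doc_ids) p.bitmap[id / 64] |= 1ULL << (id % 64);
    } else {
        p.array = sorted_doc_ids;
    }
    return p;
}
```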

Logical Tree Merge Optimization

DSearch 2.0 flattens the logical syntax tree, reducing merge depth from a binary tree to a single layer.

In‑place inverted‑chain merging eliminates temporary objects and introduces multi‑threaded parallel merges, cutting tail latency.
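
A sketch of what a flattened, single‑layer AND merge looks like compared with pairwise binary‑tree merging; this is illustrative only, since the real engine also merges in place and in parallel:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Flattened AND merge: instead of intersecting posting lists pairwise down a
// binary tree, all lists are intersected in a single pass, writing into one
// output buffer with no intermediate temporaries.
std::vector<uint32_t> intersect_all(std::vector<const std::vector<uint32_t>*> lists) {
    std::vector<uint32_t> out;
    if (lists.empty()) return out;
    // Start from the shortest list to minimize probes into the others.
    std::sort(lists.begin(), lists.end(),
              [](const auto* a, const auto* b) { return a->size() < b->size(); });
    for (uint32_t doc : *lists[0]) {
        bool in_all = true;
        for (size_t i = 1; i < lists.size() && in_all; ++i) {
            in_all = std::binary_search(lists[i]->begin(), lists[i]->end(), doc);
        }
        if (in_all) out.push_back(doc);
    }
    return out;
}
```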

Incremental Real‑Time Write Logic

The number of concurrent real‑time segments is configurable via a YAML/JSON file, which improves write throughput.

Each real‑time segment has its own write queue.

Writes atomically update the consumed Kafka offset for reliable recovery.

On process restart, the latest Kafka offset is read from the metadata file and consumption resumes from there to rebuild the forward, document, and inverted structures.
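
A minimal sketch of the offset bookkeeping, with Kafka consumption abstracted away and all names hypothetical:

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative write path for one real-time segment.
struct Doc { uint64_t id; std::string body; };

class RealtimeSegment {
public:
    // Apply one message, then publish the consumed offset atomically, so a
    // checkpointed offset always refers to a document that was fully applied.
    void apply(const Doc& doc, int64_t kafka_offset) {
        index_document(doc);                                   // forward + inverted structures
        consumed_offset_.store(kafka_offset, std::memory_order_release);
    }

    // Periodically persisted into the segment metadata file.
    int64_t consumed_offset() const {
        return consumed_offset_.load(std::memory_order_acquire);
    }

    // On restart: everything after the offset persisted in metadata is
    // replayed to rebuild forward, document, and inverted structures.
    void recover(const std::vector<std::pair<Doc, int64_t>>& messages_after_offset) {
        for (const auto& [doc, offset] : messages_after_offset) apply(doc, offset);
    }

private:
    void index_document(const Doc& /*doc*/) { /* build forward, doc, and inverted data */ }
    std::atomic<int64_t> consumed_offset_{-1};
};
```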

Real‑Time Segment Solidification & Segment Merge Strategy

Solidification Logic

When a real‑time segment exceeds 128 MiB, it is frozen into a read‑only segment.

The old segment becomes immutable while a new mutable segment is created.

All read‑only segments are traversed to build a new static segment.

Merge Strategy

Small frozen segments are merged in size‑doubling batches (1 MiB → 2 MiB → 4 MiB → 8 MiB) to form larger static segments.

Static segment merges are triggered when the delete‑document ratio exceeds 30 %; the smallest neighboring segment is merged first.
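
Illustrative versions of the two merge triggers described above; the names and the exact tiering arithmetic are assumptions:

```cpp
#include <cstddef>
#include <vector>

struct SegmentInfo {
    std::size_t bytes = 0;
    std::size_t docs = 0;
    std::size_t deleted_docs = 0;
};

// Tier index for the size-doubling schedule: <1 MiB -> 0, <2 MiB -> 1, <4 MiB -> 2, ...
// Segments in the same tier are merged together into the next tier.
int merge_tier(const SegmentInfo& s) {
    int tier = 0;
    for (std::size_t cap = 1ULL << 20; s.bytes >= cap; cap <<= 1) ++tier;
    return tier;
}

// Static segment rewrite trigger: soft-deleted ratio reaches 30%.
bool needs_delete_merge(const SegmentInfo& s) {
    return s.docs > 0 && s.deleted_docs * 10 >= s.docs * 3;
}

// Pick the smallest neighbor of segment `i` as its merge partner
// (assumes at least two segments).
std::size_t smallest_neighbor(const std::vector<SegmentInfo>& segs, std::size_t i) {
    if (i == 0) return 1;
    if (i + 1 == segs.size()) return i - 1;
    return segs[i - 1].bytes <= segs[i + 1].bytes ? i - 1 : i + 1;
}
```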

Concurrency Control in Query and Update

Query Flow

Queries first scan all real‑time segments up to a configured incremental cutoff, then proceed to static segments ordered by quality score, merging results from both sources.
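
A simplified sketch of that flow, treating the incremental cutoff as a cap on how many real‑time segments are scanned (an interpretation; names are hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct SegmentView {
    double quality_score = 0.0;
    std::function<std::vector<uint32_t>(const std::string&)> search;
};

std::vector<uint32_t> query(const std::string& q,
                            const std::vector<SegmentView>& realtime,
                            std::vector<SegmentView> statics,
                            std::size_t realtime_cutoff) {
    std::vector<uint32_t> hits;
    // 1) Real-time segments up to the configured cutoff.
    for (std::size_t i = 0; i < realtime.size() && i < realtime_cutoff; ++i) {
        auto part = realtime[i].search(q);
        hits.insert(hits.end(), part.begin(), part.end());
    }
    // 2) Static segments in descending quality-score order.
    std::sort(statics.begin(), statics.end(),
              [](const SegmentView& a, const SegmentView& b) {
                  return a.quality_score > b.quality_score;
              });
    for (const auto& seg : statics) {
        auto part = seg.search(q);
        hits.insert(hits.end(), part.begin(), part.end());
    }
    return hits;   // downstream stages dedupe, filter soft deletes, and rank
}
```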

Update Concurrency

DSearch 2.0 updates forward and inverted fields directly in real‑time or static segments, creating multi‑threaded read‑write conflicts, especially for high‑frequency forward‑field updates. A document‑level lock pool hashes each doc ID to a specific lock, reducing contention.
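
A minimal sketch of such a lock pool; the pool size and helper names are assumptions:

```cpp
#include <array>
#include <cstdint>
#include <mutex>

// Each doc id hashes to one of a fixed set of mutexes, so concurrent updates
// to different documents rarely contend, while updates to the same document
// are serialized.
class DocLockPool {
public:
    std::mutex& lock_for(uint64_t doc_id) {
        return locks_[doc_id % locks_.size()];
    }

private:
    std::array<std::mutex, 1024> locks_;   // pool size is an assumption
};

// Usage: serialize an in-place forward-field update for one document.
// DocLockPool pool;
// {
//     std::lock_guard<std::mutex> guard(pool.lock_for(doc_id));
//     update_forward_field(doc_id, field, value);   // hypothetical helper
// }
```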

DSearch 3.0 Search Core Upgrade

Asynchronous Non‑Blocking Graph Scheduling Framework

RPC asynchronous non‑blocking requests: The graph framework uses brpc's async handling, so threads never block while waiting on remote service calls.

Reduced thread switching: An M:N bthread pool keeps most work on the current thread, improving efficiency.

Decoupled RPC service and operators: Operators run independently of the RPC server, enabling cross‑cluster deployment.

Fine‑grained operator timeout handling: Each operator defines its own execution and request timeouts.

Dynamic graph support: Both static and dynamic sub‑graphs can be composed at runtime.

Complex sub‑graph nesting: Recursive sub‑graph calls support sophisticated workflows.
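
The operator contract can be pictured roughly as follows. This is a generic sketch, not the actual brpc/bthread API, and all names and default timeouts are assumptions:

```cpp
#include <chrono>
#include <functional>
#include <memory>
#include <string>

struct OpContext;   // shared exchange table, request parameters, etc.

class Operator {
public:
    virtual ~Operator() = default;

    // Called by the scheduler; `done` is invoked exactly once when the
    // operator (or its remote call) finishes, so no thread blocks waiting.
    virtual void run_async(std::shared_ptr<OpContext> ctx,
                           std::function<void(bool ok)> done) = 0;

    // Fine-grained, per-operator budgets checked by the scheduler
    // (default values here are placeholders).
    std::chrono::milliseconds exec_timeout{50};
    std::chrono::milliseconds request_timeout{200};
    std::string name;
};
```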

Operator Data Exchange Table Design

Columnar data sharing: All operator outputs are stored in columnar tables, avoiding large data copies between operators.

Compatibility with doc‑row storage: The table also stores row‑oriented doc fields for fast random access.

FlatBuffer serialization bridge: Direct conversion between table columns and FlatBuffer messages reduces copy overhead.

In‑place sort and delete marking: The table supports on‑the‑fly sorting and logical deletions without full rewrites.
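
A minimal sketch of such an exchange table; names are hypothetical and the FlatBuffer bridge is omitted:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Operators append whole columns and later stages read them in place, so
// intermediate results are shared rather than copied. Deletes are logical
// marks and sorts reorder a row-index permutation instead of the columns.
class ExchangeTable {
public:
    using Column = std::variant<std::vector<int64_t>, std::vector<double>, std::vector<std::string>>;

    void add_column(const std::string& name, Column col) { columns_[name] = std::move(col); }

    void resize_rows(std::size_t n) {
        row_order_.resize(n);
        std::iota(row_order_.begin(), row_order_.end(), 0u);
        deleted_.assign(n, false);
    }

    // Logical delete: the row is filtered later, columns are never rewritten.
    void mark_deleted(std::size_t row) { deleted_[row] = true; }

    // In-place sort: only the permutation of row indices moves.
    void sort_by_score(const std::vector<double>& score) {
        std::sort(row_order_.begin(), row_order_.end(),
                  [&](uint32_t a, uint32_t b) { return score[a] > score[b]; });
    }

private:
    std::unordered_map<std::string, Column> columns_;
    std::vector<uint32_t> row_order_;   // current visible row order
    std::vector<bool> deleted_;         // logical delete marks
};
```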

Additional 3.0 Enhancements

Dynamic graph orchestration allows business teams to define custom operator pipelines.

Remote asynchronous operator calls enable cross‑cluster execution, improving scalability.

Unified operator interfaces promote reusable component libraries.

Performance and Effectiveness Gains

Index memory usage reduced by 60 % per shard.

Write throughput increased tenfold.

Index update latency improved nearly tenfold.

Average query latency was halved, and P99 latency during peak hours also dropped by roughly half.

Conclusion

DSearch progressed from a single‑writer RCU‑based engine (1.0) to a segment‑merge architecture (2.0) and finally to an asynchronous graph‑driven, columnar‑operator engine (3.0). The continuous upgrades delivered industry‑leading performance for the DeWu platform. Future work will focus on generalization, self‑iteration, and further high‑performance enhancements.
