
Optimization of Local Vector Retrieval: Filtering, Storage, and Sorting Strategies

This article presents a comprehensive study of local vector retrieval optimization, covering memory‑based filtering techniques, Redis‑backed vector storage designs, and various sorting algorithms—including radix and heap‑based approaches—to achieve lower latency and higher throughput for large‑scale ad recommendation systems.

IEG Growth Platform Technology Team

1 Background

The previous article introduced accelerated vector similarity computation; a complete vector retrieval pipeline also requires result sorting and optional pre‑filtering of vectors in the database.

Local vector retrieval still has optimization potential in both filtering and sorting, which this article explores.

2 Filtering Optimization

Although vectors are loaded into memory and SIMD‑optimized for fast computation, further reduction of the candidate set is possible.

Example: a user vector originally compares against 1,000 vectors; attribute‑based filtering can halve this to 500, speeding up retrieval.

2.1 Vector Filtering

Ad vectors are associated with metadata (model version, creative type, platform, template, media). Filtering can be performed on these attributes before similarity calculation.

Memory Object: store ad info as objects with vector attribute; iterate and filter.
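The object approach can be sketched as a linear scan over ad structs — a minimal illustration, with field names (`CreativeType`, `Platform`) chosen for this example rather than taken from the production schema:

```go
package main

import "fmt"

// Ad holds an ad's vector plus the metadata used for pre-filtering.
// The fields here are illustrative, not the real ad schema.
type Ad struct {
	ID           int
	CreativeType string
	Platform     string
	Vector       []float32
}

// filterAds walks the full slice and keeps ads matching every condition.
// O(n) per query: every object is visited even when few ads match.
func filterAds(ads []Ad, creativeType, platform string) []Ad {
	var out []Ad
	for _, ad := range ads {
		if ad.CreativeType == creativeType && ad.Platform == platform {
			out = append(out, ad)
		}
	}
	return out
}

func main() {
	ads := []Ad{
		{ID: 1, CreativeType: "video", Platform: "ios", Vector: []float32{0.1, 0.2}},
		{ID: 2, CreativeType: "image", Platform: "ios", Vector: []float32{0.3, 0.4}},
		{ID: 3, CreativeType: "video", Platform: "ios", Vector: []float32{0.5, 0.6}},
	}
	fmt.Println(len(filterAds(ads, "video", "ios"))) // 2
}
```

The simplicity comes at a cost: every query re-reads every object's metadata, which explains why this scheme trails the others in the benchmarks below.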

Memory Bitmap: generate a bitmap for each attribute value; combine bitmaps via set operations to obtain matching ads.
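A minimal bitmap sketch under the same assumptions: each attribute value owns a bitmap with one bit per ad slot, and a query is just a bitwise AND of the relevant bitmaps:

```go
package main

import (
	"fmt"
	"math/bits"
)

// Bitmap over ad slots, one bit per ad index.
type Bitmap []uint64

func NewBitmap(n int) Bitmap { return make(Bitmap, (n+63)/64) }

func (b Bitmap) Set(i int) { b[i/64] |= 1 << (i % 64) }

// And intersects two equal-length bitmaps into a new one.
func (b Bitmap) And(o Bitmap) Bitmap {
	out := make(Bitmap, len(b))
	for i := range b {
		out[i] = b[i] & o[i]
	}
	return out
}

// Count returns how many ads survive the intersection.
func (b Bitmap) Count() int {
	n := 0
	for _, w := range b {
		n += bits.OnesCount64(w)
	}
	return n
}

func main() {
	// One bitmap per attribute value; ad indices 0..4.
	video := NewBitmap(5) // ads whose creative type is "video"
	ios := NewBitmap(5)   // ads whose platform is "ios"
	for _, i := range []int{0, 2, 4} {
		video.Set(i)
	}
	for _, i := range []int{0, 1, 2} {
		ios.Set(i)
	}
	// Matching ads = intersection of the attribute bitmaps.
	fmt.Println(video.And(ios).Count()) // 2
}
```

Set operations on packed 64-bit words are cache-friendly, but building and intersecting many bitmaps burns CPU, which matches the CPU-usage ranking reported below.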

Memory Inverted Index: two‑level maps—first map indexes by combined attributes to ad ID lists; second map stores vectors by model version and ad ID.
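The two-level map can be sketched as follows — key shapes such as `"video|ios"` for the combined-attribute key are assumptions for illustration:

```go
package main

import "fmt"

// InvertedIndex has two levels:
// first level maps a combined attribute key to ad IDs,
// second level maps model version -> ad ID -> vector.
type InvertedIndex struct {
	byAttr  map[string][]int             // e.g. "video|ios" -> ad IDs
	byModel map[string]map[int][]float32 // model version -> ad ID -> vector
}

// Lookup returns candidate vectors for one attribute combination under
// a given model version, touching only the ads listed for that key.
func (ix *InvertedIndex) Lookup(attrKey, model string) [][]float32 {
	vecs := ix.byModel[model]
	var out [][]float32
	for _, id := range ix.byAttr[attrKey] {
		if v, ok := vecs[id]; ok {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	ix := &InvertedIndex{
		byAttr: map[string][]int{"video|ios": {1, 3}},
		byModel: map[string]map[int][]float32{
			"v2": {1: {0.1, 0.2}, 2: {0.3, 0.4}, 3: {0.5, 0.6}},
		},
	}
	fmt.Println(len(ix.Lookup("video|ios", "v2"))) // 2
}
```

Because a lookup visits only the IDs posted under the matching attribute key, the inverted index avoids scanning unrelated ads entirely — consistent with it winning on both QPS and latency below.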

Performance tests show QPS ranking: Inverted Index > Bitmap > Object; CPU usage: Bitmap > Object > Inverted Index; latency: Object > Bitmap > Inverted Index.

2.2 Vector Storage

Two‑stage process: offline loading into Redis and online loading from Redis to memory.

Offline stage offers two schemes:

Scheme 1: single Redis hash where each field stores a compressed vector list (zip + base64); leads to vector redundancy.

Scheme 2: one hash for attribute‑to‑ad‑ID mapping and separate keys for each ad’s vector; avoids large keys and balances load.
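Scheme 2's key layout can be sketched as plain key-construction helpers — the key names below (`ad:index:<model>`, `ad:vec:<model>:<adID>`) are assumed for illustration, not the article's actual Redis schema:

```go
package main

import "fmt"

// Scheme 2 key layout (names are illustrative):
//   hash "ad:index:<model>"        attribute combination -> ad ID list
//   key  "ad:vec:<model>:<adID>"   one vector per ad
// Many small per-ad keys avoid the big-key problem of Scheme 1 and
// spread load across Redis cluster slots.

func indexKey(model string) string { return fmt.Sprintf("ad:index:%s", model) }

func vectorKey(model string, adID int) string {
	return fmt.Sprintf("ad:vec:%s:%d", model, adID)
}

func main() {
	fmt.Println(indexKey("v2"))      // ad:index:v2
	fmt.Println(vectorKey("v2", 42)) // ad:vec:v2:42
}
```

The online loader first reads the index hash, then fetches each referenced vector key (batched with MGET or pipelining to bound the extra round-trips this scheme incurs), and a single ad's vector can be updated by rewriting one small key rather than re-serializing a whole compressed list.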

Benchmark with 100k ads using Scheme 2 consumes ~270 MiB for vectors and 3 MiB for the inverted index; with four online versions the total memory is about 1 GiB.

Overall comparison:

Scheme 1 – fewer Redis calls but complex parsing, higher memory, hard to incrementally update.

Scheme 2 – simpler parsing, lower memory, easy incremental updates, at the cost of more Redis calls.

3 Sorting Optimization

After filtering and similarity calculation, extracting the Top‑K results can be a bottleneck.

3.1 Full‑list Sorting

Golang’s default sort (quick‑sort + heap + insertion) is slower than SIMD similarity computation. Alternative algorithms evaluated:

Heap sort (common Top‑K method)

Floating‑point radix sort (non‑comparison)

Parallel floating‑point radix sort (divide‑and‑conquer)

Radix sort converts floats to binary, buckets by segments, and handles sign bits to place negatives correctly.
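The sign-bit handling rests on a standard bit trick: map each float32 to a uint32 whose unsigned order matches the float's numeric order. A minimal sketch (the function name is ours):

```go
package main

import (
	"fmt"
	"math"
)

// floatToSortableUint maps a float32 to a uint32 whose unsigned
// ordering matches the float's numeric ordering:
//   negative  -> flip all bits  (reverses the order of negatives)
//   otherwise -> flip sign bit  (moves non-negatives above negatives)
// Radix sort can then bucket the uint32 by bit segments with no
// comparisons at all.
func floatToSortableUint(f float32) uint32 {
	b := math.Float32bits(f)
	if b&0x80000000 != 0 {
		return ^b
	}
	return b | 0x80000000
}

func main() {
	fmt.Println(floatToSortableUint(-1.0) < floatToSortableUint(-0.5)) // true
	fmt.Println(floatToSortableUint(-0.5) < floatToSortableUint(2.0))  // true
}
```

With this mapping applied once up front, negatives land below positives automatically and the bucketing passes need no special cases.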

Theoretical complexity O(d·(n+2^(32/d))+n) guides segment count selection; experiments show fewer segments perform better on larger datasets, while 4‑segment radix sort excels below 50 k items.

Parallel radix sort outperforms the others, achieving up to 11× speed‑up over Golang’s default sort, though for very small data the overhead can make it slower.

3.2 Partial (Local) Sorting

When only the Top‑K is needed, three local‑sorting strategies were tested:

Bubble sort Top‑K – O(n·k), fast for very small n.

Quick sort‑based Top‑K – O(n·log n), but performed poorly in benchmarks.

Heap sort Top‑K – O(n·log k), stable and fastest for medium‑to‑large n.
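The heap-based strategy can be sketched with Go's `container/heap`: keep a min-heap of the k best scores, and replace the root whenever a new score beats it, giving the O(n·log k) bound:

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap keeps the k largest scores seen so far; the root is the
// smallest of them, so a new score enters only if it beats the root.
type minHeap []float32

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(float32)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k largest scores in descending order, O(n log k).
func topK(scores []float32, k int) []float32 {
	h := &minHeap{}
	for _, s := range scores {
		if h.Len() < k {
			heap.Push(h, s)
		} else if s > (*h)[0] {
			(*h)[0] = s      // replace the current k-th best...
			heap.Fix(h, 0)   // ...and restore the heap property
		}
	}
	out := append([]float32(nil), *h...)
	sort.Slice(out, func(i, j int) bool { return out[i] > out[j] })
	return out
}

func main() {
	fmt.Println(topK([]float32{0.3, 0.9, 0.1, 0.7, 0.5}, 3)) // [0.9 0.7 0.5]
}
```

Only the final k elements are fully sorted; the n−k rejected scores cost at most one comparison each, which is why this approach scales so well once n is large relative to k.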

Benchmarks (k = 1000) indicate bubble sort is best under 20 k items, while heap sort dominates for larger volumes, outperforming both radix‑based full sorts for up to 100 k items.

4 Methodology

The practical experience can be abstracted into reusable methods:

Local vector retrieval suitable for sub‑million‑scale datasets.

SIMD‑custom programming for other math‑heavy workloads.

Memory‑based inverted index and bitmap filtering applicable to generic data‑filtering scenarios.

Floating‑point radix sort and partial‑sort algorithms for accelerating diverse sorting tasks.

5 Conclusion

After applying the described filtering, storage, and sorting optimizations, both recall and coarse‑ranking service latencies drop significantly, allowing the system to handle higher QPS and larger ad inventories while maintaining stable performance.

Tags: golang, Redis, vector search, backend optimization, filtering, sorting, heap sort
Written by

IEG Growth Platform Technology Team

Official account of Tencent IEG Growth Platform Technology Team, showcasing cutting‑edge achievements across front‑end, back‑end, client, algorithm, testing and other domains.
