
Dual Vector Foil (DVF): Decoupled Index and Model for Large‑Scale Retrieval

The article introduces the Dual Vector Foil (DVF) algorithm system, which decouples index construction from model training to enable lightweight, high‑precision large‑scale recall using arbitrary complex models, and details its two‑stage and one‑stage solutions, graph‑based retrieval implementation, performance optimizations, and experimental results.

DataFunTalk

The rapid growth of internet services has led to massive candidate pools for recommendation, search and advertising, making the recall stage—selecting a high‑quality, limited subset from billions of items—critical for overall system performance.

Recall can be formalized as finding a subset that maximizes a value‑measurement function f(user, item) over the full candidate set. Historically, recall methods evolved from heuristic rules to collaborative filtering and finally to model‑based approaches.
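As a concrete reference point, the objective reads as exact top‑K selection under f. A brute‑force sketch, with a hypothetical toy value function and candidate set, makes the baseline explicit:

```python
import heapq

def brute_force_recall(user, item_ids, f, k):
    # Reference recall: score every candidate with f(user, item) and keep
    # the k highest. Real systems must approximate this, since scoring
    # billions of items per request is infeasible.
    return heapq.nlargest(k, item_ids, key=lambda item: f(user, item))

# Toy usage with an illustrative dot-product value function.
item_vecs = {"a": [0.2, 0.1], "b": [0.9, 0.9], "c": [0.5, 0.0]}
user_vec = [1.0, 0.5]
f = lambda u, it: sum(x * y for x, y in zip(u, item_vecs[it]))
top2 = brute_force_recall(user_vec, list(item_vecs), f, 2)
```

Every method discussed below can be viewed as an approximation of this exact top‑K under different constraints on f and on the index.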

Two dominant paradigms exist:

Two‑stage solutions

These separate the value function into a representation stage (user and item vectors) and a retrieval stage (large‑scale inner‑product nearest‑neighbor search). They rely on vector‑based indexes such as Faiss or Alibaba’s Proxima, and research works like MIND, ComiRec, and CurvLearn. Limitations include a mismatch between training and retrieval objectives and a restrictive inner‑product model structure.
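The retrieval stage of a two‑stage system reduces to maximum‑inner‑product search. A minimal exact version, which libraries like Faiss or Proxima approximate at scale, might look like:

```python
import numpy as np

def inner_product_topk(user_vec, item_matrix, k):
    # Score every item by <user, item>, then return the k best ids in
    # descending score order. This is exact search; ANN indexes trade a
    # little recall for sub-linear lookup time.
    scores = item_matrix @ user_vec
    topk = np.argpartition(-scores, k - 1)[:k]
    return topk[np.argsort(-scores[topk])]

rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 16))   # toy item embedding table
user = rng.normal(size=16)
ids = inner_product_topk(user, items, 5)
```

The restriction the article criticizes is visible here: the model's entire user‑item interaction must be expressible as one dot product.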

One‑stage solutions

Methods like TDM and Deep Retrieval jointly learn the index structure and the scoring model, removing the two‑stage mismatch and allowing more expressive models. However, joint training incurs higher system and time costs and makes it difficult to incorporate rich item side‑information.

Dual Vector Foil (DVF) system

DVF aims to keep the high recall quality of one‑stage methods while decoupling index learning from model training. After the model is trained, a post‑training index is built as a Hierarchical Navigable Small World (HNSW) graph over item embeddings, without any virtual nodes. Retrieval then proceeds layer by layer: candidate neighborhoods are expanded, their embeddings are gathered on the CPU, the full model scores them on the GPU, and the loop iterates until the final top‑K set is obtained.
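The layer‑by‑layer loop can be sketched as a beam search over per‑layer adjacency lists, with `score` standing in for the batched GPU model call (the graph, scores, and parameters below are illustrative):

```python
def graph_retrieve(layers, score, entry, beam, k):
    # Walk the graph from the top (sparse) layer to the bottom (dense) one.
    # At each layer: expand the current candidates' neighborhoods, score the
    # frontier (done in large GPU batches in the real system), and keep the
    # best `beam` nodes as the next layer's entry points.
    candidates = list(entry)
    for adj in layers:
        frontier = set(candidates)
        for node in candidates:
            frontier.update(adj.get(node, ()))
        candidates = sorted(frontier, key=score, reverse=True)[:beam]
    return sorted(candidates, key=score, reverse=True)[:k]

# Toy two-layer graph; score(n) = n stands in for the model score.
layers = [
    {0: [1, 2]},                    # top layer
    {0: [1, 2], 1: [3], 2: [4]},    # bottom layer
]
result = graph_retrieve(layers, score=lambda n: n, entry=[0], beam=3, k=2)
```

Because `score` is an arbitrary function of the full candidate, nothing here requires an inner‑product form.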

Key components of the scoring model include:

User‑level aggregated features extracted via a Transformer.

User behavior sequence features combined with the target via target‑attention.

Target features processed by multi‑layer MLPs.

Final scoring by merging the three feature streams through additional MLP layers.
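The three streams above can be illustrated with a toy NumPy scorer; the shapes, random weights, and single‑layer MLPs here are stand‑ins, not the production architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def score(user_agg, behavior_seq, target, W):
    # user_agg: (du,) aggregated user features (Transformer-output stand-in)
    # behavior_seq: (T, d) behavior embeddings, pooled via target-attention
    # target: (d,) target-item embedding
    attn = np.exp(behavior_seq @ target)      # similarity of each behavior
    attn /= attn.sum()                        # to the target -> softmax
    seq_feat = attn @ behavior_seq            # (d,) attention-pooled sequence
    tgt_feat = relu(W["target"] @ target)     # target-side MLP layer
    merged = np.concatenate([user_agg, seq_feat, tgt_feat])
    return float(W["out"] @ relu(W["merge"] @ merged))  # merge MLP -> score

rng = np.random.default_rng(1)
W = {"target": rng.normal(size=(3, 3)),
     "merge": rng.normal(size=(8, 10)),
     "out": rng.normal(size=8)}
s = score(rng.normal(size=4), rng.normal(size=(5, 3)), rng.normal(size=3), W)
```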

Implementation details:

Graph representation uses TensorFlow ragged tensors; for example, the adjacency lists [7, 8, 10], [12, 15, 5], and [6, 3] of three nodes are packed as values = [7, 8, 10, 12, 15, 5, 6, 3] with row_splits = [0, 3, 6, 8].
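In plain Python, the same values/row_splits layout (which `tf.RaggedTensor.from_row_splits` builds natively) partitions one flat buffer into per‑node neighbor lists:

```python
values = [7, 8, 10, 12, 15, 5, 6, 3]
row_splits = [0, 3, 6, 8]

def ragged_rows(values, row_splits):
    # Node i's neighbor list is values[row_splits[i]:row_splits[i + 1]],
    # so variable-length adjacency lists live in one contiguous buffer.
    return [values[s:e] for s, e in zip(row_splits, row_splits[1:])]

rows = ragged_rows(values, row_splits)
```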

Custom TensorFlow ops replace slow set/where ops, and a bitmap‑based set implementation accelerates set‑difference/union operations.
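The idea behind the bitmap set, illustrated here with Python big integers rather than the custom TF op, is that set difference and union collapse into single bitwise operations over the candidate‑id space:

```python
def to_bitmap(ids):
    # One bit per candidate id: bit i is set iff id i is in the set.
    bm = 0
    for i in ids:
        bm |= 1 << i
    return bm

def from_bitmap(bm):
    # Decode set bits back into a sorted id list.
    ids, i = [], 0
    while bm:
        if bm & 1:
            ids.append(i)
        bm >>= 1
        i += 1
    return ids

visited = to_bitmap([3, 4])
frontier = to_bitmap([1, 3, 5])
unvisited = frontier & ~visited   # set difference in one bitwise pass
either = frontier | visited       # set union likewise
```

This is exactly the visited‑set bookkeeping the retrieval loop needs when deciding which frontier nodes still require scoring.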

Linear attention reduces GPU memory pressure for large batch sizes.
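The memory saving comes from associativity: with a positive feature map phi, attention can be computed as phi(Q)(phi(K)ᵀV), never materializing the T×T score matrix. A NumPy sketch using the common elu+1 feature map (a generic linear‑attention formulation, not necessarily the article's exact variant):

```python
import numpy as np

def linear_attention(Q, K, V):
    # phi(x) = elu(x) + 1 keeps all features positive, so the normalized
    # weights phi(q_i).phi(k_j) / z_i form a proper attention distribution.
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                  # (d, dv): the only large intermediate
    z = Qf @ Kf.sum(axis=0)        # per-query normalizer, shape (T,)
    return (Qf @ kv) / z[:, None]  # no T x T score matrix is ever built

rng = np.random.default_rng(0)
Q, K = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
V = rng.normal(size=(6, 3))
out = linear_attention(Q, K, V)
```

Since the weights are positive and normalized, each output row is a convex combination of the rows of V, just as in softmax attention.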

XLA auto‑padding handles dynamic shapes during retrieval, avoiding JIT stalls.
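Auto‑padding can be pictured as rounding dynamic batch lengths up to a small set of fixed buckets, so XLA compiles one kernel per bucket instead of re‑JITting for every new shape (the bucket sizes below are illustrative):

```python
import numpy as np

def pad_to_bucket(batch, buckets):
    # Round the leading dimension up to the smallest bucket that fits,
    # zero-pad, and return a mask so padded rows can be ignored downstream.
    n = len(batch)
    size = next(b for b in sorted(buckets) if b >= n)
    padded = np.zeros((size,) + batch.shape[1:], dtype=batch.dtype)
    padded[:n] = batch
    mask = np.arange(size) < n
    return padded, mask

padded, mask = pad_to_bucket(np.ones((37, 4)), buckets=[16, 64, 256])
```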

Experimental results

Offline experiments on top‑1500 recall show that removing the inner‑product restriction improves recall by 5.71 % (full‑score) and 3.03 % (DVF). DVF achieves 95 % of the full‑score recall with only 1.9 % of items scored, demonstrating efficient pruning.

Online benchmarks on a T4 GPU (half‑precision, XLA enabled) show latency reductions from 39.8 ms to 6.5 ms and QPS improvements from 68 to over 600 after applying TF Raw Op, custom Set Op, bitmap Op, linear attention, and XLA auto‑padding.

Overall, DVF provides a lightweight, decoupled solution that supports arbitrary complex models, simplifies the online pipeline, and delivers high recall and throughput, with strong potential for broader adoption and open‑source release.

Tags: algorithm, deep learning, recommendation systems, large scale, retrieval, graph indexing
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
