
Dual Vector Foil (DVF): Decoupled Index and Model Retrieval for Large-Scale Recall

The Dual Vector Foil (DVF) system decouples index construction from model training: it builds an HNSW graph after training, so arbitrarily complex models can score candidates. This yields a 5.7% absolute recall gain, cuts latency from ~40 ms to 6.5 ms, raises QPS more than tenfold, and simplifies maintenance.

Alimama Tech

With the rapid growth of internet services, companies have accumulated massive high‑quality content. In such a scenario, recall modules—positioned at the front of the recommendation pipeline—are critical because they determine the upper bound of overall service quality.

The core problem of recall is to select a high‑quality, limited‑size subset from an enormous candidate pool. Historically, recall has evolved from heuristic rules to collaborative filtering, and finally to model‑based approaches. Two mainstream model‑based solutions exist: a two‑stage (two‑tower) architecture that relies on vector inner‑product search, and a one‑stage architecture that jointly learns the index and the model (e.g., the TDM series).
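For reference, the two‑stage (two‑tower) baseline reduces scoring to an inner product between a user embedding and precomputed item embeddings. A minimal NumPy sketch (dimensions and names are illustrative; production systems replace the exhaustive scan with an approximate nearest‑neighbor index):

```python
import numpy as np

def two_tower_retrieve(user_vec, item_matrix, k):
    """Two-stage recall sketch: score every item by inner product with
    the user embedding, then keep the top-k highest-scoring items."""
    scores = item_matrix @ user_vec           # (n_items,) inner products
    topk = np.argpartition(-scores, k)[:k]    # unordered top-k ids
    return topk[np.argsort(-scores[topk])]    # sorted descending by score

rng = np.random.default_rng(0)
user = rng.normal(size=64)                    # user-tower output
items = rng.normal(size=(10_000, 64))         # item-tower outputs
result = two_tower_retrieve(user, items, k=10)
```

The constraint the article criticizes is visible here: relevance must be expressible as a single dot product, which caps the model's expressive power.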

Two‑stage solutions suffer from a mismatch between training and retrieval objectives and are constrained by the inner‑product model structure, limiting their expressive power. One‑stage solutions alleviate these issues but introduce heavy coupling between index construction and model training, making maintenance and rapid iteration difficult.

To address these challenges, the Dual Vector Foil (DVF) algorithm system was proposed. DVF decouples index learning from model training while retaining the ability to use arbitrarily complex models. The name comes from the sci‑fi concept of compressing a three‑dimensional structure into two dimensions, reflecting the goal of keeping model flexibility while simplifying the index.

DVF builds the index post‑training using a Hierarchical Navigable Small World (HNSW) graph, which imposes no constraints on the model’s embedding space. Retrieval proceeds layer‑by‑layer: starting from a set of seed nodes, the HNSW graph is traversed, each visited node is scored by the model, and the top‑K candidates are passed to the next layer. The final layer’s top‑K items constitute the recall result.
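The layer‑by‑layer traversal described above can be sketched as a beam search over the graph layers, with an arbitrary model as the scorer. This is a simplified illustration, not the production implementation; the graph representation, `score_fn`, and the toy example below are all hypothetical stand‑ins:

```python
import heapq

def dvf_layer_retrieve(graph_layers, seeds, score_fn, k):
    """Layer-by-layer retrieval over an HNSW-style graph (sketch).

    graph_layers: list of adjacency dicts, coarsest layer first.
    score_fn:     any model scoring a candidate id (higher = better);
                  nothing constrains it to an inner product.
    At each layer, the current frontier and its neighbors are scored
    and the top-k survivors seed the next, finer layer. The last
    layer's top-k is the recall result.
    """
    frontier = list(seeds)
    for layer in graph_layers:
        candidates = set(frontier)
        for node in frontier:
            candidates.update(layer.get(node, ()))
        frontier = heapq.nlargest(k, candidates, key=score_fn)
    return frontier

# Toy two-layer graph; the "model" prefers ids close to 5.
layers = [{0: [1, 2]}, {1: [5], 2: [4, 6]}]
found = dvf_layer_retrieve(layers, seeds=[0],
                           score_fn=lambda i: -abs(i - 5), k=2)
```

Because scoring happens only on visited nodes, an expensive model touches a small fraction of the corpus, which is what makes the "1.9% of items scored" result in the experiments possible.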

The scoring model consists of four components: (1) user‑level aggregated features extracted by a Transformer, (2) user behavior sequence features combined with the target via target‑attention, (3) an MLP that extracts target (candidate‑item) features, and (4) a final MLP that merges the three feature streams to produce a relevance score.
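The four components can be sketched as follows. All weights here are random placeholders and every name is hypothetical; the point is only the data flow: three feature streams, merged by a final MLP into a scalar score.

```python
import numpy as np

def mlp(x, dims, rng):
    """Toy MLP with random weights; ReLU on hidden layers only."""
    for i, d in enumerate(dims):
        w = rng.normal(size=(x.shape[-1], d)) / np.sqrt(x.shape[-1])
        x = x @ w
        if i < len(dims) - 1:
            x = np.maximum(x, 0.0)
    return x

def dvf_score(user_agg, behavior_seq, target, rng):
    """Sketch of the four-part scoring model described above.

    user_agg:     (d,)   user-level features (stand-in for the
                         Transformer-aggregated representation)
    behavior_seq: (T, d) user behavior sequence
    target:       (d,)   candidate-item features
    """
    # (1) user-level stream (Transformer output stands in as user_agg)
    u = mlp(user_agg, [32], rng)
    # (2) target-attention: weight behavior steps by similarity to target
    att = behavior_seq @ target
    att = np.exp(att - att.max()); att /= att.sum()   # softmax weights
    seq = mlp(att @ behavior_seq, [32], rng)
    # (3) target feature extraction via an MLP
    t = mlp(target, [32], rng)
    # (4) merge the three streams into a scalar relevance score
    merged = np.concatenate([u, seq, t])
    return float(mlp(merged, [16, 1], rng)[0])

rng = np.random.default_rng(0)
s = dvf_score(rng.normal(size=16), rng.normal(size=(20, 16)),
              rng.normal(size=16), rng)
```

Note that nothing in this structure is an inner product of two independent towers, which is exactly the flexibility DVF's post‑training index is meant to preserve.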

From an engineering perspective, DVF integrates both the index and the model into a unified inference module, reducing request latency by ~12 ms. Online retrieval runs on CPU, while scoring runs on GPU; custom TensorFlow ops (set‑difference, set‑union, bitmap) and linear‑attention kernels further accelerate the pipeline. XLA auto‑padding is employed to handle dynamic batch sizes without triggering JIT recompilation.
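The auto‑padding idea can be illustrated with a simple bucketing scheme: pad each dynamic batch up to the nearest fixed bucket size so the compiler sees only a handful of static shapes instead of recompiling per batch size. The bucket values and function names below are illustrative, not DVF's actual configuration:

```python
import numpy as np

def pad_to_bucket(batch_size, buckets=(64, 128, 256, 512)):
    """Pick the smallest bucket >= batch_size so XLA compiles at most
    len(buckets) shapes rather than one per observed batch size."""
    for b in buckets:
        if batch_size <= b:
            return b
    return buckets[-1]  # oversized batches would be split upstream

def pad_batch(x, buckets=(64, 128, 256, 512)):
    """Zero-pad a (batch, ...) array up to its bucket size and return
    a boolean mask marking the real (unpadded) rows."""
    n = x.shape[0]
    b = pad_to_bucket(n, buckets)
    padded = np.zeros((b,) + x.shape[1:], dtype=x.dtype)
    padded[:n] = x
    mask = np.zeros(b, dtype=bool)
    mask[:n] = True
    return padded, mask

padded, mask = pad_batch(np.ones((70, 8)))
```

Downstream code uses the mask to ignore the padded rows when aggregating scores, so the padding changes shapes but not results.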

Offline experiments show that removing the inner‑product restriction yields a 5.71 % absolute recall gain, and DVF achieves comparable recall with only 1.9 % of the items scored. Online benchmarks on a T4 GPU demonstrate latency reductions from 39.8 ms to 6.5 ms and QPS improvements from 68 to over 600 after successive optimizations.

In summary, DVF provides a lightweight, high‑performance, and model‑agnostic solution for large‑scale recall, with clear advantages in both accuracy and system efficiency. Future work includes further model upgrades, NPU acceleration, and exploration of alternative graph construction methods.

Tags: recommendation, deep learning, indexing, large-scale retrieval, dual vector foil, online inference