E‑commerce Search Engine Recall and Vector Retrieval Techniques

This article explains how e‑commerce platforms use inverted indexes for fast word‑based recall, introduce vector‑based semantic retrieval, and combine deep‑learning models such as DSSM and DeepMatch with real‑time user behavior attention networks to generate efficient, personalized candidate sets for ranking.

DataFunSummit

Dec 6, 2020

Search engines are widely used in e‑commerce; to retrieve items efficiently among billions of products, an inverted index is built so that each searchable token maps to the products containing it.
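A minimal sketch of such an index, assuming products are already tokenized. The catalog, product IDs, and the function name `build_inverted_index` are hypothetical and only illustrate the token-to-products mapping.

```python
from collections import defaultdict


def build_inverted_index(products):
    """Map each token to the sorted list of product IDs that contain it."""
    index = defaultdict(set)
    for product_id, tokens in products.items():
        for token in tokens:
            index[token].add(product_id)
    # Sorted posting lists make later intersections cheap.
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical, pre-tokenized catalog (product ID -> title tokens).
products = {
    1: ["round-neck", "t-shirt", "cotton"],
    2: ["v-neck", "t-shirt"],
    3: ["round-neck", "sweater"],
}
index = build_inverted_index(products)
print(index["t-shirt"])
```

Keeping each posting list sorted by product ID is a design choice that lets a query intersect lists with a linear merge instead of hashing.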

The first stage, called recall, quickly narrows the candidate set using simple matching before more expensive ranking stages.

Word‑based recall relies on proper tokenization and query rewriting (e.g., “圆领” (round neck) AND “T恤” (T‑shirt)) to intersect posting lists and obtain relevant items, while a recall score limits the number of candidates passed to later stages.
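The AND intersection and the score-based cutoff can be sketched as follows. This assumes sorted posting lists and a precomputed per-product recall score; the function names, the toy index, and the scores are hypothetical.

```python
import heapq


def intersect(a, b):
    """Two-pointer intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def recall(index, terms, recall_score, limit):
    """AND all query terms, then keep the `limit` highest-scoring products."""
    # Start from the shortest list so intermediate results stay small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for posting in postings[1:]:
        result = intersect(result, posting)
        if not result:
            break
    return heapq.nlargest(limit, result, key=lambda pid: recall_score.get(pid, 0.0))


# Hypothetical index and recall scores.
index = {"round-neck": [1, 3, 5], "t-shirt": [1, 2, 5]}
scores = {1: 0.4, 2: 0.7, 3: 0.1, 5: 0.9}
print(recall(index, ["round-neck", "t-shirt"], scores, limit=2))
```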

To address category bias and improve relevance, the recall score takes product quality and release time into account, and it can be adjusted per category using predictions.
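One way to picture this is sketched below. The source does not give the actual formula, so the linear weights, the exponential freshness decay, and the quota rule are all assumptions made for illustration.

```python
import math


def recall_score(quality, age_days, w_quality=0.7, w_fresh=0.3, decay_days=30.0):
    """Assumed score: weighted sum of quality and an exponentially decaying freshness."""
    freshness = math.exp(-age_days / decay_days)
    return w_quality * quality + w_fresh * freshness


def select_by_category(candidates, category_probs, limit):
    """Give each predicted category a share of the candidate budget.

    candidates: list of (product_id, category, score) tuples.
    category_probs: hypothetical predicted category distribution for the query.
    """
    by_category = {}
    for product_id, category, score in candidates:
        by_category.setdefault(category, []).append((score, product_id))
    selected = []
    for category, prob in category_probs.items():
        quota = max(1, round(prob * limit))  # every predicted category gets at least one slot
        top = sorted(by_category.get(category, []), reverse=True)[:quota]
        selected.extend(product_id for _, product_id in top)
    return selected[:limit]


candidates = [(1, "tops", 0.9), (2, "tops", 0.8), (3, "tops", 0.7), (4, "dresses", 0.2)]
print(select_by_category(candidates, {"tops": 0.5, "dresses": 0.5}, limit=2))
```

With a single global cutoff, the two slots here would both go to "tops"; the per-category quota keeps the lower-scoring category represented.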

Vector‑based recall complements word recall by capturing semantic similarity; the problem is formulated as a classification task with cross‑entropy loss, and pairwise or triplet losses are also described.
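The three loss families named above can be written out as follows. The notation is an assumption for illustration and is not taken from the original talk: s(q, d) is the model's similarity score between query q and item d, d^+ is an item the user clicked or bought, d^- is a sampled negative, D is the candidate set containing d^+ and the negatives, and m > 0 is a margin. The triplet loss is given in similarity form; the distance form flips the signs.

```latex
% Classification view: softmax over the candidate set, trained with cross-entropy
P(d^{+} \mid q) = \frac{\exp\big(s(q, d^{+})\big)}{\sum_{d \in D} \exp\big(s(q, d)\big)},
\qquad
\mathcal{L}_{\mathrm{CE}} = -\log P(d^{+} \mid q)

% Pairwise (logistic) loss on one positive and one negative
\mathcal{L}_{\mathrm{pair}} = \log\Big(1 + \exp\big(-\big(s(q, d^{+}) - s(q, d^{-})\big)\big)\Big)

% Triplet (margin) loss
\mathcal{L}_{\mathrm{triplet}} = \max\big(0,\; m - s(q, d^{+}) + s(q, d^{-})\big)
```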

Deep learning models such as Microsoft’s DSSM (Deep Structured Semantic Model) and Google’s DeepMatch embed queries and items into vectors and compute cosine similarity for semantic matching.
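The matching step itself is small once the vectors exist. The sketch below assumes the query and item embeddings have already been produced by some trained two-tower model; the vectors, item IDs, and function names are hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n(query_vec, item_vecs, n):
    """Return the IDs of the n items most similar to the query vector."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in item_vecs.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:n]]


# Hypothetical embeddings standing in for model outputs.
query_vec = [1.0, 0.0]
item_vecs = {"a": [2.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_n(query_vec, item_vecs, n=2))
```

The exhaustive scan is only for clarity; at catalog scale this step would normally go through an approximate nearest-neighbor index.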

A personalized real‑time recall model combines offline‑precomputed user ID features with an online attention network that processes the user's recent behavior sequence, producing a real‑time user vector for fast Top‑N retrieval.
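A rough sketch of the online half, under stated assumptions: the offline user vector serves as the attention query, behaviors are represented by item embeddings, scores are plain dot products, and the pooled result is added to the offline vector. The source does not specify these details, so this stands in for the real network rather than reproducing it.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def realtime_user_vector(offline_user_vec, behavior_vecs):
    """Attend over recent behavior embeddings and combine with the offline vector."""
    if not behavior_vecs:
        return list(offline_user_vec)  # no recent behavior: fall back to offline features
    # Dot-product attention scores, using the offline vector as the query (assumption).
    scores = [sum(q * k for q, k in zip(offline_user_vec, b)) for b in behavior_vecs]
    weights = softmax(scores)
    dim = len(offline_user_vec)
    pooled = [sum(w * b[i] for w, b in zip(weights, behavior_vecs)) for i in range(dim)]
    # Additive combination is an assumption; a real model might concatenate and project.
    return [u + p for u, p in zip(offline_user_vec, pooled)]


offline = [1.0, 0.0]
recent = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical embeddings of recently clicked items
print(realtime_user_vector(offline, recent))
```

The resulting vector can then be matched against precomputed item vectors for Top‑N retrieval.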

Additional recall paths include I2I expansion from items the user has interacted with, and text‑based recall using DSSM on product titles and copy, as well as multi‑path fusion to enrich candidate sets.
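Multi-path fusion can be sketched as a quota-based merge with de-duplication. The path names, quotas, and the `fuse` function are hypothetical; the source does not say how the paths are actually weighted.

```python
def fuse(paths, quotas, limit):
    """Merge ranked candidate lists from several recall paths.

    paths: dict mapping path name -> ranked list of item IDs.
    quotas: dict mapping path name -> how many items that path may contribute.
    Returns the fused list and, for each kept item, the paths that recalled it.
    """
    fused = []
    sources = {}
    for name, items in paths.items():
        for item in items[: quotas.get(name, 0)]:
            if item not in sources:
                fused.append(item)
                sources[item] = [name]
            else:
                sources[item].append(name)  # same item recalled by another path
    fused = fused[:limit]
    return fused, {item: sources[item] for item in fused}


paths = {"word": [1, 2, 3], "vector": [2, 4], "i2i": [5, 1]}
quotas = {"word": 2, "vector": 2, "i2i": 1}
fused, sources = fuse(paths, quotas, limit=10)
print(fused, sources)
```

Recording which paths recalled each item is optional, but it gives the ranking stage a cheap extra feature.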

The system is deployed by splitting the model into offline and online components to accelerate inference, and the overall loss remains cross‑entropy with importance sampling of positive and negative samples.
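One common way to pair cross-entropy with sampled candidates is to correct each logit by the log of its sampling probability. Whether the original system uses exactly this correction is not stated, so the sketch below is an assumption; `pos_q` and `neg_qs` are hypothetical sampling probabilities.

```python
import math


def sampled_softmax_loss(pos_score, neg_scores, pos_q, neg_qs):
    """Cross-entropy over one positive and sampled negatives.

    Each logit is corrected by -log(q), where q is the probability that the
    item was drawn by the sampler, so frequently sampled items are not
    over-penalized.
    """
    logits = [pos_score - math.log(pos_q)]
    logits += [s - math.log(q) for s, q in zip(neg_scores, neg_qs)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    return log_z - logits[0]  # equals -log softmax(positive)


# With equal scores and equal sampling probabilities the loss is log(4).
print(sampled_softmax_loss(0.0, [0.0, 0.0, 0.0], 0.1, [0.1, 0.1, 0.1]))
```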
